
A hobby project idea for the weekend: anyone interested?

     
MatQuasar
post Oct 11 2023, 11:33 AM

Casual
***
Validating
329 posts

Joined: Jun 2023
I am thinking of creating a Simplified <--> Traditional Chinese converter. Since most common Chinese characters take 3 bytes in UTF-8, I wondered whether there is a universal formula to convert between Simplified and Traditional Chinese. But from my findings, it seems there is no such formula; maybe I need to keep a long mapping list? Can anyone advise? I want to manipulate the UTF-8 bytes myself.
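To see why a formula looks unlikely, here is a rough Java sketch that dumps the UTF-8 bytes of two simplified/traditional pairs. The class name Utf8Bytes and the helper hex are made up for illustration, and the pairs used are 国/國 (U+56FD/U+570B) and 书/書 (U+4E66/U+66F8). The code point gap between the two pairs differs, so a single fixed offset cannot cover both.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for inspecting encodings.
public class Utf8Bytes {
    // Returns the UTF-8 bytes of s as space-separated uppercase hex.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u56FD")); // simplified "country": E5 9B BD
        System.out.println(hex("\u570B")); // traditional "country": E5 9C 8B
        // Code point gaps for two pairs; they are not the same value.
        System.out.printf("%X%n", 0x570B - 0x56FD); // E
        System.out.printf("%X%n", 0x66F8 - 0x4E66); // 1892
    }
}
```

Both characters here are 3 bytes each, but characters outside the Basic Multilingual Plane take 4 bytes, so byte-level code should not assume a fixed width.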
MatQuasar
post Oct 11 2023, 01:23 PM

QUOTE(angch @ Oct 11 2023, 12:51 PM)
You'll end up needing custom mappings like this: https://github.com/BYVoid/OpenCC/tree/master/data/dictionary

Even the simpler implementation uses a hard coded dictionary mapping: https://github.com/siongui/gojianfan/blob/master/charsets.go
*
Thanks for the idea! Wah, the charsets.go has two very long lines!
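A minimal sketch of the dictionary idea in Java, assuming a tiny hand-written table. The class TinyConverter, the map S2T, and its three entries are hypothetical and only for illustration; they are not taken from OpenCC or gojianfan, whose tables are far larger.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy converter; a real table holds thousands of entries.
public class TinyConverter {
    // Simplified code point -> traditional code point (illustrative entries only).
    private static final Map<Integer, Integer> S2T = new HashMap<>();
    static {
        S2T.put(0x56FD, 0x570B); // "country"
        S2T.put(0x4E66, 0x66F8); // "book"
        S2T.put(0x5B66, 0x5B78); // "study"
    }

    // Replaces every code point found in the table; anything else passes through unchanged.
    static String toTraditional(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.appendCodePoint(S2T.getOrDefault(cp, cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTraditional("\u5B66\u4E60abc"));
    }
}
```

A one-to-one character table like this cannot handle a simplified character that maps to several traditional ones depending on the word (for example 发 can become 發 or 髮), which is presumably why OpenCC also ships phrase-level dictionaries.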
MatQuasar
post Oct 25 2023, 08:16 PM

Found this code from a netizen in China. It validates that a string contains only letters, digits, and Chinese characters:
CODE
String regex = "^[a-zA-Z0-9\u4E00-\u9FA5]+$";


Looks like the range \u4E00 to \u9FA5 is worth memorizing; it will be very useful in the future.
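Here is a rough sketch of how that regex could be used, assuming Java and a made-up class NameCheck with a helper isValid. The range U+4E00 to U+9FA5 covers most of the main CJK Unified Ideographs block, but not the later additions up to U+9FFF or the extension blocks, so rare characters will be rejected.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex quoted in the post.
public class NameCheck {
    // Letters, digits, and code points U+4E00..U+9FA5; at least one character required.
    private static final Pattern ALLOWED =
            Pattern.compile("^[a-zA-Z0-9\u4E00-\u9FA5]+$");

    static boolean isValid(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc123"));       // true
        System.out.println(isValid("\u4E2D\u6587")); // true ("Chinese" written in Chinese)
        System.out.println(isValid("a b"));          // false, space is not allowed
        System.out.println(isValid(""));             // false, + needs at least one character
    }
}
```

Compiling the pattern once and reusing it avoids re-parsing the regex on every call.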

 
