|
|
Page 1 of 1
|
[ 4 posts ] |
|
| Author |
Message |
|
mfilmore
|
Post subject: General Properties of Formatting and Combining Marks Posted: Mon Apr 22, 2013 8:43 pm |
|
Joined: Sun Apr 21, 2013 11:04 pm Posts: 2
|
|
Greetings,
I wrote a simple scanner that returns the locations of grapheme sequences, but was surprised by an exclusion.
I expected the IDEOGRAPHIC DESCRIPTION CHARACTERs [U+2FF0..U+2FFA] to have a General Category of Cf.
I expected the CJK STROKE code points [U+31C0..U+31E3] to have a General Category of Mc.
Both sets are tagged as So, which leaves me confused.
I can obviously retag all of what I view as formatting marks as Cf and combining marks as Mc in my lookup tables, but I am wondering how this has been rationalized within your discussions.
Perhaps you can help my education in this area?
Thanks much, -Mike Filmore
|
|
|
vanisaac
|
Post subject: Re: General Properties of Formatting and Combining Marks Posted: Mon Apr 22, 2013 10:52 pm |
|
Joined: Mon Feb 01, 2010 6:18 pm Posts: 76
|
|
None of those characters are formatting or combining characters. They are simple, spacing, graphic characters that do not combine with other characters in any way, and do not modify the appearance or canonical interpretation of any adjacent characters.
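
For anyone who wants to check this against their own lookup tables, here is a minimal sketch using Python's standard `unicodedata` module. The ranges are the ones quoted in the original post, `summarize` is a hypothetical helper name, and the values come from whatever Unicode version your Python build bundles, so treat it as an illustration, not an authoritative reading of the UCD.

```python
import unicodedata

# Ranges exactly as quoted in the original post (an assumption of this sketch).
IDC_RANGE = range(0x2FF0, 0x2FFA + 1)          # IDEOGRAPHIC DESCRIPTION CHARACTERs
CJK_STROKE_RANGE = range(0x31C0, 0x31E3 + 1)   # CJK STROKE code points


def summarize(code_points):
    """Return the set of (General_Category, canonical combining class) pairs seen."""
    return {
        (unicodedata.category(chr(cp)), unicodedata.combining(chr(cp)))
        for cp in code_points
    }


idc_summary = summarize(IDC_RANGE)
stroke_summary = summarize(CJK_STROKE_RANGE)

# For contrast: a genuine format character and a genuine spacing combining mark.
zwj_category = unicodedata.category("\u200D")      # ZERO WIDTH JOINER
visarga_category = unicodedata.category("\u0903")  # DEVANAGARI SIGN VISARGA

print(idc_summary)       # {('So', 0)}
print(stroke_summary)    # {('So', 0)}
print(zwj_category)      # Cf
print(visarga_category)  # Mc
```

Every code point in both ranges reports So with a combining class of 0, which matches the description above: they are ordinary spacing symbols.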
|
|
|
mfilmore
|
Post subject: Re: General Properties of Formatting and Combining Marks Posted: Tue Apr 23, 2013 11:31 am |
|
Joined: Sun Apr 21, 2013 11:04 pm Posts: 2
|
vanisaac wrote: None of those characters are formatting or combining characters. They are simple, spacing, graphic characters that do not combine with other characters in any way, and do not modify the appearance or canonical interpretation of any adjacent characters.

Ah. So CDL can be _described_ in Unicode but is not part of the Unicode Standard. Given the flexibility it offers for denoting CJK characters that are not currently represented (or alternate glyph/ideogram representations), I would suggest incorporating CDL processing into the standard.

Thank you for the clarification! -Mike
|
|
|
vanisaac
|
Post subject: Re: General Properties of Formatting and Combining Marks Posted: Tue Apr 23, 2013 5:49 pm |
|
Joined: Mon Feb 01, 2010 6:18 pm Posts: 76
|
mfilmore wrote: Ah. So CDL can be _described_ in Unicode but is not part of the Unicode Standard.

The Ideographic Description Characters are used as simple graphics for describing how novel characters would look, usually in academic contexts, much like the Manuel de Codage uses punctuation characters to describe the relation of hieroglyph elements to each other.

mfilmore wrote: Given the flexibility to denote CJK characters not currently represented (or alternate glyph/ideogram representations), I would suggest incorporating CDL processing into the standard.

If you were able to do this, complex CJK characters would have multiple encodings, and even the most basic text analysis (sorting, searching, counting characters) would become incredibly complex and slow, or even impossible, not to mention insanely insecure in terms of phishing and spoofing. It's just plain not going to happen.
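
To make the "multiple encodings" concern concrete, here is a small sketch using Python's standard `unicodedata` module. The example pair is my own choice for illustration: the description sequence ⿰氵工 (U+2FF0 U+6C35 U+5DE5) next to the encoded character 江 (U+6C5F). It only shows how the two strings behave under the standard normalization forms today.

```python
import unicodedata

# Illustrative pair (an assumption of this sketch, not taken from the thread).
ids_sequence = "\u2FF0\u6C35\u5DE5"   # ⿰氵工 : a description made of three characters
encoded_char = "\u6C5F"               # 江    : the single encoded ideograph

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Normalization leaves the description sequence untouched in every form...
unchanged = all(unicodedata.normalize(f, ids_sequence) == ids_sequence for f in FORMS)

# ...and never makes it equal to the encoded character.
equivalent = any(
    unicodedata.normalize(f, ids_sequence) == unicodedata.normalize(f, encoded_char)
    for f in FORMS
)

print(len(ids_sequence), len(encoded_char))  # 3 1
print(unchanged)                             # True
print(equivalent)                            # False
```

As things stand, the sequence is just three separate symbols that happen to describe a character. If such sequences were instead treated as equivalent to the character they describe, every comparison, search, and count would have to resolve them first, which is the cost described above.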
|
|
|
|
|
|