From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Tue Aug 05 2003 - 18:31:03 EDT
On 05/08/2003 15:09, Mark Davis wrote:
>><< Zs, Zl, and Zp are considered format characters, but their
>>membership in the Z (separator) class takes precedence over their
>>membership in the Cf class, because the General Category assigns
>>only a single value to each character. >>
>
>Whenever you have a question about the status of a character, you need
>to look it up in the UCD. You can either do that by going through the
>unicode website, or if you want a more readable interface, use the ICU
>character browser, which formats that data.
>
>Look at space, U+0020.
>
>http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?go=0020&ch.x=4&ch.y=7
>
>The general category is Space_Separator, *not* a format character.
>
>Now the wording there could definitely be clearer, but the operative
>phrase is:
>
>>...but their
>>membership in the Z (separator) class *takes precedence* over their
>>membership in the Cf class...
>
>So it would be clearer to say something like:
>
>In many ways the characters, Zs, Zl, and Zp, are similar to format
>characters, but because their general usage is significantly different
>they are broken out into a separate General Category, as Separator
>characters.
>
>Mark
>__________________________________
>http://www.macchiato.com
>► “Eppur si muove” ◄
>
Thank you, Mark. This helps to clarify things, but it still doesn't
explicitly answer my question. Suppose I want to encode a sentence like
"In this language the diacritic ^ may appear above the letters ...", but
with a combining character in place of ^, and I want exactly one space
to be displayed before the combining character. Do I encode two spaces
or one?
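
A minimal sketch of the lookup Mark describes, using Python's standard
unicodedata module rather than the ICU character browser. It only reports
the single General Category value assigned to each character and spells
out the two candidate encodings from the question; it does not settle
which of them is correct. U+0302 COMBINING CIRCUMFLEX ACCENT and U+200D
ZERO WIDTH JOINER are stand-ins chosen for illustration, and the module
reflects whatever UCD version ships with the interpreter, not the 2003
data.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the single General Category value assigned to ch."""
    return unicodedata.category(ch)


def code_points(text: str) -> list[str]:
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


# Separators (Zs, Zl, Zp) next to a format character (Cf) and a
# combining mark (Mn). The specific characters are illustrative choices.
samples = {
    "U+0020 SPACE": "\u0020",
    "U+2028 LINE SEPARATOR": "\u2028",
    "U+2029 PARAGRAPH SEPARATOR": "\u2029",
    "U+200D ZERO WIDTH JOINER": "\u200d",
    "U+0302 COMBINING CIRCUMFLEX ACCENT": "\u0302",
}

for label, ch in samples.items():
    print(f"{label}: {general_category(ch)}")

# The two candidate encodings from the question, as code point lists.
one_space = "diacritic \u0302 may"
two_spaces = "diacritic  \u0302 may"
print(code_points(one_space))
print(code_points(two_spaces))
```

In the one-space form the combining mark directly follows the single
U+0020; in the two-space form it follows the second U+0020. How each
sequence is displayed is up to the renderer, which this sketch does not
model.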
--
Peter Kirk
peter.r.kirk@ntlworld.com
http://web.onetel.net.uk/~peterkirk/
This archive was generated by hypermail 2.1.5 : Tue Aug 05 2003 - 19:23:24 EDT