From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed Jun 28 2006 - 16:37:50 CDT
Does anyone know of invisible encoding distinctions (*not* canonically
equivalent) actually being deliberately used by significant groups of users?
I can think of a few possibilities:
(1) <U+17D2 KHMER SIGN COENG, U+178A KHMER LETTER DA> v. <U+17D2, U+178F
KHMER LETTER TA>. It is recommended that the choice be made according to
pronunciation, but this would be unetymological in a few words.
(2) Use of <U+034F COMBINING GRAPHEME JOINER> (CGJ) to distinguish digraphs
from accidental sequences in sorting. The usual example given is Slovak
'ch'; Welsh 'ng' could also become a significant possibility. The two cases
(Hebrew and German) where it is intended to affect the rendering are not
relevant to my question.
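
As a rough check that both candidates really are distinctions that normalization leaves alone, here is a small Python sketch using the standard unicodedata module. The helper name canonically_equivalent is made up for illustration, and comparing NFD forms is simply the usual way of testing canonical equivalence.

```python
import unicodedata

COENG = "\u17d2"  # KHMER SIGN COENG
DA = "\u178a"     # KHMER LETTER DA
TA = "\u178f"     # KHMER LETTER TA
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER


def canonically_equivalent(a, b):
    """Hypothetical helper: two strings are canonically equivalent
    exactly when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


coeng_da = COENG + DA
coeng_ta = COENG + TA
plain_ch = "ch"
marked_ch = "c" + CGJ + "h"

# The two Khmer subscript spellings are distinct under normalization.
print(canonically_equivalent(coeng_da, coeng_ta))  # False

# "ch" with and without CGJ are likewise distinct.
print(canonically_equivalent(plain_ch, marked_ch))  # False

# NFC does not strip the CGJ, so the distinction survives normalization.
print(unicodedata.normalize("NFC", marked_ch) == marked_ch)  # True
```
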
However, I have no evidence of whether these distinctions are actually being
made by a significant number of users. I can well imagine CGJ only being used
on keywords, and then only when the sorting process otherwise yields the
wrong order for the data set in actual use.
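
To make the sorting case concrete, here is a toy Python sketch of how a CGJ between 'c' and 'h' could stop the pair being treated as the Slovak digraph. It is not the Unicode Collation Algorithm or any real tailoring: the alphabet (ASCII lower case with 'ch' placed after 'h'), the function toy_key and the test strings are all assumptions made up for illustration.

```python
import string

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Assumed toy alphabet: a-z with the digraph "ch" ordered right after "h".
letters = list(string.ascii_lowercase)
letters.insert(letters.index("h") + 1, "ch")
RANK = {letter: i for i, letter in enumerate(letters)}


def toy_key(word):
    """Hypothetical sort key: 'ch' counts as one letter unless a CGJ
    separates the 'c' from the 'h'. The CGJ itself carries no weight.
    Only lower-case ASCII letters and CGJ are handled."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        elif word[i] == CGJ:
            i += 1  # ignored for weighting, but it has already split the digraph
        else:
            key.append(RANK[word[i]])
            i += 1
    return tuple(key)


# Illustrative strings only; the third has a CGJ between 'c' and 'h'.
words = ["chata", "hora", "c" + CGJ + "hata", "cesta"]
ordered = sorted(words, key=toy_key)
print([w.replace(CGJ, "<CGJ>") for w in ordered])
```

In this sketch the string with the CGJ sorts among the 'c' words, while plain "chata" sorts after "hora", which is the kind of difference a user would only bother to mark if the default order came out wrong.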
Richard.