Peter_Constable@sil.org scripsit:
> > I think there are other non-spacing characters (diacritics) that have the same
> > Unicode character code value but different meanings in different scripts. And
> > like Mr. Figge I begin to wonder why these two meanings are not treated
> > differently, like Latin A, Greek Alpha and Cyrillic A have different code
> > values. Maybe someone can clarify this.
>
> I believe the main reason that these were kept separate is for round-trip
> convertibility with existing standards.
There's another reason: the search problem. If you search a multilingual
document for "ABC" you do not want Cyrillic A-Ve-Es being found too.
It's quite bad enough that Fullwidth-A-B-C will be missed by a naive
search algorithm, but at least those are compatibility equivalents.
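A minimal Python sketch of the distinction, assuming NFKC normalization (via the standard `unicodedata` module) is used to fold compatibility equivalents before searching. The helper names `naive_contains` and `nfkc_contains` are hypothetical and only illustrate the point. Fullwidth A-B-C folds to plain ABC under NFKC, while Cyrillic A-Ve-Es has no compatibility decomposition and stays distinct, which is exactly what you want.

```python
import unicodedata

def naive_contains(haystack: str, needle: str) -> bool:
    # Hypothetical helper: plain code-point substring search, no normalization.
    return needle in haystack

def nfkc_contains(haystack: str, needle: str) -> bool:
    # Hypothetical helper: fold compatibility variants (e.g. fullwidth forms)
    # to their plain counterparts on both sides, then search.
    return unicodedata.normalize("NFKC", needle) in unicodedata.normalize("NFKC", haystack)

latin = "ABC"
fullwidth = "\uFF21\uFF22\uFF23"  # FULLWIDTH LATIN CAPITAL LETTER A, B, C
cyrillic = "\u0410\u0412\u0421"   # CYRILLIC CAPITAL LETTER A, VE, ES

print(naive_contains(fullwidth, latin))  # False: naive search misses fullwidth
print(nfkc_contains(fullwidth, latin))   # True: compatibility equivalents fold together
print(nfkc_contains(cyrillic, latin))    # False: Cyrillic letters stay distinct
```

Normalization alone does not address look-alike letters across scripts; keeping Cyrillic A separate from Latin A is what prevents the false match in the first place.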
-- 
John Cowan   cowan@ccil.org
e'osai ko sarji la lojban.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT