> And if you work with linguistics, an ä cannot be decomposed
> when you work with Swedish, as it is a single letter. The dots
> above are not an accent or diacritic mark. So here is a case
> where you need to be able to represent what looks like the same
> glyph "an a with two dots above", both as one character and as
> an a with combining dots.
When you are doing linguistic work, there are inevitably times
when you need to treat sequences as a unit (e.g. ll or ch for
Spanish); you may even need to treat discontiguous sequences as
a unit (e.g. Thai sara ia). So even if Swedish a-umlaut
(I'm guessing that's what you wrote - my mail reader is showing me
o-tilde) must be treated as a unit for analysis purposes, it
doesn't matter whether it is encoded as a unit or a sequence.
You've got to be able to handle sequences in this manner
anyway. The argument, therefore, does not hold.
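To make the point concrete, here is a minimal Python sketch of splitting text into analysis units in a way that does not care how a letter happens to be encoded. The function name `units` and the tuple `SPANISH_UNITS` are made up for illustration, and the handling is deliberately crude (lower case only, no discontiguous sequences); it relies only on `unicodedata.normalize` and `unicodedata.combining` from the standard library.

```python
import unicodedata

# Hypothetical list of multi-character units (traditional Spanish digraphs).
SPANISH_UNITS = ("ch", "ll")

def units(text, multi=SPANISH_UNITS):
    """Split text into analysis units, regardless of how each is encoded."""
    # Normalise first so that precomposed and decomposed forms compare equal.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        for m in multi:
            if text.startswith(m, i):
                # A listed sequence is treated as one unit.
                out.append(m)
                i += len(m)
                break
        else:
            # Otherwise take one character plus any combining marks after it.
            j = i + 1
            while j < len(text) and unicodedata.combining(text[j]):
                j += 1
            out.append(text[i:j])
            i = j
    return out
```

With this, `units("\u00e4")` and `units("a\u0308")` give the same single unit, and `units("calle")` treats `ll` as one unit, which is the sense in which the encoding does not matter for the analysis.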
More generally, our software systems must, for various
purposes, have the ability to treat n characters as a sequence
of m units (consider Scottish name sorting which equates Mc and
Mac). If they don't do this, then they are to that extent
lacking in their level of internationalisation.
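As a rough illustration of the Mc/Mac case, the sketch below builds a sort key that maps a leading "Mc" onto "Mac" before comparing. The name `scottish_sort_key` is hypothetical and the rule is a simple prefix rewrite, not a full collation tailoring; real systems would normally do this through a collation library rather than by hand.

```python
def scottish_sort_key(name):
    """Sort key that treats a leading 'Mc' as if it were 'Mac'."""
    folded = name.casefold()
    # Assumption: only the two-letter prefix "mc" needs rewriting.
    if folded.startswith("mc"):
        folded = "mac" + folded[2:]
    return folded

def sort_names(names):
    # sorted() is stable, so names with equal keys keep their input order.
    return sorted(names, key=scottish_sort_key)
```

For example, `sort_names(["McDonald", "MacArthur", "Mabbott", "MacDonald", "McAdam"])` interleaves the Mc and Mac names instead of grouping all the Mac names ahead of the Mc names.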
Peter
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT