Stefan Probst wrote:
> (1) Sorting
> It is said that in sorting, all combining marks should be disregarded.
Disregarded *as primary differences*. In other words, all a's come before
all b's (not just initially, but at every point in the string), and the
distinction between the various marked a's is considered only if all the
other letters are the same.
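
As a rough illustration of that two-level comparison, here is a minimal
Python sketch. It is not the Unicode Collation Algorithm, only an untailored
approximation: the function names and the word list are made up for the
example, and the secondary level simply compares the decomposed strings.

```python
import unicodedata

def strip_marks(text):
    """Decompose text (NFD) and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def two_level_key(text):
    # Level 1: base letters only.
    # Level 2: the decomposed string with marks, consulted only on a level-1 tie.
    return (strip_marks(text), unicodedata.normalize("NFD", text))

# Hypothetical sample data: résumé, rot, resume, réunion, rat
words = ["r\u00e9sum\u00e9", "rot", "resume", "r\u00e9union", "rat"]

by_code_point = sorted(words)                  # rat, resume, rot, résumé, réunion
by_levels = sorted(words, key=two_level_key)   # rat, resume, résumé, réunion, rot
```

With the two-level key the accent only decides between "resume" and
"résumé"; it never pushes "résumé" past "rot", which is what plain
code-point order does.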
But sorting, unlike decomposition, positively *requires* per-language
tailoring, and proper i18n sort routines always support it.
Again the Scandinavian example is relevant: the "accented letters"
are not only really primary letters (and so get a primary difference
from their non-accented counterparts), but also sort at the end
of the alphabet.
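
A small sketch of what such a tailoring amounts to, assuming the Swedish
order (å, ä, ö after z; Danish and Norwegian differ) and handling the
primary level only. The names SWEDISH_ALPHABET and swedish_primary_key are
invented for the example.

```python
import unicodedata

# Assumed Swedish alphabet: a..z followed by å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}

def swedish_primary_key(word):
    # Compose first so that a + ring above is seen as the single letter å.
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the table all get the same rank, after every letter.
    return [RANK.get(ch, len(RANK)) for ch in composed]

# Hypothetical sample data: öl, zebra, ål, apa, ärta
words = ["\u00f6l", "zebra", "\u00e5l", "apa", "\u00e4rta"]
tailored = sorted(words, key=swedish_primary_key)  # apa, zebra, ål, ärta, öl
```

Here å, ä and ö are looked up as letters in their own right, so they get
a primary difference from a and o and land after z.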
> While in Vietnamese this is OK for the (combining) tone marks, it is
> absolutely not OK for the (combining) modifiers. In Vietnamese, e.g. an
> "a" with "circumflex" is a completely different character than an "a"
> alone.
Right enough. So a-circumflex gets a primary difference from a in the
Vietnamese tailoring, while the tone marks stay secondary.
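
To make that concrete, here is a hedged Python sketch of a Vietnamese-style
key in which the modifiers (circumflex, breve, horn) are part of the letter
and the tone marks are a separate, secondary level. The alphabet and tone
order used here are assumptions (one common dictionary convention), and all
names are invented for the example.

```python
import unicodedata

# Assumed alphabet order; modified vowels are letters in their own right.
VN_ALPHABET = "a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y".split()
LETTER_RANK = {unicodedata.normalize("NFC", l): i for i, l in enumerate(VN_ALPHABET)}

# Assumed tone order: none, grave, hook above, tilde, acute, dot below.
TONE_RANK = {"": 0, "\u0300": 1, "\u0309": 2, "\u0303": 3, "\u0301": 4, "\u0323": 5}

def split_letters(word):
    """Yield (letter, tone) pairs; the letter keeps its modifier, the tone is split off."""
    clusters = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and clusters:
            clusters[-1].append(ch)      # attach the mark to the preceding base
        else:
            clusters.append([ch])        # start a new base letter
    for cluster in clusters:
        tone = "".join(m for m in cluster[1:] if m in TONE_RANK)
        letter = "".join(m for m in cluster if m not in TONE_RANK)
        yield unicodedata.normalize("NFC", letter), tone

def vn_key(word):
    pairs = list(split_letters(word))
    primary = [LETTER_RANK.get(letter, len(LETTER_RANK)) for letter, _ in pairs]
    secondary = [TONE_RANK.get(tone, len(TONE_RANK)) for _, tone in pairs]
    return (primary, secondary)

# Hypothetical sample data.
words = ["mận", "man", "mán", "mặn", "màn"]
ordered = sorted(words, key=vn_key)  # man, màn, mán, mặn, mận
```

The three toned forms of "man" stay together and differ only at the
secondary level, while "mặn" (a-breve) and "mận" (a-circumflex) sort after
all of them because their vowel is a different primary letter.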
> This is why some circles in Vietnam prefer what I call "VN-combined":
That is based on the naive notion, which does not even work for English,
that binary sorting can ever be culturally correct sorting. It can't.
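
The English case is easy to demonstrate. The sketch below compares plain
code-point order with a key that treats case as a tie-breaker only; the word
list and the helper name are made up for the example, and a real collation
routine does considerably more than this.

```python
# Hypothetical sample data.
words = ["apple", "Zebra", "banana", "Apple"]

# Binary (code-point) order: every capital sorts before every small letter.
binary_order = sorted(words)          # ['Apple', 'Zebra', 'apple', 'banana']

def case_as_tiebreak(word):
    # Compare letters without regard to case; use case only to break ties.
    return (word.casefold(), word)

alphabetical = sorted(words, key=case_as_tiebreak)
# ['Apple', 'apple', 'banana', 'Zebra']
```

No dictionary user expects "Zebra" to come before "apple", yet that is what
binary comparison yields even for unaccented ASCII text.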
> And since we are already in Vietnamese.... (to round the things up):
> I am not sure, how e.g. in the introduction to dictionaries or
> Vietnamese language books, the tonal mark can be printed "alone". One
> solution might be to combine them with a "space", but at present, this
> does not always work.
When does it not? It is the standard Unicode thing to do.
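
For reference, a short Python sketch of the space-plus-mark sequences in
question. The dictionary of tone marks and the helper name are illustrative
only; whether the result displays well still depends on the font and the
renderer.

```python
import unicodedata

# The five Vietnamese tone marks as combining characters.
TONE_MARKS = {
    "huyen": "\u0300",  # COMBINING GRAVE ACCENT
    "hoi":   "\u0309",  # COMBINING HOOK ABOVE
    "nga":   "\u0303",  # COMBINING TILDE
    "sac":   "\u0301",  # COMBINING ACUTE ACCENT
    "nang":  "\u0323",  # COMBINING DOT BELOW
}

def isolated(mark):
    """Return the mark applied to SPACE so that it can be shown on its own."""
    return "\u0020" + mark

for name, mark in TONE_MARKS.items():
    sequence = isolated(mark)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in sequence)
    print(f"{name:6} {code_points}  {unicodedata.name(mark)}")
```

Each sequence is two code points, and canonical normalization leaves it
alone, so it survives storage and interchange unchanged.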
--
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
This archive was generated by hypermail 2.1.2 : Wed Jan 30 2002 - 11:22:17 EST