Thomas Chan wrote:
> > Is there at least one character > U+FFFF that is
> > commonly used in at least one modern language?
>
> How about music and math notation?
The musical symbols in Unicode 3.1 are just the basic building blocks for
music notation. So I assume that handling surrogates (or UTF-32) would be
only the minimum requirement for applications that support the specialized,
complex formatting that music notation needs.
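As a rough illustration of what "handling surrogates" means at minimum: a
character above U+FFFF, such as U+1D11E MUSICAL SYMBOL G CLEF from the
Unicode 3.1 musical symbols, is stored in UTF-16 as two 16-bit code units.
The following is a minimal Python sketch of the standard UTF-16 arithmetic;
the function names are hypothetical and chosen only for illustration.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000               # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (G clef) becomes the pair D834 DD1E
print([hex(u) for u in to_surrogates(0x1D11E)])
```

An application that only ever sees BMP text never exercises this path, which
is why surrogate support tends to be the first thing missing.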
As for the math symbols in Unicode 3.1, why should I be the one to break
the silence? :-)
> But, yes. U+21075,[1] gan, is an aspect marker in Cantonese, that when
> placed after a verb, denotes continuing action (roughly equivalent to
> <-ing> in English). I don't think anyone would dispute the
> indispensability or high frequency of this character.
This is exactly the kind of info that I was seeking, thanks.
It is not very clear to me what is included in Extension B: how can I
find out more about it?
> I probably wouldn't use "idiosyncratic" as an adjective to
> describe the *majority* of them, but "rare" and "ancient"
> (perhaps "historical"[2] would be a better word choice?)
> are correct.
Sorry, I probably misused the term. I was also assuming that HKSCS had been
unified with Extension B.
> [2] e.g., the "recently deceased", such as Vietnamese chu+~ no^m
> characters in Plane 2, or even Deseret in Plane 1.
Well, I guess that Chữ Nôm and Deseret are hardly known outside this mailing
list.
Clearly, it is worth implementing specialized notations and historical
scripts in widely used software such as Internet browsers, e-mail clients,
word processors, etc.
But the discussion was about porting existing applications to Unicode so
that they can be localized and used in new markets.
Consider a concrete case. I write software for the retail industry.
My managers could come and ask me to localize our solution for a retailer
based in South China that wants its receipts and GUI messages to be in
Cantonese.
In *this* case I can push Unicode and fully justify the burden of UTF-16
surrogate support and, especially, the burden of checking that every
programmer on the team handles strings properly (e.g., that nobody truncates
a string blindly, leaving a lone high surrogate at the end of it).
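To make the truncation pitfall concrete, here is a minimal sketch, assuming
text is held as a list of UTF-16 code units (as it would be in a UTF-16
application; Python's own `str` counts code points, so the list
representation and the helper names `utf16_units`, `from_units` and
`truncate_utf16` are hypothetical). Cutting "A" + U+21075 (gan) to two
code units naively keeps only the high surrogate; the safe version backs off
by one unit instead.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s (big-endian order, no BOM)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def from_units(units):
    """Decode a list of UTF-16 code units back into a Python string."""
    return b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be")


def truncate_utf16(units, max_units):
    """Keep at most max_units code units without splitting a surrogate pair."""
    if max_units >= len(units):
        return list(units)
    end = max(max_units, 0)
    # If the last unit we would keep is a high surrogate, its low half
    # would be cut off, so drop the high surrogate as well.
    if end > 0 and 0xD800 <= units[end - 1] <= 0xDBFF:
        end -= 1
    return units[:end]


units = utf16_units("A\U00021075")    # [0x41, 0xD844, 0xDC75]
naive = units[:2]                      # ends in a lone high surrogate 0xD844
safe = truncate_utf16(units, 2)        # [0x41]
print(from_units(safe))                # A
```

The same check applies to any fixed-width field, such as a receipt line
limited to a number of 16-bit units: the limit may leave the field one unit
short, but it never produces ill-formed UTF-16.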
But you can imagine how persuasive the case for UTF-16 would be if it rested
on printing musical staves on receipts (or algebraic formulae, or an
abandoned English orthography, or the script used in Vietnam centuries
ago)...
_ Marco
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:20 EDT