Peter asked:
>
> That reminds me of another issue along these lines: What to do
> for languages like Chinantec and Mixtec that write using Latin
> script and indicate tones using superscript letters? (The tone
> systems of these languages are far more complex than those of
> African languages, so diacritics like acute and grave didn't
> suffice. Yes, it's *far* from great, but apparently it was the
> best option.) Can the superscript 1 - 5 characters
>
> U+00B9
> U+00B2
> U+00B3
> U+2074
> U+2075
>
> work for these?

Yes. If the orthography is using superscript digits for tone marks,
and you want to represent them in plain text (as opposed to styled
text), then the compatibility superscript digits would be appropriate
for this purpose.

> They are distinct from U+0030-0039, but the
> latter are the compatibility decompositions, and the semantics
> aren't ideal (general category: No; bidi category, EN).
> Thoughts?

The compatibility decompositions should not pose a problem -- most
processes do not fold characters together with their compatibility
decompositions, since doing so loses formatting or other information.
The No general category keeps the superscripts away from anything
that operates on digits per se, and you shouldn't be running into
much bidirectional Mixtec or Chinantec text, I would think. (How many
scholarly papers on Mixtec are written in Arabic?)
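
For anyone who wants to check these properties directly, here is a
minimal sketch using Python's standard unicodedata module. The helper
name describe() and the sample string are made up for illustration
(the sample is not a real Mixtec or Chinantec word), and the values
reported come from whatever Unicode version your Python build ships.

```python
import unicodedata

# The five superscript digits from the question.
TONE_DIGITS = "\u00b9\u00b2\u00b3\u2074\u2075"

def describe(ch):
    """Return (general category, bidi class, decomposition) for one character."""
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.decomposition(ch),
    )

for ch in TONE_DIGITS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))

# Hypothetical sample (not a real word): letters followed by tone digits.
sample = "ka\u00b3ta\u00b9"

# Canonical normalization leaves the superscripts alone ...
nfc = unicodedata.normalize("NFC", sample)
# ... while compatibility normalization folds them to ASCII digits,
# which is the information loss mentioned above.
nfkc = unicodedata.normalize("NFKC", sample)

print(nfc == sample)  # True
print(nfkc)           # ka3ta1
```

So the tone marks survive as long as the text is only ever put
through NFC or NFD; a process that applies NFKC or NFKD would have to
be kept away from this data.
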
Use of the superscript digits might fool generic word selection
operations, and that could be a problem, but likely no more so than
for other orthographies that use "non-letters" as letters. (And no
worse than the way many current word selection operations behave when
they hit things like hyphens and apostrophes.)
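
To make the word selection concern concrete, here is a small Python
sketch. It is not how any particular editor or library selects words;
runs(), TONES, and the sample string are hypothetical names and data
invented for the example. It compares a naive "letters only" rule
with one that also accepts the superscript tone digits.

```python
# The five superscript digits used as tone marks.
TONES = "\u00b9\u00b2\u00b3\u2074\u2075"

def runs(text, is_word_char):
    """Split text into maximal runs of characters accepted by is_word_char."""
    found, current = [], ""
    for ch in text:
        if is_word_char(ch):
            current += ch
        elif current:
            found.append(current)
            current = ""
    if current:
        found.append(current)
    return found

# Hypothetical sample (not real words): two "words" carrying tone digits.
sample = "ka\u00b3ta\u00b9 ndi\u00b2"

# Naive rule: only letters count, so each tone digit ends the word.
naive = runs(sample, str.isalpha)

# Tailored rule: letters or tone digits count as word characters.
tailored = runs(sample, lambda ch: ch.isalpha() or ch in TONES)

print(naive)     # ['ka', 'ta', 'ndi']
print(tailored)  # ['ka³ta¹', 'ndi²']
```

The tailored rule is the kind of per-orthography adjustment that
hyphens and apostrophes already call for in other writing systems.
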
--Ken
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT