Ken:
>The ones everyone knows about are Vietnamese, Greek,
>and Hebrew.
Not necessarily everyone knows... ;-)
>I would expect that any language that made extensive
>use of more than a single accent on top of a letter
>might have some history of horizontal accommodations
>for the accents in its typography, however. It is
>just the natural thing for typographers to do when
>trying to create typefaces that work while having to
>deal with multiple accents.
These aren't the only cases to consider. I'm thinking of cases
of new orthographies for previously unwritten languages. Such
languages obviously have no such traditions, yet it's possible
that they may horizontally position diacritics.
>You really only would need to start language tagging
>if you are faced with having to deal with aggressively
>multilingual text, for which mixed conventions
>regarding accent stacking were significant and
>required to be rendered correctly. Frankly I think that
>is a small percentage case inside a small percentage
>case.
I don't know for sure what "aggressively multilingual" means,
but it only takes two languages for such problems to arise. And
it is not necessarily the case that this is unlikely to occur.
>Unicode is not intended as a generic text layout macro
>language...
You know I know that.
>*Some* aspects of text layout need to be left to text
>markup and text description languages. :-) And it isn't
>clear that trying to include a plain text character
>mechanism for describing exactly how accents are placed
>over a letter makes sense to include in the character
>encoding per se.
But Unicode does provide some level of support for this where
it pertains to the meaning of text. That's why we have
canonical ordering classes. So I'm just trying to determine how
far we go and what can be done in situations that involve novel
use of existing scripts.
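
To make the point about canonical ordering classes concrete, here is a rough sketch in Python using the standard unicodedata module. The choice of marks, the MARKS table and the helper name are mine, purely for illustration. It prints the combining class of a few Thai marks and shows one case where canonical ordering rearranges a sequence. Note that the classes only say which orderings are equivalent; they say nothing about stacked versus side-by-side placement.

```python
import unicodedata

# A few Thai marks relevant to this thread (illustrative selection).
MARKS = {
    "sara i (vowel above)": "\u0e34",
    "sara u (vowel below)": "\u0e38",
    "mai tai khuu": "\u0e47",
    "mai eek": "\u0e48",
    "mai thoo": "\u0e49",
    "thanthakhat": "\u0e4c",
}

def combining_classes(marks):
    """Map each mark name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in marks.items()}

classes = combining_classes(MARKS)
for name, ccc in classes.items():
    print(f"{name}: ccc={ccc}")

# Canonical reordering: a tone mark typed before a below-base vowel
# is moved after it, because the vowel has the lower non-zero class.
KO_KAI = "\u0e01"
typed = KO_KAI + "\u0e48" + "\u0e38"          # base + mai eek + sara u
reordered = unicodedata.normalize("NFD", typed)
print([f"U+{ord(c):04X}" for c in reordered])
```

The two tone marks share a class, so a sequence of mai eek + mai thoo keeps whatever order it was entered in, and marks with class 0 are never reordered at all.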
Let me give an example case (which I think is real):
Thai diacritics when used for Standard Thai have strict
co-occurrence restrictions, and of those that can co-occur above
a base character - vowel + tone or vowel + thanthakhat - it is
always the case that the tone or thanthakhat stacks above the
vowel. One particular co-occurrence restriction is that mai tai
khuu never co-occurs with any other diacritic.
There are a number of Mon-Khmer minority languages spoken in
Thailand. Typically, these languages have a number of
phonological distinctions that are not found in Thai,
particularly related to vowel articulation. As a result, when
writing one of these languages using Thai script, a
significantly larger number of spellings for vowels are needed.
For most such languages, it is likely that orthographic
innovations will include (but not necessarily be limited to)
the use of combinations of superior diacritics that do not
co-occur when writing Standard Thai. Such combinations could
include (for example) mai eek and mai thoo, or mai eek and mai
tai khuu, positioned side-by-side above the base character. (I
don't recall right now exactly what combinations I've been told
about, but I'm pretty sure there were some that involved
side-by-side positioning.)
The likelihood of documents containing text from such a
language as well as text in Thai is high.
So, in a case like this, is language-specific rendering (and
language tagging as needed - which would be always for data
that will be exchanged) deemed to be the appropriate solution,
or might we want to consider some mechanism in Unicode (e.g.
base + diacr + GJ/ZWJ + diacr)?
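
To show what the second option might look like as data, here is a minimal sketch in Python. The convention that ZWJ between two marks requests side-by-side placement is hypothetical (it is only the idea floated above, not anything Unicode defines), and the variable and function names are my own. The sketch only checks that such a sequence stays distinct from the plain one and passes through the standard normalization forms unchanged.

```python
import unicodedata

KO_KAI = "\u0e01"    # base consonant
MAI_EEK = "\u0e48"
MAI_THOO = "\u0e49"
ZWJ = "\u200d"       # ZERO WIDTH JOINER

# Plain sequence: a renderer would presumably stack the marks.
stacked = KO_KAI + MAI_EEK + MAI_THOO

# Hypothetical convention: ZWJ between the marks asks for
# side-by-side placement. No standard assigns this meaning.
side_by_side = KO_KAI + MAI_EEK + ZWJ + MAI_THOO

def survives_normalization(text):
    """True if none of the four normalization forms alters the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(f, text) == text for f in forms)

print(survives_normalization(stacked))
print(survives_normalization(side_by_side))
print(stacked != side_by_side)
```

So the distinction would at least be preserved in interchange without any language tag; whether a rendering engine would honour it is a separate question.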
Note: the only cases I currently know of where this is
potentially an issue involve Thai script.
Peter