RE: Canonical ordering

From: Jonathan Rosenne
Date: Tue May 02 2000 - 17:28:54 EDT

I would like to take this opportunity to state that the Unicode rendering and placement
rules are not suitable for Hebrew.

In particular:

The order of the diacritical marks should not affect their rendering.

If you don't know how to render them, or it gets too complicated, or you just don't feel
like it, just ignore them. Do not display a placeholder glyph for an unrenderable character.
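For marks that Unicode puts in different combining classes, normalization already goes part of the way toward making order irrelevant: canonical reordering sorts them by class, so two typing orders become one string. The sketch below is an illustration only. It uses Python's standard unicodedata module (not something mentioned in this thread), takes shin + qamats + dagesh as an assumed example, and relies on the class values in the Unicode Character Database bundled with Python.

```python
import unicodedata

SHIN = "\u05E9"    # HEBREW LETTER SHIN (base letter)
QAMATS = "\u05B8"  # HEBREW POINT QAMATS, combining class 18
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ, combining class 21

typed_a = SHIN + QAMATS + DAGESH  # vowel point typed before dagesh
typed_b = SHIN + DAGESH + QAMATS  # dagesh typed before vowel point

# Show the combining class Unicode assigns to each point.
for mark in (QAMATS, DAGESH):
    print(unicodedata.name(mark), unicodedata.combining(mark))

# NFD applies canonical reordering: marks between two class-0 characters
# are stably sorted by class, so qamats (18) ends up before dagesh (21)
# in both spellings.
nfd_a = unicodedata.normalize("NFD", typed_a)
nfd_b = unicodedata.normalize("NFD", typed_b)
print(nfd_a == nfd_b)  # True
```

A renderer that normalizes before shaping would draw both spellings the same way. This only helps for marks in distinct classes; marks that share a class keep their typed order.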


> -----Original Message-----
> From: Peter Constable
> Sent: Tuesday, May 02, 2000 6:43 PM
> To: Unicode List
> Cc:
> Subject: Re:Canonical ordering
> Ken:
> >The ones everyone knows about are Vietnamese, Greek,
> >and Hebrew.
> Not necessarily everyone knows... ;-)
> >I would expect that any language that made extensive
> >use of more than a single accent on top of a letter
> >might have some history of horizontal accommodations
> >for the accents in its typography, however. It is
> >just the natural thing for typographers to do when
> >trying to create typefaces that work while having to
> >deal with multiple accents.
> These aren't the only cases to consider. I'm thinking of
> new orthographies for previously unwritten languages. Such
> languages obviously have no such traditions, yet they may
> still position diacritics horizontally.
> >You really only would need to start language tagging
> >if you are faced with having to deal with aggressively
> >multilingual text, for which mixed conventions
> >regarding accent stacking were significant and
> >required to be rendered correctly. Frankly I think that
> >is a small percentage case inside a small percentage
> >case.
> I don't know for sure what "aggressively multilingual" means,
> but it only takes two languages for such problems to arise. Nor
> is it necessarily the case that such text is unlikely to occur.
> >Unicode is not intended as a generic text layout macro
> >language...
> You know I know that.
> >*Some* aspects of text layout need to be left to text
> >markup and text description languages. :-) And it isn't
> >clear that trying to include a plain text character
> >mechanism for describing exactly how accents are placed
> >over a letter makes sense to include in the character
> >encoding per se.
> But Unicode does provide some level of support for this where
> it pertains to the meaning of text. That's why we have
> canonical ordering classes. So I'm just trying to determine how
> far we go and what can be done in situations that involve novel
> use of existing scripts.
> Let me give an example case (which I think is real):
> Thai diacritics when used for Standard Thai have strict
> co-occurrence restrictions, and of those that can co-occur above
> a base character (vowel + tone, or vowel + thanthakhat), it is
> always the case that the tone or thanthakhat stacks above the
> vowel. One particular co-occurrence restriction is that mai tai
> khu never co-occurs with any other diacritic.
> There are a number of Mon-Khmer minority languages spoken in
> Thailand. Typically, these languages have a number of
> phonological distinctions that are not found in Thai,
> particularly related to vowel articulation. As a result, when
> writing one of these languages using Thai script, a
> significantly larger number of spellings for vowels are needed.
> For most such languages, it is likely that orthographic
> innovations will include (but not necessarily be limited to)
> the use of combinations of superior diacritics that do not
> co-occur when writing Standard Thai. Such combinations could
> include (for example) mai eek and mai thoo or mai eek and mai
> tai khuu positioned side-by-side above the base character. (I
> don't recall right now exactly what combinations I've been told
> about, but I'm pretty sure there were some that involved
> side-by-side positioning.)
> The likelihood of documents containing text from such a
> language as well as text in Thai is high.
> So, in a case like this, is language-specific rendering (with
> language tagging as needed, which would mean always for data
> that will be exchanged) deemed to be the appropriate solution,
> or might we want to consider some mechanism in Unicode (e.g.
> base + diacr + GJ/ZWJ + diacr)?
> Note: the only cases I currently know of where this is
> potentially an issue involve Thai script.
> Peter
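Peter points to canonical ordering classes as the place where Unicode already treats the order of marks as meaningful. A minimal sketch of the Canonical Ordering Algorithm (stably sort each run of nonzero-class marks by class, never moving a class-0 character) shows how that mechanism treats the Thai combinations he mentions. Assumptions: class values come from Python's unicodedata module; canonical_reorder is a hypothetical helper name; ko kai is an arbitrary base letter; the mark pairs are Peter's examples, not data from any actual orthography. Unicode names his mai eek, mai thoo and mai tai khu as MAI EK, MAI THO and MAITAIKHU.

```python
import unicodedata


def canonical_reorder(s: str) -> str:
    """Hypothetical helper: stably sort each run of marks with a nonzero
    combining class by that class. A class-0 character (a base letter, or
    a mark such as MAITAIKHU) ends the run and is never moved."""
    out = []
    run = []  # pending marks with nonzero combining class
    for ch in s:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))  # stable sort
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


KO_KAI = "\u0E01"     # THAI CHARACTER KO KAI, arbitrary base (assumption)
SARA_U = "\u0E38"     # THAI CHARACTER SARA U, class 103 (below base)
MAI_EK = "\u0E48"     # THAI CHARACTER MAI EK, class 107
MAI_THO = "\u0E49"    # THAI CHARACTER MAI THO, class 107
MAITAIKHU = "\u0E47"  # THAI CHARACTER MAITAIKHU, class 0

samples = {
    "tone + below vowel": (KO_KAI + MAI_EK + SARA_U, KO_KAI + SARA_U + MAI_EK),
    "mai ek + mai tho": (KO_KAI + MAI_EK + MAI_THO, KO_KAI + MAI_THO + MAI_EK),
    "mai ek + maitaikhu": (KO_KAI + MAI_EK + MAITAIKHU, KO_KAI + MAITAIKHU + MAI_EK),
}

for label, (a, b) in samples.items():
    ra, rb = canonical_reorder(a), canonical_reorder(b)
    # None of these characters decompose, so NFD reduces to reordering here.
    assert ra == unicodedata.normalize("NFD", a)
    print(f"{label}: equivalent={ra == rb}")
# Prints:
# tone + below vowel: equivalent=True
# mai ek + mai tho: equivalent=False
# mai ek + maitaikhu: equivalent=False
```

Under these class values, a tone mark and a below-base vowel may be typed in either order, mai ek and mai tho keep whatever order they were typed in because they share class 107, and mai tai khu, being class 0, blocks reordering altogether. The encoding therefore preserves the order of the marks Peter would place side by side, but nothing in it says whether that order means stacking or horizontal placement; that is still left to the renderer.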

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT