From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Tue Mar 28 2006 - 15:59:33 CST
Antoine Leca wrote:
> > Yes, and they already are. U+0308 COMBINING DIAERESIS vs. U+030B
> > COMBINING DOUBLE ACUTE. There is no "umlaut" character...
>
> I did use Umlaut to clearly (at least I thought) denote the characteristic
> German *feature*, NOT the codepoints.
For typeset modern German text DIAERESIS is consistently used (though
most often via precomposed letters).
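A minimal sketch of the "precomposed letters" point, assuming Python's standard unicodedata module (the helper name describe is only illustrative): decomposing a precomposed u-with-diaeresis yields U+0308, while the Hungarian o-with-double-acute yields the separate U+030B.

```python
import unicodedata

DIAERESIS = "\u0308"     # COMBINING DIAERESIS
DOUBLE_ACUTE = "\u030B"  # COMBINING DOUBLE ACUTE ACCENT


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Precomposed LATIN SMALL LETTER U WITH DIAERESIS, as typically typed in German.
precomposed = "\u00FC"
decomposed = unicodedata.normalize("NFD", precomposed)
print(describe(decomposed))

# Precomposed LATIN SMALL LETTER O WITH DOUBLE ACUTE, as used in Hungarian.
hungarian = unicodedata.normalize("NFD", "\u0151")
print(describe(hungarian))

# Recomposition gives back the precomposed letter, so both spellings of the
# German letter are canonically equivalent.
recomposed = unicodedata.normalize("NFC", "u" + DIAERESIS)
print(recomposed == precomposed)
```

Neither decomposition produces a character with "umlaut" in its name; the distinction that exists at the code point level is diaeresis versus double acute.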
> > And m² is not at all the same as m2.
>
> I guess no, although I am not completely sure (particularly since I expect
> the second to read "m<SUP>2</SUP>" instead,
No. While that is a good approach in the general case (for arbitrary
powers in *math* expressions), I think it is a bad idea for the SI unit powers.
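A minimal sketch of the m² versus m2 distinction, assuming Python's standard unicodedata module and that "m²" is written with U+00B2 SUPERSCRIPT TWO: the two strings differ, canonical normalization keeps them distinct, and only compatibility normalization folds the superscript away.

```python
import unicodedata

squared = "m\u00B2"  # "m" followed by SUPERSCRIPT TWO
plain = "m2"         # "m" followed by DIGIT TWO

# The two strings are different sequences of code points.
print(squared == plain)

# Canonical normalization (NFC) preserves the superscript digit.
nfc = unicodedata.normalize("NFC", squared)
print(nfc == squared)

# Compatibility normalization (NFKC) replaces it with a plain digit,
# which loses the "power" reading.
nfkc = unicodedata.normalize("NFKC", squared)
print(nfkc == plain)

# The compatibility mapping is tagged <super> in the character database.
mapping = unicodedata.decomposition("\u00B2")
print(mapping)
```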
> >> So, if the original encoder does NOT make a distinction in
> >> meaning between the two forms, why would Unicode require
> >> him to encode this difference at codepoint level?
> >
> > How do you know if the "original encoder" makes the difference or
> > not?
>
> Because *I* am the original encoder, in this stanza. :-)
So you only read your own texts. Interesting... ;-)
> Because my feeling (in fact, my interpretation of the Unicode and ISCII
> description) is that the Indic codepoints are abstract characters, not those
> elements which combine in defined ways to produce some glyphic intermediate
> elements, which only remains to be actually drawn by the font, as it seems
> you are thinking.
I do not see why characters in Indic scripts should be more "abstract"
than for other scripts.
> I base that view, first on the fact that the virama concept forces a need
> for some abstraction layer (reordering, combination, so-called backstore,
> etc.) which is absent even from Thai, and even more from Western scripts;
> and secondly because of the underlying nature of the Brahmi-derived scripts,
> with the sounds associated, the sandhi phenomena, etc.
The "sounds associated" are completely and totally irrelevant.
Unicode encodes scripts, not sounds.
> when the author
> is supposed to add some precision; this is much like the character styles
> used in Western typography (rendered as HTML spanning styles, for example).
That does not apply to different spellings. I would not expect any
kind of style span (HTML or otherwise) to say "display 'š' as 'sh'".
Nor do I expect any acceptable font to have an "sh" glyph for "š".
> > I have a really hard time understanding why apparent spelling changes
> > should be mediated by font changes for Indic scripts. It is not
> > done that way for any other scripts
>
> Huh?
See reply above.
> If I want a rounded 'a' in Latin, I am required to select a font with
> such a design. Similar for a z or a J with descender, or a low-striked q. I
> do not expect to be forced to use the "alternative" codepoints, that have
> been added for special purposes, like U+0251 or U+0292, for an illustrative
> use where I do NOT want to add specific meaning.
>
> The difference here is that you are saying changing a
Will you please stop putting words in my mouth!
> z-shaped 'a' to a rounded one (etc.) is *not* a spelling change, while writing
> the i matra in one or other place *is*. My wild guess is that some Indians
> may see it exactly reversed...
Some characters do have overlapping glyph shapes. However:
*You* are saying that there are two "camps" (your word) for at least one
of the Indic scripts as to how to display some letters. That sounds very
much like a difference worthy of more than a font change. Likewise for
the changes in Indic writing that are referred to as "old orthography" vs.
"new orthography"; they are even CALLED spell changes, why not treat
them as such then?
> It is certainly such a difference (not purely aesthetic, I mean). See
> attached image.
I think that difference may be worthy of at least a ZWJ/ZWNJ...
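A minimal sketch of what recording such a difference with ZWJ/ZWNJ would mean at the text level, assuming Python's standard unicodedata module. The Devanagari sequence KA + VIRAMA + SSA is only an illustrative choice (it is not taken from the attached image), and how each variant is drawn depends on the font and rendering engine.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Illustrative Devanagari sequence (an assumption made for this sketch).
KA, VIRAMA, SSA = "\u0915", "\u094D", "\u0937"

variants = {
    "plain": KA + VIRAMA + SSA,
    "with ZWNJ": KA + VIRAMA + ZWNJ + SSA,
    "with ZWJ": KA + VIRAMA + ZWJ + SSA,
}

# Each variant is a distinct code point sequence, so the difference is
# stored in the text itself and does not depend on which font is selected.
for label, text in variants.items():
    print(label, [f"U+{ord(ch):04X}" for ch in text])

# The joiners are format characters (general category Cf) and are kept by
# canonical normalization, so the distinction survives NFC/NFD processing.
survives = all(
    unicodedata.normalize(form, text) == text
    for form in ("NFC", "NFD")
    for text in variants.values()
)
print(survives)
```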
> Should emphasis be recorded as different Unicode codepoints?
> My reading was it should not...
No, and I did not say that.
...
> The best I can find is the acknowledgement (in the Indic OpenType
> specifications) that there is a need to distinguish two genuinely different
> "styles" in Uniscribe and related, one named "old style" encoded MAL as
> "language system", the other "reformed" encoded MLR.
That does not seem (to me) to be anywhere near the ideal way of
dealing with this.
/kent k