From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Sat Apr 01 2006 - 09:57:35 CST
James Kass wrote:
> > This has nothing to do with font switching at all. Not even remotely.
> > Font switching must NEVER change apparent spelling.
>
> Reproduce Table 9-11 on page 248 of TUS4.0 in plain text. The table
> illustrates Malayalam Orthographic Reform.
Note the table heading, which says *ORTHOGRAPHIC* (spelling) reform.
What is not said is how the difference in orthography is encoded in
the character stream. Since it is an orthographic reform, there must
be some difference in the character stream. One plausible way is to
use ZWJ/ZWNJ to mark the spelling difference. (Ideally, IMO there
should have been OLD U/NEW U, OLD UU/NEW UU characters, rather
than overloading U and UU with both old and new orthography.)
This has NOTHING to do with font selection. Not at all! (Besides: that
figure does not include AU.)
When a new orthography was announced for German a few years ago,
did you go and make two Latin fonts then, one for the old and one for
the new orthography? I guess (and hope) not... When Finnish
started to use š and ž instead of sh and zh, did you go and make a
font that displays sh as š and zh as ž? I guess and hope not.
> > > What has been clear all along is that
> > > U+0D57 should never be included in running text,
> >
> > I don't know where that idea comes from. ...
>
> It comes from TUS4.0 page 249:
> "U+0D57 MALAYALAM AU LENGTH MARK is provided as an encoding for
> the right side of the two-part vowel U+0D4C MALAYALAM
> VOWEL SIGN AU."
>
> So, if I wanted to encode the right side of this two part vowel, as in
Including the modern spelling for the AU vowel (of course).
How did you (and some others) manage to miss the rather clear
statements (in several places) that 0D4C is a **TWO-PART** vowel??
There is no such thing as a sometimes two-part, sometimes one-part
vowel mark (nor should there be).
> a plain text stand-alone representation of it, I'd use U+0D57. But there's
> only one MALAYALAM VOWEL SIGN AU *character* in the standard.
That one is for the traditional spelling of the AU vowel. The modern
spelling uses just U+0D57 MALAYALAM AU LENGTH MARK. (The name
of the character does not matter for this.)
So there are two "au vowel sign" characters for Malayalam, one which is
called MALAYALAM VOWEL SIGN AU and the other happens to be called
MALAYALAM AU LENGTH MARK. Granted, the second name does not
capture that character's modern use, just its traditional use.
> However, that same section points to a detailed discussion of these
> two part vowels in the Tamil section. (on page 239) This states that
> for Tamil, the single code point is the preferred form and is the form
> in common use. But, it also says that the single code point
> is equivalent to the string of two code points.
Yes, no problem there. Precomposed versions tend to be preferred when
available (NFC and such).
> There is nothing, far as I can tell, suggesting that the single code point
> is equivalent to the other single code point. In other words, U+0BCA is
> equivalent to U+0BC6 plus U+0BBE.
Yes.
> It does not necessarily follow that U+0BCA is equivalent to U+0BBE.
Recte: "not necessarily" -> "not". Indeed, they are not equivalent in any
sense of the word.
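The Tamil case discussed above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `canonically_equivalent` is made up for illustration and is not part of any library. It compares strings after canonical decomposition (NFD), which is one common way to test canonical equivalence.

```python
import unicodedata

TAMIL_SIGN_O = "\u0bca"   # TAMIL VOWEL SIGN O (two-part vowel sign)
TAMIL_SIGN_E = "\u0bc6"   # TAMIL VOWEL SIGN E (left part)
TAMIL_SIGN_AA = "\u0bbe"  # TAMIL VOWEL SIGN AA (right part)


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    if their canonical decompositions (NFD) are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The precomposed sign versus its two parts in sequence.
print(canonically_equivalent(TAMIL_SIGN_O, TAMIL_SIGN_E + TAMIL_SIGN_AA))

# The precomposed sign versus its right part alone.
print(canonically_equivalent(TAMIL_SIGN_O, TAMIL_SIGN_AA))
```

The first comparison holds and the second does not: the precomposed sign matches the two-part sequence, never the right part on its own.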
> You seem to be suggesting this equivalence (for Malayalam),
I do not, and they are not. It is the old spelling vs. the modern spelling.
They are not equivalent. They apparently denote the same phoneme,
but that is not the same as equivalent.
> and if such is the case, it should be plainly stated in the standard.
They are not, and it should not (since they are not).
> Googling KA + AU(vs) gives two pages of hits, mostly purporting
> to be Unicode Malayalam text. Searching KA + AU(lm) gives only
> eight hits, none of which are Malayalam text.
The dire effects of one buggy (ill-made) system.
> Quoting from:
>
> http://varamozhi.blogspot.com/2004/09/unresolved-issues-in-anjali-unicode.html
>> ? should not have the ? symbol in the left (eg: ??). 'AU length-marker' is just
>> for creating that symbol alone in all kinds of fonts. Or think it this way - if there
>> a AnjaliNewLipi how would you avoid ? symbol in the left.
Well, that's misleading. And probably a result of being misled.
>And the response was,
>> Its the responsibility of the unisribe to put the AU marker. font is not doing
>> anything to put symbols on both sides, itd automatically done by uniscribe.
>> let me see if i can check that behaviour of uniscribe.
I'm not sure what this tries to say.
Anyhow, using 0D4C is to have exactly the same effect as using <0D46, 0D57>.
Letting 0D4C display as just 0D57 is effectively to say that 0D46 is an invisible
character. And it is not, it's a *visible* ("graphic") character. Letting it sometimes
be visible sometimes not is not tenable.
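To make the point concrete for Malayalam, here is a small sketch, again assuming Python's standard `unicodedata` module. KA (U+0D15) is chosen only as an example base consonant, and the helper names (`code_points`, `same_under_some_form`) are illustrative, not from any standard API. It shows that 0D4C decomposes to <0D46, 0D57>, and that no normalization form folds the old spelling into the modern one.

```python
import unicodedata

KA = "\u0d15"         # MALAYALAM LETTER KA (example base consonant)
SIGN_AU = "\u0d4c"    # MALAYALAM VOWEL SIGN AU (two-part)
SIGN_E = "\u0d46"     # MALAYALAM VOWEL SIGN E (left part)
AU_LENGTH = "\u0d57"  # MALAYALAM AU LENGTH MARK (right part)

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def same_under_some_form(a: str, b: str) -> bool:
    """True if any of the four normalization forms maps a and b to the same string."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


old_spelling = KA + SIGN_AU       # traditional orthography
modern_spelling = KA + AU_LENGTH  # modern orthography

# 0D4C is canonically the sequence <0D46, 0D57>.
print(code_points(unicodedata.normalize("NFD", old_spelling)))  # U+0D15 U+0D46 U+0D57

# The two-part sequence and the single code point are the same text ...
print(same_under_some_form(KA + SIGN_E + AU_LENGTH, old_spelling))  # True

# ... but the modern spelling stays a different character sequence.
print(same_under_some_form(old_spelling, modern_spelling))  # False
```

Since normalization keeps the two spellings apart, the difference between them is carried in the character stream itself.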
>Quoting from http://varamozhi.blogspot.com/
>(section titled Unicode: Redefining AU length marker U+0D57)
>> Current meaning of the two AU signs are described below:
>>
>> 0D4C MALAYALAM VOWEL SIGN AU
>>
>> • Two part symbol of AU is not used now-a-days.
Recte: "of" -> "for", but otherwise ok.
>> • Could be represented by two part symbol in fonts supporting old orthography
This is not font dependent, and cannot be when correctly implemented. It is
ALWAYS two-part. This would be the case even if it didn't have a canonical
decomposition.
>> • Could be represented by right part alone in fonts supporting new orthography
That is completely wrong, for a number of reasons; but I'm getting a bit tired
of having to repeat them again and again.
>>
>> 0D57 MALAYALAM VOWEL SIGN AU - RIGHT PART ALONE
>>
>> • Should not be used as MALAYALAM VOWEL SIGN AU.
The capital letters there are misleading. 0D57 *is* the modern spelling for AU
in Malayalam.
>> • Represents the right half of 0D4C irrespective of the orthography supported
>> by the font.
That sentence does not make sense.
>> • Only required when the right part alone need to be specifically mentioned. eg: in
>> a grammar book.
Or when using the modern spelling for AU in Malayalam.
>> • Common day-to-day texts need not use this symbol at all.
Of course they should use it.
>> This assignment of meaning to these symbol causes lots of confusion.
And the text you quoted adds to that confusion.
>> Also, it can potentially violate Uniqueness Rule when people interchangably
"Uniqueness Rule"???
>> use 0D4C and 0D57 to denote AU symbol in new orthography.
0D4C is old orthography for AU, 0D57 ("alone") is modern orthography for AU.
> The user community, far as I can tell, shuns the notion that U+0D4C and
> U+0D57 are equivalent.
They are NOT equivalent. They are DIFFERENT spellings of AU in Malayalam.
/kent k