Re: Devanagari question

From: Marco Cimarosti (marco.cimarosti@europe.com)
Date: Tue Nov 14 2000 - 09:47:23 EST

Next message: D.V. Henkel-Wallace: "OT: Devanagari question"
Previous message: Brendan Murray/DUB/Lotus: "Re: A very basic question about Big5/x-Jis/ Unicode...."
Maybe in reply to: James E. Agenbroad: "Devanagari question"
Next in thread: Michael \(michka\) Kaplan: "Re: Devanagari question"

Antoine Leca wrote:
> Marco Cimarosti wrote:
> >
> > I think that the original idea behind having combining marks in
> > Unicode was that *any* combination of base + diacritic should be
> > permitted,
>
> The fact that it is permitted (as I said, they "are not prohibited")
> does not per se give them any sense...
> This was my point, but I was not clear enough.

Your point was clear, and your statement is certainly true: there are
millions of possible combinations of base character + accent and, clearly,
most of them are meaningless.

But my point was: not even Mr. Ethnologue himself knows exactly *which*
combinations are meaningful in every orthographic system. And, clearly, no
one can figure out which combinations may become meaningful in the *future*
-- e.g. when a previously unwritten language gets its orthography, or when
the spelling of an already written language gets changed.

So, it makes sense -- and is probably economically worthwhile -- to have a
generalized mechanism to render virtually any combination that could arise.
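
A small sketch of why a fixed inventory cannot keep up: Unicode itself
provides a precomposed code point for only some base + mark pairs. The
check below uses Python's standard `unicodedata` module; the helper name
`has_precomposed` is made up for this example, and "NFC yields a single
code point" is used only as a rough proxy for "a precomposed character
exists".

```python
import unicodedata

def has_precomposed(base: str, mark: str) -> bool:
    """True if NFC fuses base + combining mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

print(has_precomposed("w", CIRCUMFLEX))  # w + circumflex, as used in Welsh
print(has_precomposed("q", CIRCUMFLEX))  # arbitrary pair, stays two code points
```

Any pair for which this returns False can only be displayed by an engine
that knows how to attach a mark to a base at rendering time.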

> > and be handled decently by rendering engines.
>
> The question here is the meaning of "decently".

Sorry for my sloppy expression. A better term would have been "readably".
I.e., good enough to be accepted and understood by a human reader.

> I beg your pardon, but as the programmer of a rendering engine, I cannot
> agree that I should spend hours and days, and furthermore adding megabytes
> of code, to render "decently" combinations like digits + accents (by
> decently, I mean I should check if the glyph for the digit have ascender
> above x-height, or being of narrower width, and then adjust the position
> of the diacritic accordingly; similarly, adjusting the descender position
> of the Nagari virama according to the descender depth of a preceding
> "g" or "j" or "y".)

If you put it this way, I agree that it definitely doesn't make sense to
spend a single minute of your life, or a single byte in your computer, to
support crazy combinations like <Latin y + Devanagari virama>.

But try and restate the same thing with different words, and it might sound
quite different.

Call it "providing a general solution to a common problem", and you see
that, by implementing such a solution *once and for all*, you could end up
spending *less* time and resources than is required to develop (and fix, and
extend, and redesign, and explain...) an ad-hoc solution for every (class
of) combination(s).

Moreover, Mark Davis has already commented that the complexity of the task
and the memory requirements are somewhat exaggerated here.

> At the contrary, I believe that when a combination is not expected, the
> renderer should have a very basic and straightforward behaviour, and just
> "print" the default glyphs in order, with overstriking when the second
> glyph is a combining mark.

Overstriking is not that bad, as a first approximation!

The next step is positioning the diacritic sign more or less outward, in
order not to collide with the base sign. When you have this, you have a
generalized solution -- any further enhancement is more esthetic than
substantial.
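
To make the two steps concrete, here is a minimal sketch in Python. The
`Box` type, the `gap` default and all glyph metrics are hypothetical,
invented for illustration; no real rendering engine or font format is
implied.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Box:
    """Ink bounding box of a glyph in font units (hypothetical metrics)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def overstrike(base: Box, mark: Box) -> Tuple[float, float]:
    """First approximation: draw the mark where the font put it, unshifted."""
    return (0.0, 0.0)

def place_above(base: Box, mark: Box, gap: float = 50.0) -> Tuple[float, float]:
    """Centre the mark over the base and lift it just enough to clear it."""
    # Horizontal shift that aligns the centres of the two boxes.
    dx = (base.x_min + base.x_max) / 2 - (mark.x_min + mark.x_max) / 2
    # Vertical shift: move up only if the mark would come closer than `gap`.
    dy = max(0.0, base.y_max + gap - mark.y_min)
    return (dx, dy)

accent = Box(100, 550, 300, 700)     # drawn to sit above x-height letters
small_letter = Box(50, 0, 450, 500)  # x-height glyph
digit = Box(50, 0, 550, 700)         # taller and wider glyph

print(place_above(small_letter, accent))  # centred, no vertical shift needed
print(place_above(digit, accent))         # centred and pushed outward
```

The same function serves a small letter, a capital or a digit, which is
the point: nothing in it depends on the combination having been foreseen.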

Mark reminded us that these two techniques are explicitly described in the
Unicode standard as possible implementation strategies.

Overstriking is clearly a poor solution, but viable in many cases.
Contextual positioning of accents is more complex, but it certainly doesn't
require years of development or megabytes of RAM!

> Doing something more complex, in addition to be IMHO a complete lost of
> time for both the programmer and the users (to load unusued code), is also
> likely to give some users the idea that using some weird combinations are
> handled this ("clever") way everywhere, thus leading to chaos when the
> datas will be brought elsewhere.

This is where the misconception sits, IMHO. If you spend time to come up
with a *general* solution, it is because you will generally use it! However
complex it might have been to develop, it was worth it, because you use it
all the time.

But taking on the burden of developing a general solution and only using it
for *weird* cases would indeed be a waste of time -- and an illogical
behavior too (rather like using different forks to eat meat and fish:-).

What I mean is that, once this is in place, it should be used also (and
primarily) for common combinations like: à, á, â, ã, ä, å, ç, è, é, ê, ë, ñ,
ò, ó, ô, õ, etc.
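
As a quick check that these common letters fit the same base + mark model:
under Unicode canonical decomposition (NFD), each of them is a base letter
followed by one combining mark. The sketch uses Python's standard
`unicodedata` module; the helper name `decompose` is made up for this
example.

```python
import unicodedata
from typing import List

def decompose(ch: str) -> List[str]:
    """Return the Unicode names of the code points in the NFD form of `ch`."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

for ch in "\u00e0\u00e7\u00f1\u00e5":  # a-grave, c-cedilla, n-tilde, a-ring
    print(ch, decompose(ch))
```

So an engine that can attach an arbitrary combining mark to an arbitrary
base already has what it needs to draw these letters without a dedicated
glyph for each.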

So, the time that is spent in designing the rendering engine will generously
be repaid by the time saved in designing fonts.

Only in a few exceptional cases may fonts need manually tuned "accented
glyphs" for special combinations: ì, í, î, ï, etc. But this is not unique to
accents: even perfectly "spacing" letters have ligatures for special cases:
fi, fl, ff, ffi, ffl, etc.

> > If font designers and d. engines implementers insist in the idea that
> > an "accented letter" may be rendered only if an ad-hoc glyph has been
> > anticipated in the font, many minority languages will never have a
> > chance of being supported at a reasonable cost.
>
> I never say (nor I hope I implied) such an idea.

I didn't mean that you did. (I didn't know you had implemented a rendering
engine, and I don't know how it works, so I was not referring to you.)

I was talking about a certain bias towards precomposed characters and glyphs
that *does* exist in the industry.

> Now, insisting that any renderer should align properly any diacritic on
> the top (or bottom) middle of the I, M and W glyph, will have for net
> result that nobody will never be able to create any renderer...

As long as the basic requirement of readability is met, I see no problem if
different solutions have different levels of sophistication.

If an application displays accents on "w" slightly too far to the right, I
would call it a perfectly readable implementation (although, if I were a
Welshman, I'd probably consider it very ugly).

> > Less common combinations, used in less known languages, may get along
> > with a less-than-perfect rendering -- but *no* rendering at all is not
> > acceptable,
>
> Where anyone stated such an idea?

You mean the idea that a total lack of rendering is unacceptable? Or that a
default ("less-than-perfect") rendering can be acceptable for very uncommon
combinations? Both ideas are mine, although I think they are common sense.

_ Marco

______________________________________________
La mia e-mail è ora: My e-mail is now:
>>> marco.cimarostiªeurope.com <<<
(Cambiare "ª" in "@") (Change "ª" to "@")



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT