RE: Suggestions in Unicode Indic FAQ

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Mon Feb 03 2003 - 12:29:47 EST

Next message: Deborah Goldsmith: "Re: How is glyph shaping done?"

Previous message: Kent Karlsson: "RE: Suggestions in Unicode Indic FAQ"
In reply to: Keyur Shroff: "RE: Suggestions in Unicode Indic FAQ"
Next in thread: Doug Ewell: "Re: Suggestions in Unicode Indic FAQ"
Reply: Doug Ewell: "Re: Suggestions in Unicode Indic FAQ"

> > No, with proper reordering (and "normal" display mode), the e-matra at
> > the beginning of the second word would appear to be last glyph of the
> > first "word". Similarly, for the second case, the e-matra glyph would
> > have come to the left of the pa. The fluent reader (ok, not me...)
> > would then see those errors anyway, just like I can find spelling
> > errors in Swedish, most often without any kind of special marking. (I'm
> > assuming throughout that reordrant combining characters
> > are reordered.)
>
> Illegal sequences

There are no illegal sequences.

> are not reordered as you indicated.

Then that is a problem with the display software you are using.

> Also, as far as I
> know there is no mention of reordering of illegal input sequence (or
> invalid combining mark) in Unicode standard.

Again, there are no "illegal input sequences".

> Consider the last set of glyphs (left-to-right, top-to-bottom) in the
> attached image. It is the rendering effect of illegal input sequence

See above.

> "Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka"
> [U+0915] and without any dotted circle.

Let's see if I understand you. <093F, 0915> is the input. Since
093F is a combining character, one should (not must, but should)
treat this *as if* the input was <0020, 093F, 0915>. Since 093F
is also reordrant, one must reorder it before the preceding base
character (at the least; further left for consonant clusters), so
the output glyphs would be <<glyph for 093F, space, glyph for 0915>>.
(But your image does not show that.)
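
To make the reordering concrete, here is a minimal Python sketch. It
is an illustration only, not how any particular renderer works. The
names add_implicit_base and visual_order are hypothetical, the set of
reordrant signs is assumed to contain only U+093F, and consonant
clusters are handled only in the simple consonant + virama form. For
the input <093F, 0915> it yields the order <093F, 0020, 0915>, that
is, i-matra glyph, space, ka glyph.

```python
import unicodedata

# Assumption: a toy set of reordrant (pre-base) vowel signs, U+093F only.
PRE_BASE_MATRAS = {"\u093F"}
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def add_implicit_base(text):
    """Treat a leading combining mark as if a SPACE preceded it."""
    if text and unicodedata.category(text[0]).startswith("M"):
        return "\u0020" + text
    return text


def visual_order(text):
    """Return the characters in (simplified) left-to-right glyph order."""
    out = []
    for ch in add_implicit_base(text):
        if ch in PRE_BASE_MATRAS and out:
            i = len(out) - 1  # index of the preceding base character
            # Step further left over each consonant + virama pair.
            while i >= 2 and out[i - 1] == VIRAMA:
                i -= 2
            out.insert(i, ch)
        else:
            out.append(ch)
    return out
```

Called on <0915, 093F> the same function gives <093F, 0915>, with no
space; that missing space is the only visible difference between the
two inputs.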

> As you may know, the correct input
> sequence should be U+0915 followed by U+093F.

That would be a different input (whether that is correct or
not depends on the author's intent).

> In that case the result would
> have been similar to what appears right now.

Similar ONLY if you disregard the space "glyph" that should
have been there.

> (Though some more sophisticated font/application may want to replace
> the appearing glyph for U+093F to be substituted by some other glyph
> with proper attachment point).

That may be.

> Now there is no way that user can identify this illegal input sequence
> without dotted circle.

Yes, there is. Don't disregard the space "glyph".

> In the worst case even this rendered glyph is attached to the
> character from a class (for example, consonant cluster of "Ka"
> "Virama" "Ma") for which the glyph has been designed to render with.
> In such case even a fluent reader can not identify the error.
>
> >
> > There are spelling errors, yes. But there are other ways of indicating
> > spelling errors, that are (by now) fairly conventional for any language
> > (as long as there is an appropriate dictionary installed), and that also
> > are more general (in catching more spelling errors) and less obtrusive
> > (the author really wants to write it that way, for some reason).
> >
> > > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > > he embedded the fonts into the page. As long as you are looking
> > > at the page on-line, with the embedded fonts, these errors are
> > > invisible.
> > >
> > > It may be typographically horrible. It *should* be typographically
> > > horrible in order to illustrate bad sequences clearly.
> >
> > I'd prefer little red wiggly lines under the word, or yellow background
> > or some such (just for screen display, not for printing; screen grabs
> > not counted). And that for any spelling "error".
>
> Spelling mistakes can be categorized into two different classes.

???

> One
> arising from illegal input sequence (e.g., Vowel Sign E as the first
> character in a word)

There are no illegal input sequences.

> and the other one is legal input sequence with no
> contextual meaning in the dictionary.

A simple spell checker just checks if the word is in the
dictionary or not (without worrying about the context).
That would catch what you call "illegal input sequences" too.
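
As a rough sketch of what I mean by a simple spell checker (Python,
purely illustrative): the function name misspelled_words and the
one-word dictionary are hypothetical, and words are assumed to be
separated by whitespace. No special case is needed for a word that
begins with a vowel sign; it is flagged because it is not in the
dictionary, like any other unknown word.

```python
def misspelled_words(text, dictionary):
    """Return the whitespace-separated words of text not in dictionary."""
    return [word for word in text.split() if word not in dictionary]


# Hypothetical toy dictionary holding the single word <0915, 093F>.
TOY_DICTIONARY = {"\u0915\u093F"}

# The second word is <093F, 0915>, i.e. it starts with a vowel sign.
flagged = misspelled_words("\u0915\u093F \u093F\u0915", TOY_DICTIONARY)
```

Here flagged contains only the second word, <093F, 0915>.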

> While indication of the second type
> of mistake is generally used only in sophisticated
> applications like word processor,

Why? There is nothing in principle preventing a spell checker
from being used in a "plain text" editor.

> everyone wants to know the first kind of mistake.

Without a spell checker, but with proper rendering, spelling
errors can be detected by a fluent reader, since they look
different even without any dotted circles. For some ambiguous
Indic cases, like a prefix matra, consonant, postfix matra, all
possible character sequences for them are misspellings (as far
as I know).

> With your
> explanation it seems that even plain text editor is not
> useful at all to identify such common typing mistakes!

Consider English. If I write "nnnn", that may well be a spelling
error. Do I deserve to have the rendering of that string littered
with dotted circles just because a sequence of four n's "has to" be
a spelling error?

/Kent K

> - Keyur


This archive was generated by hypermail 2.1.5 : Mon Feb 03 2003 - 13:58:41 EST