Re: Visarga, ardhavisarga and anusvara -- combining marks or not?

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Aug 25 2009 - 17:00:44 CDT

Next message: Shriramana Sharma: "Document on usage of Reph in Gurmukhi and Telugu"

Previous message: Marcin 'Qrczak' Kowalczyk: "Re: Do the CR & LF bytes in UTF-8 ONLY exist in this form?"
In reply to: Kenneth Whistler: "Re: Visarga, ardhavisarga and anusvara -- combining marks or not?"
Next in thread: Shriramana Sharma: "Re: Visarga, ardhavisarga and anusvara -- combining marks or not?"

On 8/25/2009 12:53 PM, Kenneth Whistler wrote:
> Asmus said:
>
>
>> The third approach would leave the actual assignments in
>> place, but achieves the same effect by a highly visible effort to
>> document the improved understanding of what it means
>> for a character to have classification Mc.
>>
>> Unlike the first option, this would not be a case-by-case
>> annotation of a few problematic characters in diverse
>> script chapters, but would have to be more up-front.
>>
>
> And I would second this third approach. ;-)
>
The success of this third approach depends on it being able to rattle
people's naive understanding that equates all combining marks with
graphically combining characters, and more specifically treating any of
the gc=Mc characters as if they were non-spacing marks with glyphs of
positive advance width.
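
As a minimal illustration of the distinction, the general category can be
read directly from the character database. This is only a sketch using
Python's standard unicodedata module; the sample characters and the helper
name describe() are chosen here for illustration and do not come from the
thread.

```python
import unicodedata

# Sample characters chosen for illustration.
SAMPLES = [
    "\u0915",  # DEVANAGARI LETTER KA       (gc=Lo)
    "\u0902",  # DEVANAGARI SIGN ANUSVARA   (gc=Mn, nonspacing)
    "\u0903",  # DEVANAGARI SIGN VISARGA    (gc=Mc, spacing)
    "\u093E",  # DEVANAGARI VOWEL SIGN AA   (gc=Mc, spacing)
    "\u0301",  # COMBINING ACUTE ACCENT     (gc=Mn, nonspacing)
]

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in SAMPLES:
    print(*describe(ch))
```

Both anusvara and visarga are "combining marks" by general category, but
only the former is nonspacing.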

The current approach, nibbling at the margins, has not been successful;
otherwise there wouldn't be as many problems.

In that context, getting information on *specific* characters is not
contributing to the proposed solution, because the problem is generic in
nature. It's just more "nibbling at the margins" (however useful it is
otherwise in documenting script specifics).

What I was hoping for is that you go beyond seconding this in E-mail and
continue to spearhead a revision of the text, going beyond the
foundation you laid in section 3.6.

First, it would be useful to add to D55 the general category (gc=Mc).
Second, it would be useful to either mention the sub-types in comments
(split, left side, generic spacing) or to define them at this point with
actual definitions. Then you can put a note there, warning that spacing
marks that aren't special (what I've called "generic") don't combine
with the base character and should be rendered like ordinary spacing
characters of that script.
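
A rough sketch of what such sub-typing could look like in code follows. The
labels are hand-assigned to three well-known characters using the informal
terms above; they are an assumption of this sketch and not an existing
Unicode property, and the names SUBTYPE and mc_subtype() are hypothetical.

```python
import unicodedata

# Hand-labeled examples. The labels follow the informal terms used in the
# text (split, left side, generic spacing); they are assumptions of this
# sketch, not values of any Unicode property.
SUBTYPE = {
    "\u093F": "left side",        # DEVANAGARI VOWEL SIGN I
    "\u0BCA": "split",            # TAMIL VOWEL SIGN O
    "\u0903": "generic spacing",  # DEVANAGARI SIGN VISARGA
}

def mc_subtype(ch):
    """Return the hand-assigned sub-type, or None if ch is not gc=Mc or is unlisted."""
    if unicodedata.category(ch) != "Mc":
        return None
    return SUBTYPE.get(ch)

for ch in SUBTYPE:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", mc_subtype(ch))
```

All three share gc=Mc, which is exactly why the general category alone does
not tell an implementation how to render them.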

For example, section 4.1 discusses some sub-types of combining marks; a
short discussion of both the generic non-spacing and the generic spacing
combining marks, and the issues they raise, would be useful there. I know
that 4.1 came from the desire to address the normalization issues of
combining class, but that's not apparent to the reader. It needs to cover
all types of combining classes and be given cross links to all other
descriptions of combining classes and how to handle them. [This is more
important if TUS is forever online.]
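
To make the point concrete that combining class and general category are
separate axes, here is a small sketch, again using only Python's standard
unicodedata module. The sample characters and the helper name props() are
chosen for illustration.

```python
import unicodedata

def props(ch):
    """Return (general category, canonical combining class) for one character."""
    return (unicodedata.category(ch), unicodedata.combining(ch))

SAMPLES = [
    "\u0301",  # COMBINING ACUTE ACCENT    -> Mn, ccc=230
    "\u093C",  # DEVANAGARI SIGN NUKTA     -> Mn, ccc=7
    "\u094D",  # DEVANAGARI SIGN VIRAMA    -> Mn, ccc=9
    "\u0902",  # DEVANAGARI SIGN ANUSVARA  -> Mn, ccc=0
    "\u0903",  # DEVANAGARI SIGN VISARGA   -> Mc, ccc=0
]

for ch in SAMPLES:
    gc, ccc = props(ch)
    print(f"U+{ord(ch):04X} gc={gc} ccc={ccc}")
```

A nonzero combining class matters for canonical reordering in normalization;
it says nothing about whether the mark is spacing or how it is laid out.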

Section 5.12 (which is about nonspacing marks) uses the terms combining
mark and nonspacing mark interchangeably. At that point, a pointer to a
discussion of *other* types of combining marks, esp. the "spacing"
versions, is needed.

Alternatively, even a short section on "strategies for handling other
combining characters" would help.

Section 7.9, entitled "combining marks", could be more explicit that
the discussion is only (or primarily) about combining marks of the types
found in European alphabetic scripts, and be more forceful and up-front
(that is, in the opening section) in mentioning that while some aspects
of combining marks are generic, the rendering rules for other scripts
(and other types of combining marks) are different.

And/or add a short subsubsection that points to other types of
combining marks (spacing, subjoined, etc.) by their type and completes
the cursory overview, so that the section can be read as an introduction
to the topic. Mention of spacing combining characters is especially
apropos in Section 7.9, because it talks about spacing clones of
nonspacing marks, which embodies another use of the word "spacing".

In chapter 9, I note the absence of any "generic" spacing character in
the examples for the rendering rules (the one and only such character
occurs in the example for the bindu, so its own rendering behavior isn't
the one that's discussed).

A new R rule should be added for "generic" spacing marks, one that makes
clear that these are laid out just like "Lo" characters.
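
What "laid out just like Lo" means can be sketched with a toy layout loop.
This is not a real shaping engine: the fixed advance width (ADVANCE) and the
function name place() are hypothetical, and the sketch deliberately ignores
left-side and split marks, which need reordering and are not "generic".

```python
import unicodedata

ADVANCE = 10  # hypothetical fixed advance width for every spacing glyph

def place(text):
    """Return (char, x) pairs for a naive left-to-right layout.

    Mn/Me marks are stacked on the preceding glyph and do not advance the
    pen; everything else, including gc=Mc, is placed like an ordinary letter.
    """
    pen = 0      # x position where the next spacing glyph goes
    last_x = 0   # x position of the most recent spacing glyph
    placed = []
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            placed.append((ch, last_x))
        else:
            placed.append((ch, pen))
            last_x = pen
            pen += ADVANCE
    return placed

print(place("\u0915\u0902"))  # KA + ANUSVARA (Mn): mark shares KA's position
print(place("\u0915\u0903"))  # KA + VISARGA (Mc): mark gets its own advance
```

The visarga takes its own slot after the consonant, exactly as a following
letter would, while the anusvara stays attached to the base.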

Ditto for any comparable discussion of other scripts containing
"generic" Mc characters. That's just for starters.

After that is done, it would indeed be useful to document individual
characters.

A./
> It would be very useful to have a written explanation of
> the behavior of visarga and ardhavisarga to help guide
> rendering implementations. Note that there are many
> many extensions for Vedic added in Unicode 5.2, and
> the addition of the ardhavisarga is not the only character
> which implementations will need new information about
> in order to get best display behavior -- but it is
> a good place to start.
>
> Shriramana Sharma's discussion which started this thread,
> shorn of assumptions about what "should" or "should not"
> be a combining mark, and instead focussing on the actual
> display behavior required, could seed such a written
> explanation. It could start existence as a FAQ (or
> set of FAQ entries) or a UTN -- and if it proves helpful,
> then be reworked to incorporate it as appropriate in
> the relevant sections of the standard, if the UTC approves
> heading in that direction.
>
> --Ken
>

This archive was generated by hypermail 2.1.5 : Tue Aug 25 2009 - 17:03:02 CDT