RE: Visarga, ardhavisarga and anusvara -- combining marks or not?

From: Peter Constable (petercon@microsoft.com)
Date: Fri Aug 21 2009 - 10:43:35 CDT


    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Shriramana Sharma

    > To my mind, a combining mark is *usually* (though not always) something that
    > qualifies what is represented by a base character.

    Nothing in Unicode dictates what function a combining mark should have in reading or in linguistic interpretation.

    > How can the maker of a rendering engine be expected to foresee this need for a
    > visarga to be placed after a *digit*?

    After seeing this information, they can certainly foresee the need.

    > Normally combining marks are used only for
    > letters, so renderers will support only that.

    Nothing in Unicode says that combining marks are used only for letters. There are other attested cases of marks that combine with digits or symbols.
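
    As an illustration only (not part of the original exchange), the property values under discussion can be inspected with Python's standard unicodedata module. The helper names describe and marks_on_non_letters are hypothetical, and the results depend on the Unicode data version bundled with the interpreter. The sketch reports the General_Category of a character and lists combining marks whose nearest preceding non-mark character is not a letter, such as a visarga written after a digit.

```python
import unicodedata

VISARGA = "\u0903"    # DEVANAGARI SIGN VISARGA
DIGIT_ONE = "\u0967"  # DEVANAGARI DIGIT ONE
LETTER_KA = "\u0915"  # DEVANAGARI LETTER KA
VOWEL_AA = "\u093E"   # DEVANAGARI VOWEL SIGN AA


def describe(ch):
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def marks_on_non_letters(text):
    """List (base, mark) pairs where the base of a combining mark is not a letter.

    The "base" here is simply the nearest preceding character whose
    General_Category is not M*; this is a simplification for illustration,
    not a full grapheme-cluster segmentation.
    """
    found = []
    base = None
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            if base is not None and not unicodedata.category(base).startswith("L"):
                found.append((base, ch))
        else:
            base = ch
    return found


if __name__ == "__main__":
    print(describe(VISARGA))
    print(describe(DIGIT_ONE))
    print(marks_on_non_letters(DIGIT_ONE + VISARGA))
    print(marks_on_non_letters(LETTER_KA + VOWEL_AA + VISARGA))
```

    A renderer that only accepts marks after letters would mishandle the first sequence; nothing in the character properties themselves rules it out.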

    > IMHO a character must be called a combining mark only if it can NEVER be used independently.

    I think that's impracticable. Even things that you would certainly call combining marks do get used in isolation when people are writing about them.

    > When a character has been attested as non-spacing or it is spacing but is reordrant or enclosing,
    > it automatically means that it cannot be used separately from its base character. But in the case
    > of characters which are alleged to be spacing combining marks which are displayed in logical order,
    > the authorities should make all possible efforts to ensure that it can never be separated from its
    > base character before encoding it as Mc.

    This argument is more tenable than the previous one.

    > If the only issue here is preventing linebreaks before the visarga, which is a valid need, I admit,

    We don't need to call it a combining mark in order to prevent lines from breaking before it -- unless a break before it would be unacceptable even in worst-case scenarios (really narrow columns where breaking within words cannot be avoided).

    > So I strongly encourage the Unicode authorities to consider, if possible, the changing of the visarga
    > in the Indian scripts to Lo. If it is not allowed under the stability principles, at least attach another
    > annotation to the visarga characters (at least in Devanagari) indicating that sometimes the visarga
    > needs to be placed after digits, so rendering engine makers are advised of the need to allow for that.

    Changing the General Category property would be problematic -- it isn't blocked by stability policies, but it would have knock-on effects that could be quite disruptive in various parts of the standard and in implementations. On the other hand, there is no problem in adding annotations or in creating technical notes (see http://www.unicode.org/notes/) to guide implementers.

    Peter


