Re: Tamil 0BB3 and 0BD7

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 10 2003 - 08:19:43 EST

Next message: Philippe Verdy: "Re: Hexadecimal digits?"

Previous message: Philippe Verdy: "Re: ZWJ, ZWNJ, CGJ and combination"
In reply to: Kent Karlsson: "RE: Tamil 0BB3 and 0BD7"
Next in thread: Kent Karlsson: "RE: Tamil 0BB3 and 0BD7"
Reply: Kent Karlsson: "RE: Tamil 0BB3 and 0BD7"

From: "Kent Karlsson" <kentk@cs.chalmers.se>
> > From: "Kent Karlsson" <kentk@cs.chalmers.se>
> >
> > > The Indic "length marks" should be seen as encoding mistakes.
> >
> > Could they be documented officially as deprecated in favor of another
> > character, by assigning them a compatibility decomposition
> > mapping (I mean with <compat>XXXX in the UCD)?
>
> By now you should know perfectly well that they cannot.
>
> The decompositions cannot be changed.

Is that also true for compatibility decompositions? From my reading of the
Unicode stability policy, I thought it only covered the canonical mappings,
plus the rule that a canonical mapping cannot be changed into a compatibility
mapping or the reverse, and that this mapping must remain stable.

Under point #4, we have this sentence:

    Particularly in the situation where the Unicode Standard first
    encodes less-well documented characters and scripts, the
    exact character properties and behavior initially may not be
    well known.(...)

This is our case.

    (...)As more experience is gathered in implementing the characters,
    adjustments in the properties may become necessary. Examples
    of such properties include, but are not limited to, the following:
      * General category
      * Case mappings
      * Bidi properties
      * Compatibility decomposition tags (e.g. <font> vs. <compat>)
      * Representative glyphs

So, as the change in AU length mark does not affect its identity,
the compatibility decomposition tag may be added.
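
For anyone who wants to check what the decomposition field currently holds,
here is a minimal sketch using Python's standard unicodedata module. The
helper name decomposition_tag is my own, not something from the UCD, and the
results depend on the Unicode version bundled with the Python build. It only
reads the current data; it says nothing about what the policy allows.

```python
import unicodedata

def decomposition_tag(ch):
    """Split the UCD decomposition field of ch into (tag, code points).

    tag is None when the mapping is canonical or when there is no mapping.
    (Hypothetical helper, for illustration only.)
    """
    field = unicodedata.decomposition(ch)
    if not field:
        return None, []
    parts = field.split()
    if parts[0].startswith("<"):
        # Compatibility mapping: first item is the tag, e.g. <font> or <compat>
        return parts[0], parts[1:]
    # Canonical mapping: no tag, only code points
    return None, parts

print(decomposition_tag("\u2102"))  # DOUBLE-STRUCK CAPITAL C, tagged <font>
print(decomposition_tag("\ufb01"))  # LATIN SMALL LIGATURE FI, tagged <compat>
print(decomposition_tag("\u0bd7"))  # TAMIL AU LENGTH MARK, no mapping at all
```

The last call returns an empty mapping: U+0BD7 has no decomposition today, so
what I am suggesting would be a new mapping rather than a changed tag.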

Maybe I'm wrong here. But this does not prevent Unicode from saying
that length marks should be deprecated, like some other characters.
Of course this would require an equivalent update in the ISCII
standard from which these characters were taken: what if ISCII
now says that length marks are deprecated in a given list
of scripts where they are used? Shouldn't the same happen in Unicode?

Such a mapping would also be useful to applications that need to be
scrupulous about effective character identity (notably IDNA, where it is
a security issue: IDNA implementations will probably need to add this
mapping as part of the NamePrep process...)

> And since these chars are part of the decompositions of actually useful
> characters, these "length marks" cannot be deprecated or use-discouraged.

With compatibility mappings we don't remove any canonical distinctions, so
the stability of normalized strings is kept (except for the compatibility
decomposition forms, which often remove distinctions that are not
essential to the character identity anyway)...
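
To make Kent's point and mine concrete, here is a small sketch, again with
Python's unicodedata and with constant names chosen by me. It shows that
U+0BCC TAMIL VOWEL SIGN AU canonically decomposes to U+0BC6 followed by the
length mark U+0BD7, and that the compatibility form (NFKD) currently leaves
the length mark untouched. It assumes the Tamil data of whatever Unicode
version ships with the interpreter.

```python
import unicodedata

AU_SIGN = "\u0bcc"         # TAMIL VOWEL SIGN AU
E_SIGN = "\u0bc6"          # TAMIL VOWEL SIGN E
AU_LENGTH_MARK = "\u0bd7"  # TAMIL AU LENGTH MARK

# Canonical decomposition: the length mark is the second half of the AU sign.
nfd = unicodedata.normalize("NFD", AU_SIGN)
print([f"U+{ord(c):04X}" for c in nfd])

# Canonical composition puts the two halves back together.
nfc = unicodedata.normalize("NFC", E_SIGN + AU_LENGTH_MARK)
print(nfc == AU_SIGN)

# With no compatibility mapping defined, NFKD leaves the length mark as it is.
nfkd = unicodedata.normalize("NFKD", AU_LENGTH_MARK)
print(nfkd == AU_LENGTH_MARK)
```

The canonical forms (NFD, NFC) would not be touched by what I propose; only
the result of the last call, and of NFKC, would be.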

Deprecating a character would mean that implementations are encouraged,
wherever possible, to treat legacy texts encoded with length marks
identically with those coded with separate letters. But it does not
constitute a requirement for conformance.

This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 09:01:33 EST