From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 10 2003 - 08:19:43 EST
From: "Kent Karlsson" <kentk@cs.chalmers.se>
> > From: "Kent Karlsson" <kentk@cs.chalmers.se>
> >
> > > The Indic "length marks" should be seen as encoding mistakes.
> >
> > Could they be documented officially as deprecated in favor of another
> > character, by assigning them a compatibility decomposition
> > mapping (I mean with <compat>XXXX in the UCD)?
>
> By now you should know perfectly well that they cannot.
>
> The decompositions cannot be changed.
Is that also true for compatibility decompositions? When I read the Unicode
stability policy, I understood it to cover only the canonical mappings, plus
the rule that a canonical mapping cannot be changed into a compatibility
mapping or the reverse, so that only this much must remain stable.
Under point #4, we have this sentence:

    Particularly in the situation where the Unicode Standard first
    encodes less-well documented characters and scripts, the
    exact character properties and behavior initially may not be
    well known. (...)

This is our case.

    (...) As more experience is gathered in implementing the characters,
    adjustments in the properties may become necessary. Examples
    of such properties include, but are not limited to, the following:
    * General category
    * Case mappings
    * Bidi properties
    * Compatibility decomposition tags (e.g. <font> vs. <compat>)
    * Representative glyphs

So, since such a change to the AU length mark would not affect its identity,
a compatibility decomposition tag could be added.
Maybe I'm wrong here, but this does not prevent Unicode from saying
that length marks should be deprecated, like some other characters.
Of course this would require an equivalent update in the ISCII
standard from which these characters were coded: what if ISCII
now said that length marks are deprecated in a given list of the
scripts where they are used? Shouldn't the same happen in Unicode?
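
For readers who want to see what the decomposition tags named in the policy excerpt look like in the UCD, here is a minimal sketch. It assumes Python's standard `unicodedata` module, whose data reflects whatever Unicode version ships with the interpreter; the helper name `decomposition_tag` is made up for this illustration.

```python
import unicodedata

def decomposition_tag(ch):
    """Return the <tag> of a compatibility mapping, "canonical" for an
    untagged (canonical) mapping, or None if there is no decomposition."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return None
    first = mapping.split()[0]
    return first if first.startswith("<") else "canonical"

# U+210C and U+FB01 carry tagged (compatibility) mappings, U+0D4C has a
# canonical mapping, and the Malayalam AU length mark U+0D57 has none.
for ch in ["\u210C", "\uFB01", "\u0D4C", "\u0D57"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), decomposition_tag(ch))
```

The length mark itself currently has no mapping of either kind, which is the property the proposal above would change.
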
It would also be an interesting mapping for applications that need to
be scrupulous about effective character identity, notably IDNA, where
this is a security issue: IDNA implementations will probably need to
add this mapping as part of the NamePrep process.
> And since these chars are part of the decompositions of actually useful
> characters, these "length marks" cannot be deprecated or use-discouraged.
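
The point quoted above can be checked directly against the character database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the list `VOWEL_SIGNS` and the helper `length_mark_decompositions` are names chosen for this illustration only.

```python
import unicodedata

# Precomposed vowel signs (Bengali, Oriya, Tamil, Kannada, Malayalam)
# whose canonical decomposition ends in a length mark.
VOWEL_SIGNS = ["\u09CC", "\u0B4C", "\u0BCC", "\u0CC8", "\u0D4C"]

def length_mark_decompositions(chars):
    """Map each character name to the names of its decomposition parts."""
    result = {}
    for ch in chars:
        parts = unicodedata.decomposition(ch).split()
        result[unicodedata.name(ch)] = [
            unicodedata.name(chr(int(code, 16))) for code in parts
        ]
    return result

for name, parts in length_mark_decompositions(VOWEL_SIGNS).items():
    print(name, "=", " + ".join(parts))
```

Because these mappings are canonical, NFD always produces the length marks from the precomposed vowel signs, whatever status the marks are given.
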
With compatibility mappings we don't remove any canonical distinctions, so
the stability of normalized strings is preserved (except for strings in the
compatibility-decomposed forms, which in any case often remove distinctions
that are not essential to character identity)...
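
To make the distinction concrete, here is a minimal sketch, again assuming Python's standard `unicodedata` module (the helper names `normal_forms` and `show` are hypothetical). It uses an existing character with a `<compat>` mapping, U+FB01, to show that a compatibility mapping affects only NFKC and NFKD, and it shows that the sequence vowel sign E + AU length mark is already canonically equivalent to the precomposed Malayalam vowel sign AU.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normal_forms(text):
    """Map each normalization form name to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

def show(label, text):
    """Print the code points of every normalization form of text."""
    codes = {
        form: " ".join(f"{ord(c):04X}" for c in value)
        for form, value in normal_forms(text).items()
    }
    print(label, codes)

show("ligature fi (has a <compat> mapping)", "\uFB01")
show("vowel sign E + AU length mark", "\u0D46\u0D57")
show("precomposed vowel sign AU", "\u0D4C")
```

The ligature is left alone by NFC and NFD and folded only by the compatibility forms, while the last two inputs give identical results in all four forms.
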
Deprecating a character would mean that implementations are encouraged,
wherever possible, to treat legacy texts encoded with length marks
identically to those encoded with separate letters. It would not, however,
constitute a conformance requirement.