From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Mar 26 2007 - 06:37:37 CST
Andrew West wrote:
>
> On 26/03/07, mpsuzuki@hiroshima-u.ac.jp <mpsuzuki@hiroshima-u.ac.jp>
> wrote:
> >
> > As both fixes are not realistic, I wish if UTS #37 is updated
> > to have additional note to prohibit (not deprecate) the
> > codepoint conversion from CJK Compatibility Ideographs to
> > CJK Unified Ideographs with IVS. How do you think?
> >
>
> I think that Unicode cannot prohibit anyone from applying any
> particular data transformation they like, including codepoint
> conversion from CJK Compatibility Ideographs to CJK Unified Ideographs
> with IVS.
>
> If, for example, I were to write a text editor that allowed the user
> to perform various transformations to a text (e.g. casing, diacritic
> folding, normalization, transliteration conversions, etc.), there is
> nothing that Unicode could say or do to stop me from also adding in a
> facility to convert between CJK Compatibility Ideographs and their
> corresponding CJK Unified Ideograph plus IVS if I so desired. As long
> as my application does not purport not to modify the text, I believe
> that I would remain conformant if I apply pretty much any data
> transformation I like.
False: if you do that, you modify the text. A Unicode-conforming transform
DOES modify the text; the only algorithms that do not transform the text are
not called "transforms" but "forms" (e.g. normalization forms and UTFs) or
encodings (e.g. CCS, CES, TES), although most of these do require a transform
before mapping from one to another or to a UTF.
What you describe is no different from other "transforms" such as:
* removing ignorable characters
* case foldings
* ...
Unicode conformance for an algorithm requires that two implementations or
instances of the algorithm produce canonically equivalent (or equal) results
when given any two canonically equivalent inputs.
What you describe does not automatically imply that canonical equivalence of
the output is preserved. It may be preserved if your implementation respects
the contract, but your description could equally match the following
process, which is NOT conforming:
* change a CJK Compatibility Ideograph to its corresponding CJK Unified
Ideograph plus IVS;
* change a CJK Unified Ideograph plus IVS to the next higher CJK Unified
Ideograph plus IVS, if there's one, or to the CJK Compatibility Ideograph;
* keep the other characters unchanged.
At first sight this seems conforming, but consider the case where there are
diacritics within or just after the sub-sequences being modified, and
consider how default grapheme clusters are delimited.
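A related risk, separate from the diacritic case, can be sketched in Python.
CJK Compatibility Ideographs have canonical singleton decompositions, so a
compatibility ideograph and its unified counterpart are canonically equivalent
inputs. The mapping table below is purely hypothetical (the choice of U+E0100
as the variation selector for U+FA10 is an assumption made for illustration,
not a registered IVS), and the NFD comparison is again assumed to be an
adequate equivalence test.

```python
import unicodedata

# Hypothetical table: compatibility ideograph -> unified ideograph + selector.
# The selector U+E0100 is an illustrative assumption, not a registered sequence.
HYPOTHETICAL_IVS = {"\uFA10": "\u585A\U000E0100"}


def to_ivs(text: str) -> str:
    """Replace each listed compatibility ideograph by its hypothetical IVS."""
    return "".join(HYPOTHETICAL_IVS.get(ch, ch) for ch in text)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


compat = "\uFA10"                                # CJK COMPATIBILITY IDEOGRAPH-FA10
unified = unicodedata.normalize("NFD", compat)   # canonical decomposition, U+585A

# The inputs are canonically equivalent.
inputs_equivalent = canonically_equivalent(compat, unified)

# The outputs are not: only one of them carries the variation selector.
outputs_equivalent = canonically_equivalent(to_ivs(compat), to_ivs(unified))

print(inputs_equivalent, outputs_equivalent)
```

Under these assumptions, a conversion that keys on the compatibility code
point alone cannot give equivalent results for equivalent inputs, since any
normalization applied beforehand has already erased the distinction.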