Re: Encoding Personal Use Ideographs

From: James Kass (thunder-bird@earthlink.net)
Date: Sun Nov 04 2007 - 00:23:33 CST

Next message: Jukka K. Korpela: "Re: logos, symbols, and ligatures (RE: Encoding Personal Use Ideographs)"

Previous message: James Kass: "Re: logos, symbols, and ligatures (RE: Encoding Personal Use Ideographs)"
Maybe in reply to: James Kass: "Re: Encoding Personal Use Ideographs"
Next in thread: John H. Jenkins: "Re: Encoding Personal Use Ideographs"
Reply: John H. Jenkins: "Re: Encoding Personal Use Ideographs"

Philippe Verdy wrote,

> James Kass wrote:
> > When the IDS order is top-to-bottom, the appropriate IDC is used (⿱).
> > When the IDS order is bottom-to-top, the correct IDCharacters exist: ⿶
> > and, possibly, ⿺.
> > (...)
> > But, 峯 # ⿶夆山. "⿶夆山" would not be a valid IDS.
>
> In 峯, it's clear that using the ⿶ IDC for describing it is not correct.
> The second component 山 is not embedded within the first one, but
> really stacked on top of it. So we can only describe it from top-to-bottom
> using the ⿱ IDC, and then the encoding order in the IDS is reversed,
> and not logical. This is a defect in this case, and one would need to have
> another variant of the ⿱ IDC whose order is reversed from bottom to
> top.

Characters composed of components which are stacked vertically
are always written top-to-bottom. An IDS for any such character
will always match that order. This is not a defect, and an IDC
with a reversed, bottom-to-top order would serve no purpose in IDS.
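To make the ordering concrete, here is a minimal Python sketch, assuming only the twelve IDCs U+2FF0..U+2FFB (⿲ and ⿳ take three operands, the rest take two). It parses an IDS in its prefix notation; the operands of ⿱ come out top first and those of ⿰ left first, matching the written order. It checks syntax only: "⿶夆山" parses without complaint even though it does not describe 峯.

```python
# Ideographic Description Characters as of Unicode 3.0: U+2FF0..U+2FFB.
# ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands; the other ten take two.
# Later Unicode versions add further IDCs, which this sketch ignores.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(ids: str):
    """Parse an IDS (prefix notation) into nested tuples.

    An IDC becomes (idc, operand1, operand2[, operand3]); any other
    character is a leaf component. Raises ValueError if the sequence
    is not well-formed. Only syntax is checked, not whether the
    description matches any real character.
    """
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("IDS ends before all operands are supplied")
        ch = ids[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch  # leaf: an encoded character or component
        # Operands are read in order: top before bottom for ⿱,
        # left before right for ⿰, surrounding part first for ⿶.
        operands = [node() for _ in range(arity)]
        return (ch, *operands)

    tree = node()
    if pos != len(ids):
        raise ValueError("extra characters after a complete IDS")
    return tree


print(parse_ids("⿱山夆"))  # 峯: 山 is on top, so it is read first
print(parse_ids("⿰女子"))  # 好: 女 is on the left, so it is read first
print(parse_ids("⿶夆山"))  # well-formed, but not a description of 峯
```

Nesting works the same way: an operand may itself be an IDS, so "⿰女⿱山夆" parses to a tree whose right-hand operand is the ⿱ subtree.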

I apologize if I was unclear.

Incidentally, characters composed of components aligned side-by-side
are always drawn starting with the left-most component. The IDS order
follows the written order in this case as well.

> Using ⿶ or ⿺ for this in the IDS is really a hack: even if it preserves the
> logical order, it does not encode the correct description.
>
> There's only one conclusion: the IDS and the logical ideographs do not
> encode the same thing. The mapping between the two is not one-to-one,
> but often one-to-many, many-to-one, or many-to-many; when
> these exceptions may account for as much as 20% of the existing encoded
> ideographs, we can really conclude that it is best to always avoid
> the existing IDS.
>
> Maybe it will be possible to have better IDS that allow one-to-one
> mappings, but this won't be possible without adding new IDC
> characters to exhibit more properties: the effective layout of a
> representative character that is uniquely identifiable, even if it
> has several other presentations, that would also have their unique
> IDS; the choice between the IDS to use for the same ideograph
> would be mostly a matter of localization (notably between
> Simplified and Traditional Chinese, but also within Modern
> Japanese, or regional Japanese dialects, or historical variants).
>
> I'm still convinced that it will be possible to have a one-to-one
> mapping between a future IDS standard and all ideographs,
> if the mapping incorporates locale selectors: this locale selector
> would make it possible to select which IDS is representative of a given
> ideograph, which other IDS are considered equivalent, and
> which other IDSs are equivalent in one locale but not in another
> (so that the distinctive subsets would require the encoding of
> variant selectors for these ideographs, to disambiguate the
> cases).
>
> In fact I do think that the only need for registering variants
> for ideographs is to allow distinctions between groups of possible
> IDS-represented glyphs. Other variants that are only
> typographical don't need to be registered, as long as there's no
> distinction in some CJKV language or dialect.

Experts are studying the ideographic description characters in
an effort to correct any deficiencies. Likewise, experts are
studying character components which are not yet encoded
as single characters in Unicode.

Leaving locale issues aside, the reason given for registering
variation sequences for CJK is to give users the option of
preserving, in plain text, variant forms of items which would
otherwise be unified.
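As a rough illustration of why variation sequences keep variants in plain text: an ideographic variation sequence is simply a unified ideograph followed by one of the selectors VS17..VS256 (U+E0100..U+E01EF), so it survives any process that keeps code points intact, and dropping the selector leaves the unified character. The sketch below (Python, standard library only) checks that shape. The pairing of 葛 with VS17 is used only as an example; whether any given pair is registered has to be looked up in the Ideographic Variation Database.

```python
import unicodedata

# Selectors VS17..VS256 (U+E0100..U+E01EF) are the ones used in
# ideographic variation sequences (IVS).
IVS_SELECTORS = range(0xE0100, 0xE01F0)


def split_ivs(seq: str):
    """Return (base, n), where n is the selector number (17..256), if seq
    has the shape of an IVS; otherwise None. Registration is not checked.
    Recognizing the base depends on the Unicode version Python ships."""
    if len(seq) != 2:
        return None
    base, selector = seq[0], ord(seq[1])
    if selector not in IVS_SELECTORS:
        return None
    if not unicodedata.name(base, "").startswith("CJK UNIFIED IDEOGRAPH"):
        return None
    return base, selector - 0xE0100 + 17


def drop_ivs_selectors(text: str) -> str:
    """Remove VS17..VS256, leaving the unified ideographs: roughly the
    fallback a reader sees when no variant glyph is available."""
    return "".join(ch for ch in text if ord(ch) not in IVS_SELECTORS)


sample = "葛\U000E0100"  # a unified ideograph followed by VS17
print(split_ivs(sample))           # ('葛', 17)
print(drop_ivs_selectors(sample))  # 葛
```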

If IDS use can accommodate roughly 80% of CJK characters,
and if Unicode allows applications to form glyphs for IDSequences,
and if users need to represent as-yet-unencoded or
never-to-be-encoded "characters" right now in plain text, is there
a problem in using IDSequences to do so?
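For what it's worth, the plain-text side of this is simple. A minimal Python sketch, assuming only the IDCs U+2FF0..U+2FFB and a hypothetical lookup table KNOWN (two entries, standing in for a real IDS database): the scanner finds each complete, well-formed IDS in running text, substitutes the encoded character when the table has one, and otherwise leaves the IDS in place as the plain-text representation for a renderer to compose.

```python
# IDCs U+2FF0..U+2FFB; ⿲ and ⿳ take three operands, the rest two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = IDC_ARITY["\u2FF3"] = 3

# Hypothetical lookup table (IDS -> encoded character), illustrative only.
KNOWN = {"⿱山夆": "峯", "⿰女子": "好"}


def ids_length(text: str, start: int) -> int:
    """Length of the well-formed IDS that starts at text[start], or 0."""
    if start >= len(text) or text[start] not in IDC_ARITY:
        return 0
    needed, pos = 1, start  # number of operand slots still to be filled
    while needed:
        if pos >= len(text):
            return 0  # text ended mid-sequence
        # Every character fills one slot; an IDC also opens `arity` new ones.
        needed += IDC_ARITY.get(text[pos], 0) - 1
        pos += 1
    return pos - start


def resolve_ids(text: str) -> str:
    """Replace known IDSs by encoded characters; keep all others verbatim."""
    out, i = [], 0
    while i < len(text):
        n = ids_length(text, i)
        if n:
            seq = text[i:i + n]
            out.append(KNOWN.get(seq, seq))
            i += n
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(resolve_ids("高⿱山夆"))  # the IDS is in KNOWN, so it becomes 峯
print(resolve_ids("高⿶夆山"))  # not in KNOWN, so the IDS is kept
```

Only whole top-level sequences are looked up, so "⿰女⿱山夆" stays as it is even though its inner ⿱山夆 is in the table; a fuller version would resolve from the inside out.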

If people seek a compositional model for forming Chinese characters
in computer text, and one exists in the form of IDS (however
imperfect), is there anything wrong with using IDS for the
80% of the cases which IDS can cover?

And, if IDS are used in this fashion, would the pressure to encode
potentially tens of thousands more ideographs be lessened? In
other words, could 80% of as-yet-unencoded characters be covered
with IDS and never need to be encoded at all, leaving only 20% which
would have to be assigned code points? And, likewise, could 80%
of future proposals for CJK variation sequences be handled well
with IDSequences?

Best regards,

James Kass

This archive was generated by hypermail 2.1.5 : Sun Nov 04 2007 - 00:27:54 CST