Re: CJK Ideograph Fragments

From: Uriah Eisenstein (uriaheisenstein@gmail.com)
Date: Mon May 10 2010 - 07:32:25 CDT

Next message: Uriah Eisenstein: "Re: CJK Ideograph Fragments"

Previous message: Uriah Eisenstein: "Re: CJK Ideograph Fragments"
In reply to: Asmus Freytag: "Re: CJK Ideograph Fragments"
Next in thread: John H. Jenkins: "Re: CJK Ideograph Fragments"
Reply: John H. Jenkins: "Re: CJK Ideograph Fragments"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Thank you for the detailed answer, Mr. Freytag. I will then consider
submitting at least an initial proposal (this will probably take a few weeks).
I'll try to contact participants in some projects that make use of
character decompositions, although I need to consider whether such character
fragments would be useful in themselves for the exchange of information, rather
than only as convenient building components for other characters.
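To make "fragments as building components" concrete, here is a minimal Python sketch of lookup by components: a character-to-components table inverted into a component-to-characters index. The table is hand-written and hypothetical, not taken from any real decomposition project, and `build_index` and `lookup` are illustrative names. The private-use code point U+E000 is an ad hoc stand-in for the right-hand part of 漢, showing that a fragment needs some agreed identifier before it can serve as a lookup key at all.

```python
from collections import defaultdict

# Hypothetical, hand-written decompositions for illustration only.
# U+E000 (Private Use Area) is an ad hoc stand-in for the right-hand
# component of 漢; it has no meaning outside this example.
DECOMPOSITIONS: dict[str, set[str]] = {
    "好": {"女", "子"},
    "妹": {"女", "未"},
    "休": {"亻", "木"},
    "林": {"木"},
    "明": {"日", "月"},
    "漢": {"氵", "\ue000"},
}


def build_index(decomps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert character -> components into component -> characters."""
    index: dict[str, set[str]] = defaultdict(set)
    for char, components in decomps.items():
        for comp in components:
            index[comp].add(char)
    return dict(index)


def lookup(index: dict[str, set[str]], *components: str) -> set[str]:
    """Return the characters that contain every requested component."""
    if not components:
        return set()
    candidate_sets = [index.get(comp, set()) for comp in components]
    return set.intersection(*candidate_sets)


index = build_index(DECOMPOSITIONS)
print(lookup(index, "女", "子"))
```

With this table, `lookup(index, "女")` yields the set of 好 and 妹, and `lookup(index, "\ue000")` finds 漢. A private-use assignment like U+E000 only works inside one application; a standard enumeration would let such keys be exchanged between programs, which is the benefit Mr. Freytag suggests documenting.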
Is there anywhere I could find the justifications for adding the CJK Radicals
Supplement characters, or were these incorporated into Unicode from
previously existing standards?
Also, are the IDSs used internally by the IRG publicly available anywhere? I
know these are not an official part of the Unicode standard, but they would
make a nice use case :)
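As an illustration of such a use case, the Python sketch below parses an Ideographic Description Sequence (IDS). An IDS is written in prefix notation: an operator from U+2FF0..U+2FFB followed by its operands, where U+2FF2 and U+2FF3 take three operands and the other ten take two. The sketch assumes only those twelve operators (newer Unicode versions add more, which it would treat as ordinary leaves); the sample sequences are hand-written, not taken from IRG data, and `parse_ids` and `leaves` are hypothetical names.

```python
# Operand counts for the Ideographic Description Characters U+2FF0..U+2FFB.
# U+2FF2 (left to middle and right) and U+2FF3 (above to middle and below)
# are ternary; the rest are binary. Later additions are not handled.
ARITY: dict[str, int] = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2ff2"] = 3
ARITY["\u2ff3"] = 3


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into nested tuples.

    A leaf component is returned as a one-character string; a composed
    node is returned as (operator, operand1, operand2[, operand3]).
    """
    def parse(pos: int):
        if pos >= len(ids):
            raise ValueError(f"IDS ended early: {ids!r}")
        ch = ids[pos]
        if ch not in ARITY:
            return ch, pos + 1  # a leaf component
        operands = []
        pos += 1
        for _ in range(ARITY[ch]):
            operand, pos = parse(pos)
            operands.append(operand)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError(f"extra characters after IDS: {ids!r}")
    return tree


def leaves(tree) -> list[str]:
    """List the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for operand in tree[1:] for leaf in leaves(operand)]


print(leaves(parse_ids("⿱木⿰木木")))
```

For example, `parse_ids("⿱木⿰木木")` (a common description of 森) returns `("⿱", "木", ("⿰", "木", "木"))`, and `leaves` of that tree is `["木", "木", "木"]`. A leaf here can only be something with a code point, so an IDS for 漢 has to spell out its right-hand part with whatever character or placeholder the data source chose.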

Regards,
Uriah

On Sat, May 8, 2010 at 11:40 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:

> On 5/8/2010 11:44 AM, Uriah Eisenstein wrote:
>
>> Well,
>> I've gone through the policies of submitting new characters and scripts
>> and they don't look encouraging :) But neither do they seem to reject the
>> idea of character fragments out of hand, as opposed to the reverse case -
>> characters which can be expressed using existing characters and combining
>> marks. In fact, the CJK Radicals Supplement block and the Hangul Jamo both
>> contain character fragments, in a way. But I suppose these already existed
>> in national standards rather than being introduced by Unicode.
>>
>> In any case, the example proposals I've seen cite experts and font makers,
>> neither of whom I have contact with. So I guess I'll drop it for
>> now, and hope that if someone takes it up I'll see it on the mailing list.
>>
> While a font is ultimately required for a proposal to become adopted, it
> shouldn't be a bar to formally raising the issue for initial consideration.
> Once something is considered potentially acceptable, there's enough time to
> come up with fonts (for the purpose of printing charts) before the
> committees need to vote on final approval. Proposals can take years from
> initial consideration to publication....
>
> Your suggestion was that these fragments need to be enumerated for various
> purposes in software and that having a standard enumeration is beneficial.
> If you can document and support that assertion, I would encourage you to put
> it on record.
>
> Doing so would allow a discussion of whether a standard enumeration is
> indeed useful enough to incur the cost of standardization.
>
> In some ways, this would not be a run-of-the-mill character encoding
> proposal, because you are not asserting that these fragments need encoding
> for the purpose of directly expressing text. While that is the primary
> purpose of character encoding, there are purposes that are ancillary to
> this, that a universal character encoding such as Unicode must encompass.
>
> There is certainly some precedent for character codes that aren't limited
> to the primary purpose I mentioned, but, because they don't represent a
> standard situation, one needs to carefully argue why such uses need to be
> covered by standardization and if so, why doing that as character codes is
> appropriate.
>
> That is different from the more usual task to document that an entity
> occurs in written or printed documents.
>
> The problem is, unless you actually put down all the details in a coherent
> proposal it's hard to judge correctly what the situation is. When you raise
> the question informally, all anyone can tell you is that an exceptional
> request is one that needs exceptional justification, which, while certainly
> correct, doesn't exactly help you or anyone to evaluate whether your
> proposal would meet the required level and type of justification.
>
> A./
>
>>
>> Thanks,
>> Uriah
>>
>>
>> On Sun, May 2, 2010 at 3:06 PM, Uriah Eisenstein <uriaheisenstein@gmail.com> wrote:
>>
>> Not exactly, but I suppose such Hanzi fragments could be used for
>> similar purposes - e.g. looking up characters by components, where
>> the available components may include non-character fragments. Some
>> fragments may be useful for IME purposes, but probably not all.
>>
>>
>> On Sat, May 1, 2010 at 8:57 PM, Edward Cherlin <echerlin@gmail.com> wrote:
>>
>> 2010/4/28 John H. Jenkins <jenkins@apple.com>:
>>
>> > No. You could certainly write up a proposal and submit it to the UTC.
>> > Should the UTC feel the idea has merit, it would then move it on to WG2
>> > and/or the IRG.
>> > The main problem here is that there is a very strong desire to limit
>> > ideograph encoding to attested and documentable forms. Anything which does
>> > not exist in actual texts is not likely to be well-regarded.
>>
>> I had the idea some years ago of writing up a proposal to encode the
>> hanzi fragments used in Cangjie Shurufa IMEs. These fragments are used
>> extensively in dozens of how-to books on keyboarding in Cangjie. This
>> includes the pieces (mostly real characters, with some radicals) used
>> on keyboard labels, and the common forms they stand for. I didn't get
>> any interest from the Cangjie development community or the authors of
>> a book on Cangjie that I have, so I abandoned the idea.
>>
>> Uriah, is this the sort of thing you have in mind?
>>
>> > Similarly, the UTC has a strong preference not to encode anything which
>> > isn't in actual use. Proposals to encode characters because they would be
>> > useful if encoded, even though they aren't actually being used right now,
>> > are generally looked on with disfavor.
>> >
>> > On Apr 28, 2010, at 12:03 PM, Uriah Eisenstein wrote:
>> >
>> > Hello,
>> > My question is about common components of CJK Ideographs which are not
>> > encoded as independent Han characters (and perhaps indeed aren't). A good
>> > example is the right-hand part of the character 漢 itself: it is a distinct
>> > component appearing in multiple other characters, but is not encoded to the
>> > best of my knowledge. The same goes for the top part of 鳥 and 島, the
>> > surrounding part of 與 and 興, and several others. My question is whether
>> > there are any plans or discussions for encoding these fragments in Unicode.
>> >
>> > (I haven't found anything about this in the mailing list archives; I did
>> > find statements that Unicode does not intend to provide any decomposition
>> > data for Han characters :) And for good reasons. However, such fragments may
>> > well be useful for third-party software dealing with 漢字 glyph generation,
>> > lookup by components, etc.)
>> >
>> > Thanks,
>> > Uriah Eisenstein
>> >
>> >
>>
>>
>>
>> --
>> Edward Mokurai (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) Cherlin
>> Silent Thunder is my name, and Children are my nation.
>> The Cosmos is my dwelling place, the Truth my destination.
>> http://www.earthtreasury.org/
>>
>>
>>
>>
>


This archive was generated by hypermail 2.1.5 : Mon May 10 2010 - 07:35:36 CDT