Re: Component Based Han Ideograph Encoding (WAS: Level of Unicode support required for various languages)

From: vunzndi@vfemail.net
Date: Sat Oct 27 2007 - 19:53:57 CDT

Next message: Anto'nio Martins-Tuva'lkin: "Afaka script"

Previous message: Michael Maxwell: "RE: thorn vs. y or th, eth and other similar letters/signs"
In reply to: Ed Trager: "Component Based Han Ideograph Encoding (WAS: Level of Unicode support required for various languages)"
Next in thread: John H. Jenkins: "Re: Component Based Han Ideograph Encoding (WAS: Level of Unicode support required for various languages)"

Dear Ed,

Rather than a 'competing' system, this could in fact be a complementary system.

Such ideas are being considered for at least two research projects
into unencoded characters that I know of. In fact, one might say,
Wenlin's CDL is an example of such a system.

Regards,
John

Quoting Ed Trager <ed.trager@gmail.com>:

> Hi, everyone,
>
> Although a component-based system of encoding Han ideographs clearly
> did not happen --and is not going to happen-- in the Unicode Standard,
> there is no reason why such a system and standard could not be now
> devised --along with reference implementations-- by an enterprising
> community of people worldwide interested in creating a new, possibly
> competing, and certainly less-limiting future standard for the
> encoding of textual information using Han ideographs.
>
> One can rather easily imagine an Open Source-style project which would
> set out to define a new and independent standard for encoding Han
> ideographs based on their components and the relative positioning of
> those components.
>
> Any ideographs so encoded which map to ideographs currently encoded in
> Unicode could simply be rendered using existing Unicode CJK fonts
> which already contain the relevant "precomposed" glyphs.
>
> As for those ideographs not yet encoded in Unicode, or those rare
> historical or modern oddities and variants which will never be encoded
> in Unicode, such a system would need to provide a "composing engine"
> capable of doing at least a half-decent job at composing ideographs
> from the set of base components. Writing such an engine would be a
> great challenge, which might make it even more likely to actually
> happen, as smart people everywhere on the planet generally enjoy a
> good challenge :-) .
>
> Such a "composing engine" could eventually be tied into existing or
> future text layout and font rasterizing engines, thus allowing
> noodle-eaters everywhere to be able to write about how tasty that dish
> of "biang2 biang2" noodles* they had yesterday was, or parents to name
> their cute babies using uniquely cute ideographs invented by
> themselves, or enterprising marketeers to gain marketshare by
> inventing new ideographs for their "As Seen On TV" products.
>
> Of course there would be many important real-world and scholarly
> applications if such a standard and system existed too. :-)
>
> (* http://en.wikipedia.org/wiki/Biang_biang_noodles )
>
> -- Ed Trager
>
>> On Oct 25, 2007, at 11:41 PM, vunzndi@vfemail.net wrote:
>>
>> An even more efficient solution, as far as code points go, would have
>> been to encode the components of Chinese characters, not precomposed
>> characters. This would take up over 10 thousand code points to encode
>> the current 70 thousand Unicode characters, and would include over 80% of
>> all CJKV submissions. In this case new submissions would be
>> restricted to new components. This way all CJKV would be in the BMP.
>>
>
> On 10/27/07, vunzndi@vfemail.net <vunzndi@vfemail.net> wrote:
>> Dear Gerrit,
>>
>> IMHO you are correct, the biggest obstacle was not technical, but
>> other factors.
>>
>> John
>>
>> Quoting Gerrit Sangel <z0idberg@gmx.de>:
>>
>> > Excuse me if I am wrong, but according to Wikipedia, the original Cangjie
>> > method mastered this in the 80s or so. And I do not think the computers at
>> > that time were really sophisticated.
>> >
>> > Could it not have been solved like the ligatures in TeX? I mean, TeX
>> > masters some features other apps still cannot do now.
>> >
>> > I think a possibility would have been to store the text as 女 (U+5973)
>> > and 馬 (U+99AC) and generate 媽 (U+5ABD) via some kind of ligature. This
>> > could then be stored in the font, which describes that if 女 is followed
>> > by 馬 and a character for "next character", it should generate 媽.
>> >
>> > This could then have spanned the ordinary CJK range, but if some kind
>> > of "unknown" character is typed in, it could still be stored (maybe
>> > at a lower display quality, but still it would not have needed a code
>> > point).
>> >
>> > Regards
>> > Gerrit Sangel
>> >
>> > Am Freitag 26 Oktober 2007 schrieb John H. Jenkins:
>> >> it would
>> >> have required technical support beyond the abilities of then-current
>> >> systems, it would have made East Asian texts take even *more* space
>> >> than they do now and made them more difficult to process.
>> >
>
>
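
To illustrate the ligature-style lookup discussed in the quoted messages
above, here is a minimal Python sketch. It is only an illustration under
stated assumptions: the names PRECOMPOSED and compose() and the one-entry
table are hypothetical, and nothing here reflects Wenlin's CDL or any actual
proposal. It uses U+2FF0, the Unicode Ideographic Description Character for
left-to-right composition, to mark how the two components are arranged. A
known sequence maps to its precomposed character; an unknown one is left as
a component sequence for a composing engine to draw.

```python
# Hypothetical sketch: the table and function names are assumptions made
# for illustration, not part of Unicode or of any system named in the thread.

IDC_LEFT_RIGHT = "\u2FF0"  # IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT

# Toy "ligature table": (layout, left component, right component) -> character.
PRECOMPOSED = {
    (IDC_LEFT_RIGHT, "\u5973", "\u99AC"): "\u5ABD",  # 女 + 馬 -> 媽
}


def compose(sequence: str) -> str:
    """Return the precomposed character if the table knows the sequence;
    otherwise return the sequence unchanged for a composing engine."""
    return PRECOMPOSED.get(tuple(sequence), sequence)


known = compose("\u2FF0\u5973\u99AC")    # in the toy table
unknown = compose("\u2FF0\u5973\u9F8D")  # 女 + 龍: not in the toy table

# Compare storage cost in UTF-8: one code point versus three.
print(known, len(known.encode("utf-8")))      # 3 bytes
print(unknown, len(unknown.encode("utf-8")))  # 9 bytes
```

The byte counts show the trade-off raised in the thread: the component
sequence needs no new code point, but in UTF-8 it takes 9 bytes where the
precomposed character takes 3.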



This archive was generated by hypermail 2.1.5 : Sat Oct 27 2007 - 19:58:05 CDT