RE: Words, Characters and Etymology

From: Marco.Cimarosti@icl.com
Date: Fri Apr 28 2000 - 11:59:57 EDT

Next message: Magda Danish (Unicode): "FW: Unicode konversion"
Previous message: Steven Brent: "Words, Characters and Etymology"
Maybe in reply to: Steven Brent: "Words, Characters and Etymology"

> These are all demonstrably related forms, but if we were encoding European
> codesets as whole words (like Han characters) rather than as sets of
> letters, I would not for a second propose using a single code point (eg. ?)
> for 'hound' and 'Hund' or even the related 'chien', saying these represent
> the same word which is rendered somewhat differently in each country!

Well, why not? They all bark and come from PIE *kuon :-)
About using ideographs to write European languages, see
http://zompist.com/yingzi/yingzi.htm

</NOT>

> We would certainly treat them as distinct codes, so why treat Han glyphs any
> differently? Are we trying to be etymologists here?

My understanding is that the "problem" is much smaller than in your
examples. If Unicode's Han Unification had made such exaggerated
decisions, *nobody* would be supporting it, apart from maybe a few mad
etymologists.

The kinds of differences that Unicode decided to ignore (or, better, to
"unify") are rather subtle typographic details, such as the following:

- the dot on top of the "roof" radical (⼧ = U+2F27 = U+5B80) is slanted to
the left in some fonts, but it is straight in other fonts.

- the "walk" radical (⾡ = U+2FA1 = U+8FB5), when part of an ideograph, has a
3-stroke shape in some fonts (⻌ = U+2ECC, with a single dot on top), but a
4-stroke shape in some other fonts (⻍ = U+2ECD, with two dots on top).

- the "grass" component (⾋ = U+2F8B = U+8278), when part of an ideograph,
has 4 strokes in some fonts (⺿ = U+2EBF), but only 3 strokes in other fonts
(⺾ = U+2EBE, where the two adjacent horizontal lines are merged).
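
The "=" signs in the three examples above can be checked mechanically. What
follows is only a minimal sketch, assuming Python 3 and its standard
unicodedata module. The table RADICALS and the helper folds_to are names made
up for this illustration. It tests whether each Kangxi radical code point
folds to the listed unified ideograph under NFKC compatibility normalization.

```python
import unicodedata

# Code point pairs taken from the three examples above:
# (Kangxi radical, unified ideograph). The dict name is hypothetical.
RADICALS = {
    "roof": ("\u2F27", "\u5B80"),
    "walk": ("\u2FA1", "\u8FB5"),
    "grass": ("\u2F8B", "\u8278"),
}


def folds_to(radical: str, ideograph: str) -> bool:
    """Return True if NFKC normalization maps `radical` to `ideograph`."""
    return unicodedata.normalize("NFKC", radical) == ideograph


if __name__ == "__main__":
    for label, (radical, ideograph) in RADICALS.items():
        print(
            f"{label}: U+{ord(radical):04X} -> U+{ord(ideograph):04X}",
            folds_to(radical, ideograph),
        )
```

The font-dependent shapes (3 or 4 strokes, slanted or straight dot) are not
something such a check can see: they live in the glyphs, not in the code
points, which is the whole point of the unification.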

The degree of acceptability of these graphical variations varies from
country to country and from person to person, reactions ranging from
complete indifference to relentless fury.

_ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT