From: Jony Rosenne (rosennej@qsm.co.il)
Date: Tue Dec 21 2004 - 00:09:31 CST
Since this is not a Hebrew issue, I am copying the general list.
The issue is how Unicode should address the disunification of an encoded
character when some users (probably most) do not make the distinction and
some do.
I think that in such cases the UTC should address the transition and the
needs of those users who do not make the distinction. We now have a
sizeable Unicode legacy. Gone are the days when legacy was just pre-Unicode
encodings, and the UTC should recognize this change.
My proposal was, and is, that we need three characters: one for the
existing "ambiguous" character, and two for the distinct ones. Dean is
making the same proposal.
Unicode is a character standard, rather than a glyph collection, and should
take a character-oriented approach, rather than the glyph-centric approach
advocated by, for example, Michael:
> [mailto:hebrew-bounce@unicode.org] On Behalf Of Michael Everson
> Sent: Monday, December 20, 2004 11:26 PM
> To: hebrew@unicode.org
> Subject: [hebrew] Re: atnah hafukh
....
> It's a disunification scenario: There are two distinct characters,
> with different glyphs, in orthographies which make the distinction.
> In other orthographies which do not make the distinction, it doesn't
> necessarily matter what the glyph is.
Firstly, the glyph does matter, and secondly, the characters necessarily do
matter.
Jony
> -----Original Message-----
> From: hebrew-bounce@unicode.org
> [mailto:hebrew-bounce@unicode.org] On Behalf Of Mark E. Shoulson
> Sent: Tuesday, December 21, 2004 5:58 AM
> To: Dean Snyder
> Cc: hebrew@unicode.org
> Subject: [hebrew] Re: atnah hafukh
>
>
> Dean Snyder wrote:
>
> >We should be encoding the maximal union of all character mergers (and
> >splits). To put it in other words, we should be taking a synchronic,
> >atomistic view on intra-script character repertoire and not a diachronic,
> >collapsing one.
> >
> >
> You did suggest something like this during one of the various Hebrew
> character debates. But it doesn't hold up well in general. By that
> logic, we also now need to encode LATIN LETTER U OR V, LATIN LETTER I OR
> J (both in CAPITAL and SMALL versions), plus LATIN SMALL LETTER LONG OR
> SHORT S (though we could probably manage to use just U+0073 for that and
> encode SHORT S separately). But I don't think anyone would want such a
> confusing state of affairs. Spelling things right is hard enough when
> there's only *one* choice for each letter!
The example isn't relevant. These disunifications are very old - you could
have added C/G - and I and U are commonly used for the ambiguous
characters.
Jony
>
> ~mark
>
>
>
>