Re: Changing UCA primary weights (bad idea)

From: Michael Everson (everson@evertype.com)
Date: Fri Jul 09 2004 - 15:25:25 CDT

In reply to: Mark Davis: "Re: Changing UCA primary weights (bad idea)"

Mark, your examples are all of the
run-of-the-mill Scandinavian variety. Trotting
out Polish and Danish doesn't address the issue.
The issue is all the phonetic characters, and
all the African ones (for instance).

> > 1) it destabilizes the default tailorable template of ISO/IEC 14651
> > and the UCA which has been published for some time. Anyone who *has*
> > tailored it would have to do all that work all over again.
>
>You are certainly right that this is not a slam-dunk;

This noun must have been on TV a lot in the US
recently; I have seen it a lot but it remains
obscure, apart from being a basketball reference.
What does it mean? That I am right that the
proposal is not a shoo-in? Or, indeed, that I am
right that it is not a foregone conclusion that
the proposal will be accepted?

>there are reasons for
>and against it. And it may well be that the committee decides against it.

There are two templates, which are synchronized,
and decided about by two committees.

>What we actually did was to put similar letters
>near other letters, *and if their decompositions
>were the same* we interfiled them.

I remember. I was on the committee that helped to decide these things.
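The decomposition criterion quoted above can be checked mechanically. The sketch below is only an illustration, assuming Python 3 and its standard unicodedata module. It follows canonical decompositions down to a base letter, so ö (U+00F6 = o + U+0308) reduces to o, while ø, ŋ (Eng), and ə (schwa) have no canonical decomposition at all. The helper name canonical_base is hypothetical and is not part of any UCA or ISO/IEC 14651 tooling.

```python
import unicodedata
from typing import Optional


def canonical_base(ch: str) -> Optional[str]:
    """Return the base letter reached by following canonical decompositions.

    Returns None when the character has no canonical decomposition.
    Compatibility decompositions (tagged like "<compat>") are ignored,
    because the interfiling rule discussed concerns canonical equivalence.
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return None
    # The first code point of a canonical decomposition is the base;
    # recurse because some decompositions are multi-level (e.g. U+01D6).
    base = chr(int(decomp.split()[0], 16))
    return canonical_base(base) or base


for ch in ["ö", "Ô", "ø", "ŋ", "ə"]:
    base = canonical_base(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {base or '(no decomposition)'}")
```

Under the rule described, only ö and Ô would be interfiled with their base letters; ø, ŋ, and ə are the kind of "basic letters" whose separate primary weights are at issue in this thread.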

>There is, however, little principled difference
>between Å, ¸ , ¼ , Ñ, Ø, ?, and Ô that would
>cause a user to think that some should be
>interfiled and some should not. In some
>languages these would be seen as "separate
>letters" (e.g. with different primary weights)
>and in others not; but that does not line up in
>any particular way with what is in the UCA. (see
>also comment below).

Those aren't the ones I'm worried about, and they
are not much of a problem. We had principles for
determining "basic letters" and those are what we
used; what I see now is a proposal to change that.

>See http://www.unicode.org/charts/collation/chart_Latin.html for many other
>cases.

Please do. Do you really want all those letters
between "e" and "f" interfiled with "e"? I surely
do not.

> > 3) in discussions elsewhere, Mark has talked about what "most users"
>> "expect" and I found his suggestion to be anglocentric and
>> unsubstantiated.
>
>And I will refrain from saying what I think of your reasoning ability in
>general, although circularity seems to be a particular specialty.

Sweet of you to say.

>I suggest that we stick to the facts instead of ad hominem attacks.

Calling a thing "ad hominem" doesn't make it ad
hominem. It is your suggestion which I
criticized, because it seems very A-to-Z and
alien to the principles which have been in the
template until now.

>For user expectations, check out how foreign words with unusual accents are
>sorted in a variety of languages. I have seen no reason to believe that
>Germans or French or others behave much differently when faced with a letter
>like ø that is not one that they use. The key is whether they would expect
>to see:
>
>a) Interleaved:
>..oa..
>..øb..
>..oz..

You can tailor for this now.

>b) Separate but near:
>..oz..
>..øb..
>..pa..

This is what we have now.

>c) Like a particular language (Danish)
>..yb..
>..øb..

You can tailor for this now.

My point is made here. It is really only in
initial position where this is likely to be
noticed. What I want is the status quo, however.
Leave the template and its principles alone.
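The three orderings (a), (b), and (c) for ø can be mimicked with a toy sort key. This is a minimal sketch, assuming Python 3 and comparison at the primary level only; the weight tables are hypothetical and are neither DUCET values nor real tailoring rules. In (a) ø shares o's primary weight, in (b) it gets its own weight between o and p, and in (c) it follows z as in Danish.

```python
# Toy primary weights for a-z; NOT the real DUCET values.
BASE = "abcdefghijklmnopqrstuvwxyz"


def weights(order, same_as=None):
    """Map each letter in `order` to its position; `same_as` lets a letter
    share another letter's primary weight (i.e. be interfiled with it)."""
    table = {ch: i for i, ch in enumerate(order)}
    for ch, other in (same_as or {}).items():
        table[ch] = table[other]
    return table


ORDERS = {
    # a) Interleaved: ø has the same primary weight as o
    "interleaved": weights(BASE, {"ø": "o"}),
    # b) Separate but near: ø gets its own weight between o and p
    "separate": weights(BASE.replace("o", "oø")),
    # c) Danish-style: æ, ø, å follow z
    "danish": weights(BASE + "æøå"),
}


def sort_words(words, table):
    # Compare words letter by letter on primary weight only.
    return sorted(words, key=lambda word: [table[ch] for ch in word])


words = ["pa", "oz", "øb", "oa", "yb"]
for name, table in ORDERS.items():
    print(f"{name:12} {sort_words(words, table)}")
```

A real UCA implementation would break a primary tie such as "ob" versus "øb" in (a) at the secondary level; this sketch simply leaves such ties in input order. The ö examples work the same way with a different table (Swedish places å, ä, ö after z).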

>a) Interleaved:
>..oa..
>..öb..
>..oz..

This is what we have now.

>b) Separate but near:
>..oz..
>..öb..
>..pa..

You can tailor for this now.

>c) Like a particular language (Swedish or Phonebook German)
>..yb..
>..öb..
>
>..od..
>..öz..
>..of..

You can tailor for this now.

>More accurately, you believe that the correct behavior occurs.

It is correct for most of the letters which would
be affected by the change you propose. The
overwhelming majority of the
letters-without-diacritics which occur between
the "main A-Z letters" are correctly filed that
way, and would be incorrectly filed if interfiled
with the "main" letters. Is there a discomfort in
what happens between Ø/Ö? Well, that's an
anomaly, right enough, but it is well-known and
can easily be tailored for anyone worried about
it. Lumping all the Engs with N or all the Schwas
with E, however, would have only the effect of
making a working template cease to work for the
people who really need those letters: linguists,
speakers of African languages, and so on. The
only people who use the sideways "o" and the top-
and bottom-half "o" are Uralic linguists, and the
template works correctly for them, at least for
those letters.

> > 5) if Mark wants to make a tailoring to interfile all these letters
>> (which can only result in what I describe as "visual seasickness" to
>> any poor users who have to actually read such wordlists.
>
>Again, no evidence.

It was argued years ago in TC304 and WG20. I'm
disheartened to have to reopen the arguments now,
particularly as it affects stability and you
yourself have been a champion for stability.

>Let's look at a particular example, letters based on
>"O". UCA *already* interleaves the list below (UCA O List). Adding John's
>list to that would add only the two elements:

John's list?

> > 6) the Latin alphabet has a lot more than 26 letters in it. In this
>> age of the Universal Character Set, "most users" would do better to
>> get used to this than to be hobbled by older concepts.
>
>I agree with the general principle, but it has
>no bearing on the topic at hand.

It is the key to the principles which are in the template now.

-- 
Michael Everson * Everson Typography * http://www.evertype.com
