Re: Mixing alphabets (was: sorting my CD collection)

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Thu Aug 10 2000 - 16:44:49 EDT


Once again, if collation info is what you want, see

http://www.unicode.org/unicode/reports/tr10/

Beyond that, it is unclear what you are looking for. But if you actually
read and try to understand that document, I am fairly certain that one of
two things will happen:

1) You will find the answer to your question, or

2) You will be able to frame the question more clearly

I am betting on #1, actually, as the most likely outcome. :-)

michka

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

----- Original Message -----
From: <11digitboy@bolt.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Thursday, August 10, 2000 12:56 PM
Subject: Mixing alphabets (was: sorting my CD collection)

> You have a good point: .... does nu-alpha-tau-alpha-sigma-alpha
> spell "Natasa" or "Natasha"? The Greek letters given
> are obviously an attempt to write "Natasha" in Greek,
> but they romanize to "Natasa".
>
> And a, b, c, d, e, f, g, h, ... HATES a, i, u, e,
> o, ka, ki, ku, ...
>
> Maybe I should just capitalize everything (except
> Georgian? ... not that I have any Georgian CDs, or
> am likely to... I bet few things would be rarer than,
> say, a Georgian female rap CD in the US!!) and from
> there, just sort by codepoint number... no good,
> "Á" would come after "Z"...
>
> Would somebody PLEASE tell me, IN THE DEFAULT UNICODE
> COLLATION ALGORITHM, WHAT COMES AFTER WHAT?! I could
> use a list of Unicode characters in proper collation
> order, with "ties" labeled!!
>
> --
> Robert Lozyniak
> Accusplit pedometer manufacturers can go suck eggs
> My page: http://walk.to/11
> 11digitboy@bolt.com - email
> (917) 421-3909 x1133 - voicemail/fax
>
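
The "Á would come after Z" problem above can be reproduced in a few lines. The sketch below is not the Unicode Collation Algorithm from TR10 and uses none of its weight tables; it only illustrates the general idea of comparing on several levels (base letters first, accents second, case last) using Python's standard unicodedata module. The function names and the sample titles are made up for the example.

```python
import unicodedata

def strip_accents(text):
    # Decompose (NFD), then drop combining marks so that "Á" becomes "A".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def simple_key(text):
    # Level 1: base letters, case-folded. Level 2: accents. Level 3: case.
    # A simplified stand-in for multi-level comparison, NOT the UCA itself.
    decomposed = unicodedata.normalize("NFD", text)
    return (strip_accents(text).casefold(), decomposed.casefold(), text)

# "\u00c1" is LATIN CAPITAL LETTER A WITH ACUTE.
titles = ["ZEBRA", "\u00c1GUA", "ABBA", "AGUA"]

by_codepoint = sorted(titles)               # plain code point order
by_levels = sorted(titles, key=simple_key)  # accent-aware order

print(by_codepoint)  # the accented title lands after ZEBRA
print(by_levels)     # the accented title lands right after AGUA
```

Under this key, two strings that differ only in accents or case are "ties" on the first level and are separated only by the later levels, which is roughly what a list of characters "with ties labeled" would express.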
>
>
> ---- Antoine Leca <Antoine.Leca@renault.fr> wrote:
> > Robert Lozyniak wrote:
> > >
> > > How do you sort text with some in Roman and some
> > > in non-Roman alphabets?
> >
> > I never sort texts, only lists of items (words,
> > names, titles, whatever).
> >
> > Depending on the ratios, I see two main solutions:
> >
> > - if Latin is the most common, _and_ only other Greek-derived
> > scripts are used, _and_ the intended audience
> > is proficient enough, I may intersperse the non-Roman
> > letters as if all the Greek-derived alphabets shared
> > a common order (so Greek alpha sorts just after Latin a,
> > Cyrillic ve after Cyrillic be which follows Greek beta
> > which follows Latin b, Greek xi after the o's and before
> > the p's, etc.)
> >
> > - in other cases, I sort the scripts separately.
> >
> >
> > > Currently, I'm just romanizing
> > > everything but I don't know if that is that good.
> >
> > Hmmm. I wouldn't do that. It would take me much too long
> > to find something that begins with beta in the V section,
> > while something that begins with mu+pi is in the B section...
> > For Cyrillic, I expect U+0427 to romanize as tcha,
> > and U+0429 as chtcha, and I am not sure you will (or
> > vice-versa).
> >
> > Things are different if you actually transliterate,
> > i.e. if the items are presented in Latin script.
> >
> >
> > > It is probably bad to kanize digits, because they
> > > would sort 1, 9, 5, and so on, or some other mixed-up
> > > order.
> >
> > It is always a problem to sort the digits, anyway.
> > Since there are usually only a few of them, I believe the
> > best place is the foremost, so the search does not take
> > too long. But if there are more than a bunch, that is
> > pretty much always brain damage.
> >
> >
> > Antoine
> >
>
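
The interleaved ordering Antoine describes in the quoted text (Latin a, then Greek alpha; Latin b, Greek beta, Cyrillic be, Cyrillic ve; Greek xi after the o's and before the p's) can be sketched as a hand-built rank table. This is only a fragment for illustration: the table name, the helper, and the sample words are hypothetical, and the internal order of the o's and the placement of Greek pi after Latin p are assumptions that follow the same Latin-then-Greek pattern rather than anything stated in the message.

```python
# Partial interleaved alphabet: a fragment only, for illustration.
INTERLEAVED = [
    "a", "\u03b1",                      # Latin a, Greek alpha
    "b", "\u03b2", "\u0431", "\u0432",  # Latin b, Greek beta, Cyrillic be, Cyrillic ve
    "o", "\u03bf", "\u043e",            # the o's (internal order is an assumption)
    "\u03be",                           # Greek xi, after the o's and before the p's
    "p", "\u03c0",                      # Latin p, Greek pi (pi placement is an assumption)
]
RANK = {letter: i for i, letter in enumerate(INTERLEAVED)}

def interleaved_key(word):
    # Letters missing from the table sort after it, in code point order.
    return [RANK.get(ch, len(INTERLEAVED) + ord(ch)) for ch in word.lower()]

words = [
    "bob",
    "\u03b2\u03bf",  # Greek beta, omicron
    "\u0432\u043e",  # Cyrillic ve, o
    "ab",
    "\u03b1\u03b2",  # Greek alpha, beta
    "\u03be\u03bf",  # Greek xi, omicron
    "po",
]
ordered = sorted(words, key=interleaved_key)
print(ordered)
```

A table like this only works for the scripts it lists, which matches the condition in the message that the approach suits Greek-derived scripts and an audience that knows them; for anything else, sorting the scripts separately avoids having to invent an order.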



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT