Re: Proposed Update of UTS #10: Unicode Collation Algorithm

From: Mark Davis ([email protected])
Date: Fri May 16 2003 - 22:36:44 EDT

Next message: Jungshik Shin: "Re: Proposed Update of UTS #10: Unicode Collation Algorithm"

Previous message: Michael (michka) Kaplan: "Re: John's Own Version of Unicode Conformance, Version 4.0"
In reply to: Jungshik Shin: "Re: Proposed Update of UTS #10: Unicode Collation Algorithm"
Next in thread: Jungshik Shin: "Re: Proposed Update of UTS #10: Unicode Collation Algorithm"
Reply: Jungshik Shin: "Re: Proposed Update of UTS #10: Unicode Collation Algorithm"
Reply: Philippe Verdy: "Re: Proposed Update of UTS #10: Unicode Collation Algorithm"

> To take the same example as I took in my previous email, I don't see
> how S1, S2 and S3 could be sorted S1 < S2 < S3 (instead of S1 < S3 < S2)
> without contracting the sequence of 'U+1169 (ㅗ:HANGUL JUNGSEONG O)
> U+1163 (ㅑ:HANGUL JUNGSEONG YA)'?
>
> S1: U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) U+1169 (ㅗ:HANGUL JUNGSEONG O)
> U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK)
> S2: U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) U+116A (ㅘ:HANGUL JUNGSEONG WA)
> U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK)
> S3: U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) U+1169 (ㅗ:HANGUL JUNGSEONG O)
> U+1163 (ㅑ:HANGUL JUNGSEONG YA) U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK)
>
> where the primary weights of each Jamo are given as following,
>
> U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) : 301
> U+1161 (ㅏ:HANGUL JUNGSEONG A) : 201
> U+1163 (ㅑ:HANGUL JUNGSEONG YA) : 231
> U+1169 (ㅗ:HANGUL JUNGSEONG O) : 251
> U+116A (ㅘ:HANGUL JUNGSEONG WA) : 255
> U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK) : 101

Remember, the weights have to be changed so that T < V < L, so I'll
add 3000 to the Ls, 2000 to the Vs, and 1000 to the Ts:

S1 => 3301; 2251; 1101; TERM
S2 => 3301; 2255; 1101; TERM
S3 => 3301; 2251; 2231; 1101; TERM
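
As a rough sketch of the arithmetic above, the following Python derives the adjusted keys from the quoted base weights by adding 3000, 2000, or 1000 depending on whether the jamo is an L, V, or T. The names BASE_WEIGHT, class_offset, and primary_key are made up for illustration, the code-point ranges are simply the Hangul Jamo block ranges, and modelling TERM as 0 is an assumption, since its actual value is not given here.

```python
# Primary weights quoted earlier in the thread, keyed by code point.
BASE_WEIGHT = {
    0x1100: 301,  # HANGUL CHOSEONG KIYEOK (L)
    0x1161: 201,  # HANGUL JUNGSEONG A (V)
    0x1163: 231,  # HANGUL JUNGSEONG YA (V)
    0x1169: 251,  # HANGUL JUNGSEONG O (V)
    0x116A: 255,  # HANGUL JUNGSEONG WA (V)
    0x11A8: 101,  # HANGUL JONGSEONG KIYEOK (T)
}

# Assumption: the thread does not give a value for TERM; 0 is a placeholder.
TERM = 0


def class_offset(cp):
    """Return the offset that enforces T < V < L for a conjoining jamo."""
    if 0x1100 <= cp <= 0x115F:  # leading consonants (L)
        return 3000
    if 0x1160 <= cp <= 0x11A7:  # vowels (V)
        return 2000
    if 0x11A8 <= cp <= 0x11FF:  # trailing consonants (T)
        return 1000
    raise ValueError(f"U+{cp:04X} is not a conjoining jamo")


def primary_key(s):
    """Build the adjusted primary key for one syllable, followed by TERM."""
    weights = tuple(BASE_WEIGHT[ord(ch)] + class_offset(ord(ch)) for ch in s)
    return weights + (TERM,)


S1 = "\u1100\u1169\u11A8"
S2 = "\u1100\u116A\u11A8"
S3 = "\u1100\u1169\u1163\u11A8"

for name, s in (("S1", S1), ("S2", S2), ("S3", S3)):
    print(name, "=>", "; ".join(str(w) for w in primary_key(s)))
```

Keys built this way are plain tuples, so they compare element by element from the left.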

>
>
> > > enumerating all equivalent sequences but just giving primary weights
> > > to only 'basic' Jamos and requiring a preprocessing in which cluster
> > > jamos are decomposed into sequences of basic Jamos.
> >
> > Preprocessing (on a string basis) is *deadly* for performance. It is
> > also not necessary. The weight tables already allow characters to
> > expand, that is what would be done in this case: it is just 1a above.
>
> I see your point. I didn't pay attention to expansion.
>
> Jungshik
>
>
>
>
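
To illustrate the expansion point in the quoted exchange, here is a minimal Python sketch in which the weight table itself maps a cluster jamo to the weights of its basic jamos, so no separate string-preprocessing pass is needed. The table contents and the names WEIGHTS and primary_weights are hypothetical; in particular, the entry that expands U+116A into the weights of U+1169 and U+1161 is an assumption made for illustration, whereas the example earlier in the thread gives U+116A its own weight of 255.

```python
# Hypothetical weight table: each code point maps to one or more primary weights.
WEIGHTS = {
    0x1161: (201,),      # HANGUL JUNGSEONG A
    0x1169: (251,),      # HANGUL JUNGSEONG O
    0x116A: (251, 201),  # HANGUL JUNGSEONG WA, expanded as O + A (assumed entry)
}


def primary_weights(s):
    """Look up each character once; an expansion contributes several weights."""
    out = []
    for ch in s:
        out.extend(WEIGHTS[ord(ch)])
    return tuple(out)


# The cluster jamo and the two-jamo sequence get the same weights
# from a single table lookup per character.
print(primary_weights("\u116A"))
print(primary_weights("\u1169\u1161"))
```

The decomposition happens inside the lookup, once per character, rather than as a rewrite of the whole input string before collation.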


This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 23:11:51 EDT