At 12:52 07/06/97 -0700, Martin Dürst wrote:
[Rick McGowan] :
>> > Is there a requirement that ACAP server side program store data in UTF-8?
[Chris Newman] :
>> Not strictly, but all human readable text strings in the protocol are
>> UTF-8. And the comparator functions will apply to UTF-8.
[Martin] :
>The comparator functions, so called ORDERINGS, can do whatever they
>want. They are an interesting and probably very valuable concept
>of ACAP. But they will involve a lot of work.
>It should be expected that ORDERINGS are described in terms of
>characters, and not in terms of UTF-8 bytes, because that's
>the appropriate abstraction level. How they are implemented
>and applied internally is not very relevant, but I guess I would
>prefer working from UTF-16. Maybe Alain LaBonté has some comments?
[Alain] :
Ordering shall not depend on coding. Ordering, as per ISO/IEC CD
14651, is defined on characters, not on their binary coding. We use UCS IDs
(form: Uxxxx) to identify these characters because the UCS is the richest
character set we know, but these UCS IDs can be implemented using any
coding under the hood (the simpler and less storage-demanding, the
better, though!)
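
To illustrate why an ordering defined on encoded bytes depends on the coding, here is a minimal Python sketch. The two sample characters and the helper names `byte_order` and `char_order` are chosen for illustration only and do not come from ACAP or ISO/IEC 14651. It compares the same pair of characters as UTF-8 bytes, as UTF-16 bytes, and as UCS code points.

```python
# Two sample characters (illustrative choice, not from any standard's examples).
a = "\uFF5E"      # U+FF5E, inside the Basic Multilingual Plane
b = "\U00010000"  # U+10000, needs a surrogate pair in UTF-16


def byte_order(x, y, encoding):
    """Return -1, 0 or 1 from comparing the encoded byte sequences."""
    bx, by = x.encode(encoding), y.encode(encoding)
    return (bx > by) - (bx < by)


def char_order(x, y):
    """Return -1, 0 or 1 from comparing sequences of UCS code points."""
    kx = [ord(c) for c in x]
    ky = [ord(c) for c in y]
    return (kx > ky) - (kx < ky)


utf8_result = byte_order(a, b, "utf-8")       # EF BD 9E vs F0 90 80 80
utf16_result = byte_order(a, b, "utf-16-be")  # FF 5E vs D8 00 DC 00
char_result = char_order(a, b)                # 0xFF5E vs 0x10000

print(utf8_result, utf16_result, char_result)
```

The byte comparisons disagree with each other (U+FF5E sorts first in UTF-8 but last in UTF-16), while the comparison on characters gives one answer whatever coding is used for storage.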
State-of-the-art string ordering, search and comparison should be totally
decoupled from coding. Ideally messages should also
be stored in a code-independent way (always storing them in the richest and
simplest UCS coding would be my strong recommendation if you have the
choice, imho).
Please have a look at ISO/IEC CD 14651 by contacting your national member
body of ISO. It defines an API redefining the string compare functions to
solve these problems in a clean way.
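
As a rough idea of what a coding-independent compare function can look like, here is a small Python sketch. It is NOT the API of ISO/IEC CD 14651: the function names (`sort_key`, `compare`, `compare_encoded`) and the toy two-level key (accents ignored first, then used as a tie-breaker) are assumptions made up for this illustration. The point is only that the comparison works on characters, so the coding of each input does not affect the result.

```python
import unicodedata


def sort_key(text):
    """Toy two-level key defined on characters (hypothetical, not ISO/IEC 14651)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Level 1: base letters only, case-folded, combining marks dropped.
    primary = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).lower()
    # Level 2: the decomposed string itself, used only to break ties.
    return (primary, decomposed)


def compare(x, y):
    """Return -1, 0 or 1, like a C-style string compare, but on characters."""
    kx, ky = sort_key(x), sort_key(y)
    return (kx > ky) - (kx < ky)


def compare_encoded(bx, enc_x, by, enc_y):
    """Compare two byte strings that may use different codings."""
    return compare(bx.decode(enc_x), by.decode(enc_y))


# Same two words stored in different codings give the same answer.
r1 = compare_encoded("élan".encode("latin-1"), "latin-1",
                     "zone".encode("utf-16-be"), "utf-16-be")
r2 = compare_encoded("élan".encode("utf-8"), "utf-8",
                     "zone".encode("utf-8"), "utf-8")
print(r1, r2)
```

A plain code-point comparison would put "élan" after "zone", because U+00E9 is greater than U+007A; the character-level key avoids that without looking at the stored bytes at all.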
The Final Committee Draft (FCD) of ISO/IEC 14651 will be ready for
international ballot in October 1997. The current ISO Committee Draft (CD) has
been technically stable for a while and was voted positively by 75% of voting
members, but the numerous comments received call for a new version before we
have an international standard; the main differences will be in what is
made mandatory versus what is only recommended. Independence from
character coding is a must on which there is consensus, of course. It should be
obvious, but apparently it is not for everybody.
Alain LaBonté
Paris