From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jan 29 2003 - 07:13:56 EST
Keyur Shroff scripsit:
> Sentiments are attached to cultures, which may vary from one geographical
> area to another. So when one of the many languages falling under the same
> script dominates the entire encoding for the script, other groups of
> people may feel that their language has not been represented properly in
> the encoding.
Indeed, they may have such beliefs, but those beliefs are based on two
incorrect notions: that what the charts show is normative, and that the
codepoint is the proper unit of processing.
> In Unicode many characters have been given codepoints regardless of the
> fact that the same character could have been rendered through some
> composition mechanism.
In every case this was done for backward compatibility with existing
encodings. No new codepoints of this type will be added in future.
> That is why the text should be normalized to either pre-composed or
> de-composed character sequences before further processing in
> operations like searching and sorting.
The Unicode Collation Algorithm already makes allowance for these points:
canonically equivalent sequences collate identically.
It will be quite typical to tailor the algorithm to take language-specific
rules into account.
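
To make the normalization point concrete, here is a minimal sketch in Python using the standard unicodedata module. The helper name canonically_equal is made up for this illustration, and the sketch shows only canonical equivalence, not collation or language-specific tailoring.

```python
import unicodedata

# The same letter in two encodings: one precomposed code point versus
# a base letter followed by a combining mark.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A raw code point comparison treats the two spellings as different.
raw_match = precomposed == decomposed

# After normalization they compare equal.
normalized_match = canonically_equal(precomposed, decomposed)

print(raw_match, normalized_match)
```

Normalizing both sides to NFC instead of NFD would work equally well here; what matters is that both strings go through the same form before comparison.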
> Also, many times processing of text depends on the smallest addressable
> unit of that language. Again as discussed in earlier e-mails this may vary
> from one language to another in the same script. Consider a case when a
> language processor/application wants to count the number of characters in
> some text in order to find number of keystrokes required to input the text.
This will not work without knowledge of the keyboard layout in any case.
Entering a Latin-1 character on the Windows U.S. keyboard requires five
keystrokes (Alt plus a four-digit code), yet each such character is
represented by one or two Unicode characters.
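
A rough sketch of why counting code points cannot stand in for counting keystrokes, again in Python. The layout table and the estimated_keystrokes function are hypothetical: they assume one keystroke per unshifted ASCII letter and five per Latin-1 character, as described above, and ignore Shift, dead keys, and input methods.

```python
import unicodedata

text = "caf\u00e9"  # "café" with a precomposed final letter

# The code point count depends on the normalization form chosen.
nfc_length = len(unicodedata.normalize("NFC", text))
nfd_length = len(unicodedata.normalize("NFD", text))

# Hypothetical per-character keystroke costs for one keyboard layout.
US_WINDOWS_LAYOUT = {"c": 1, "a": 1, "f": 1, "\u00e9": 5}


def estimated_keystrokes(s: str, layout: dict) -> int:
    """Sum the assumed keystroke cost of each character in the NFC form of s.

    Raises KeyError for characters missing from the layout table, since the
    cost cannot be guessed without knowing the keyboard.
    """
    return sum(layout[ch] for ch in unicodedata.normalize("NFC", s))


keystrokes = estimated_keystrokes(text, US_WINDOWS_LAYOUT)

print(nfc_length, nfd_length, keystrokes)
```

With these assumed costs, neither code point count matches the keystroke estimate, and a different layout table would give a different estimate for the same text.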
-- 
Henry S. Thompson said, / "Syntactic, structural,    John Cowan
Value constraints we / Express on the fly."          jcowan@reutershealth.com
Simon St. Laurent: "Your / Incomprehensible          http://www.reutershealth.com
Abracadabralike / schemas must die!"                 http://www.ccil.org/~cowan