RE: Unicode Search Engines

From: Marco Cimarosti ([email protected])
Date: Wed Feb 20 2002 - 14:04:32 EST

Previous message: John Cowan: "Re: Unicode Search Engines"
Maybe in reply to: Marco Cimarosti: "RE: Unicode Search Engines"
Next in thread: Mark Davis: "Re: Unicode Search Engines"
Next in thread: Marco Cimarosti: "RE: Unicode Search Engines"
Reply: Mark Davis: "Re: Unicode Search Engines"

John Cowan wrote:
> Documents not in UTF-* are normalized by definition, unless it is
> *impossible* to convert them to normalized Unicode (typically
> because they contain characters not yet present in Unicode).

Is that true for all encodings?

E.g., ISCII 0xCF + 0xE9 (LETTER RA + SIGN NUKTA) corresponds to Unicode
U+0930 + U+093C (DEVANAGARI LETTER RA + DEVANAGARI SIGN NUKTA), which is not
NFC: it should be U+0931 (DEVANAGARI LETTER RRA).
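
A quick way to check this, as a minimal sketch assuming Python's standard
unicodedata module (the variable names are only illustrative):

```python
import unicodedata

# Result of transcoding ISCII 0xCF + 0xE9 one byte at a time.
ra_nukta = "\u0930\u093C"  # DEVANAGARI LETTER RA + DEVANAGARI SIGN NUKTA
rra = "\u0931"             # DEVANAGARI LETTER RRA

# NFC composes the pair into the single precomposed letter.
composed = unicodedata.normalize("NFC", ra_nukta)

# A string is in NFC exactly when normalizing it leaves it unchanged.
is_already_nfc = (composed == ra_nukta)

print([f"U+{ord(c):04X}" for c in composed])
print(is_already_nfc)
```

If the transcoded pair were already in NFC, normalizing it would be a no-op;
here it collapses to the single code point for LETTER RRA.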

What should the recipient do when it receives such an ISCII sequence? Refuse
it because it is not normalized (ISCII itself also contains 0xD0, LETTER
RRA), or "fix" it while converting it to Unicode?
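
The two options can be sketched roughly as follows. This is only an
illustration: Python's standard library has no ISCII codec that I know of, so
ISCII_TO_UNICODE is a hypothetical three-entry table covering just the bytes
mentioned in this message, and the function names are made up.

```python
import unicodedata

# Hypothetical partial mapping: only the three ISCII bytes discussed here.
ISCII_TO_UNICODE = {
    0xCF: "\u0930",  # LETTER RA
    0xD0: "\u0931",  # LETTER RRA
    0xE9: "\u093C",  # SIGN NUKTA
}

def transcode(data: bytes) -> str:
    """Map each ISCII byte to its Unicode character, one to one."""
    return "".join(ISCII_TO_UNICODE[b] for b in data)

def receive_strict(data: bytes) -> str:
    """Option 1: refuse input whose transcoded form is not NFC."""
    text = transcode(data)
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("transcoded text is not in NFC")
    return text

def receive_lenient(data: bytes) -> str:
    """Option 2: 'fix' the text by normalizing after transcoding."""
    return unicodedata.normalize("NFC", transcode(data))

print(receive_lenient(b"\xCF\xE9") == receive_lenient(b"\xD0"))
```

With the lenient version, 0xCF + 0xE9 and 0xD0 end up as the same Unicode
string; the strict version accepts only 0xD0 and rejects the two-byte
spelling.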

_ Marco


This archive was generated by hypermail 2.1.2 : Wed Feb 20 2002 - 14:02:10 EST