Re: Unicode in VFAT file system

From: John Cowan (jcowan@reutershealth.com)
Date: Fri Jul 21 2000 - 13:24:02 EDT


Peter_Constable@sil.org wrote:

> Why does it say there are three varieties when a 16-bit datum can only be
> serialised in two orders?

The simplest way to think about it is to remember that a MIME charset is meant
to provide *minimal* information for the receiver to convert bytes into
characters. If the receiver gets FF FE 01 02, then it *must* be interpreted
as follows depending on the charset:

        UTF-16: U+0201
        UTF-16BE: U+FFFE U+0102
        UTF-16LE: U+FEFF U+0201
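
The three readings above can be checked with a short sketch. It assumes that
Python's built-in codecs "utf-16", "utf-16-be" and "utf-16-le" behave like the
MIME charsets UTF-16, UTF-16BE and UTF-16LE; the helper name code_points is
made up for this illustration and is not from the original discussion.

```python
# The four bytes from the example above.
SAMPLE = bytes([0xFF, 0xFE, 0x01, 0x02])


def code_points(data, charset):
    """Decode data under the given charset; return code points as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in data.decode(charset)]


if __name__ == "__main__":
    # Assumption: these Python codec names stand in for the MIME charset labels.
    for charset in ("utf-16", "utf-16-be", "utf-16-le"):
        print(charset, code_points(SAMPLE, charset))
```

Only the plain "utf-16" codec treats the leading FF FE as a byte order
signature and drops it; the two explicit-order codecs decode it as an ordinary
16-bit unit.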

For a byte sequence that begins with a byte order mark, as this one does, at
most two of the charsets produce a meaningful sequence of characters, since
U+FFFE is not a character, but that doesn't affect charset decoding.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


