Re: UTF-7: help me understand

From: Deborah Goldsmith (goldsmith@apple.com)
Date: Wed Apr 12 2000 - 13:47:07 EDT


on 4/11/2000 7:44 PM, Mike Brown <mbrown@corp.webb.net> wrote:

> Here is what I am trying:
>
> 1. U+2022 = hex 2022
> 2. hex 2022 = binary octets 00100000 00100010
> 3. those octets are divided into sextets with zero-padding out to 24 bits as
> per the RFCs. I'll use "o" here to represent the 0's used for padding:
> 001000 000010 0010oo oooooo
> 4. the scalar values of these octets: 8 2 8 0
> 5. those values are indexes into the array of [encoded] characters in the
> Base64 alphabet: I C I A
>
> So I don't see how "+ICI" is well-formed and "+ICIA" isn't. I also don't see
> how you could ever have less than 4 characters in a Modified Base64
> representation of a single Unicode character.
>
> What am I missing?

You are confusing octets in the input stream with output Base64 characters.
The RFC does not say an even number of Base64 characters is required, only
an even number of *input* octets (i.e., UTF-16). The RFC also doesn't say
that the output is padded to an octet boundary; it's padded only to the next
sextet (Base64 character) boundary.
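
As a rough sketch of the arithmetic (the function name here is made up for
illustration and is not from the RFC), the number of Base64 characters is just
the input bit count rounded up to a whole number of sextets, so it can easily
come out odd:

```python
def base64_char_count(utf16_code_units):
    """Hypothetical helper: Base64 characters needed for a run of UTF-16 code units."""
    bits = utf16_code_units * 16   # two input octets per UTF-16 code unit
    return (bits + 5) // 6         # round up to a whole number of sextets

for n in (1, 2, 3):
    print(n, base64_char_count(n))
# 1 3
# 2 6
# 3 8
```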

So, to use your example:

1. U+2022
2. binary 00100000 00100010
3. Divide into sextets, padding to a *sextet* boundary: 001000 000010 0010oo
4. Scalar values 8 2 8
5. Base64: I C I
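
If it helps, here is a minimal Python sketch of those five steps. The helper
name and the alphabet constant are my own for illustration; it assumes the
input is serialized as big-endian UTF-16 and pads with zero bits only as far
as the next sextet boundary. The last line cross-checks against the standard
base64 module with its "=" padding stripped.

```python
import base64

B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def modified_base64(text):
    """Hypothetical helper: sextet values and Base64 characters for text,
    without the leading '+' or trailing '-' of a UTF-7 shifted run."""
    octets = text.encode("utf-16-be")             # step 2: the input octets
    bits = "".join(f"{b:08b}" for b in octets)
    bits += "0" * (-len(bits) % 6)                # step 3: pad to a sextet boundary
    sextets = [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]  # step 4
    return sextets, "".join(B64_ALPHABET[s] for s in sextets)          # step 5

sextets, encoded = modified_base64("\u2022")
print(sextets)   # [8, 2, 8]
print(encoded)   # ICI

# Same result as ordinary Base64 with the '=' padding removed.
print(base64.b64encode("\u2022".encode("utf-16-be")).decode().rstrip("="))  # ICI
```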

Does that make it more clear?

Deborah Goldsmith
Manager, International Toolbox Group
Apple Computer, Inc.
goldsmith@apple.com


