Re: USV to UTF-8 mapping

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Nov 14 2001 - 17:04:54 EST


Peter_Constable@sil.org wrote:

> ...
> Else if U+0800 <= U <= U+D7FF, or if U+E000 <= U <= U+FFFF, then
> C1 = U \ x1000 + xE0
> C2 = (U mod x1000) \ x40 + x80
> C3 = U mod x40 + x80
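For anyone checking the arithmetic, here is a minimal Python transcription of the three-byte branch quoted above. I read the pseudocode's `\` as integer division (`//`) and `mod` as `%`. The function name `encode_3byte` is just my label, and the range check simply restates the quoted condition.

```python
def encode_3byte(u: int) -> bytes:
    """Three-byte branch: U+0800..U+D7FF or U+E000..U+FFFF (hypothetical helper name)."""
    if not (0x0800 <= u <= 0xD7FF or 0xE000 <= u <= 0xFFFF):
        raise ValueError(f"U+{u:04X} is not in the three-byte range")
    c1 = u // 0x1000 + 0xE0             # bits 12-15 after the E0 lead-byte marker
    c2 = (u % 0x1000) // 0x40 + 0x80    # bits 6-11 as a continuation byte
    c3 = u % 0x40 + 0x80                # bits 0-5 as a continuation byte
    return bytes([c1, c2, c3])

# U+20AC EURO SIGN encodes as E2 82 AC, matching Python's own codec
assert encode_3byte(0x20AC) == "\u20ac".encode("utf-8") == b"\xe2\x82\xac"
```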
> Else if U >= U+FFFF, then

This looks like it includes U+FFFF in two branches.
You do catch U+FFFF in the previous condition, so the result is still correct, but to make it cleaner you should change this to either > U+FFFF or >= U+10000.
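To illustrate non-overlapping conditions, here is a small Python sketch that gives every branch an explicit lower and upper bound. The one- and two-byte ranges are elided ("...") in the quote, so I filled them in with the standard UTF-8 boundaries U+007F and U+07FF. The four-byte upper bound of U+10FFFF is simply the top of the USV range. The function name `utf8_length` is my own.

```python
def utf8_length(u: int) -> int:
    """Bytes needed to encode scalar value u in UTF-8; each value matches one branch only."""
    if 0x0000 <= u <= 0x007F:
        return 1
    if 0x0080 <= u <= 0x07FF:
        return 2
    if 0x0800 <= u <= 0xD7FF or 0xE000 <= u <= 0xFFFF:
        return 3
    if 0x10000 <= u <= 0x10FFFF:   # starts at U+10000, so U+FFFF is not counted twice
        return 4
    # surrogates U+D800..U+DFFF, negatives and values above U+10FFFF land here
    raise ValueError(f"{u:#x} is not a Unicode scalar value")
```

Because every branch is a closed interval, the order of the tests no longer matters. That is the practical benefit of writing >= U+10000 instead of relying on the earlier branch to catch U+FFFF.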

> C1 = U \ x40000 + xF0
> C2 = (U mod x40000) \ x1000 + x80
> C3 = (U mod 100016) \ x40 + x80

         C3 = (U mod 0x1000) \ x40 + x80
                     ^^ ^^ (you knew that)
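As a sanity check of the corrected C3 line, here is the whole four-byte branch in Python, again reading `\` as `//` and `mod` as `%`. I restricted the input to U+10000..U+10FFFF because that is the USV range this branch has to cover. The function name `encode_4byte` is hypothetical.

```python
def encode_4byte(u: int) -> bytes:
    """Four-byte branch for U+10000..U+10FFFF, using C3 = (U mod x1000) \\ x40 + x80."""
    if not 0x10000 <= u <= 0x10FFFF:
        raise ValueError(f"{u:#x} is outside the four-byte range")
    c1 = u // 0x40000 + 0xF0             # bits 18-20 after the F0 lead-byte marker
    c2 = (u % 0x40000) // 0x1000 + 0x80  # bits 12-17
    c3 = (u % 0x1000) // 0x40 + 0x80     # bits 6-11 (the corrected line)
    c4 = u % 0x40 + 0x80                 # bits 0-5
    return bytes([c1, c2, c3, c4])

# U+1F600 encodes as F0 9F 98 80; the last USV, U+10FFFF, as F4 8F BF BF
assert encode_4byte(0x1F600) == b"\xf0\x9f\x98\x80"
assert encode_4byte(0x10FFFF) == b"\xf4\x8f\xbf\xbf"
```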

> C4 = U mod x40 + x80
> Else
> Error
> End if

markus


