Characters <128 take one byte.
Characters <2048 take two bytes.
All others in the normal 64K range (up to 65,535) take three bytes each.
There are provisions for characters above 64K to use four bytes, and the
four-byte form covers everything up to 2,097,151 (hex 1FFFFF). Normally,
though, Unicode is only used up to 64K.
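A minimal Python sketch of these byte-count ranges follows. `utf8_length` is a hypothetical helper name. Above 64K it returns four bytes, which is what UTF-8 actually uses for those characters. It stops at hex 10FFFF, the top of the Unicode code space and the largest value Python's `chr()` accepts. It is cross-checked against Python's own UTF-8 encoder.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes UTF-8 needs for code point cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    if cp < 0x80:       # below 128: one byte
        return 1
    if cp < 0x800:      # below 2048: two bytes
        return 2
    if cp < 0x10000:    # rest of the 64K range: three bytes
        return 3
    return 4            # above 64K: four bytes

# Cross-check against Python's encoder. Surrogates (D800-DFFF) are left out
# because the strict UTF-8 codec refuses to encode them on their own.
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0x4E2D, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

The boundaries are exactly the ones above: 127 and 128, 2047 and 2048, 65535 and 65536.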
Using Character Agent from Bjondi, we find that 2048 (hex 0800) falls just
above the Arabic characters (Arabic is hex 0600-06FF). Below this are things
like Hebrew, Armenian, Cyrillic, Greek, and some other miscellaneous scripts.
All the South and East Asian sets (Devanagari and the other Indic scripts,
Thai, CJK and so on) are above this point.
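To see which side of the 2048 boundary each script lands on, here is a small Python sketch. The sample letters (one per script) are my own illustrative picks. The character names come from Python's `unicodedata` module, which reflects whatever Unicode version the interpreter ships with, not the one current when this was written.

```python
import unicodedata

# One sample letter per script, in code point order: Latin, Greek, Cyrillic,
# Armenian, Hebrew, Arabic, Devanagari, Thai, CJK, Hangul.
SAMPLES = "A\u03b1\u0436\u0561\u05d0\u0627\u0905\u0e01\u4e2d\uac00"

def describe(chars):
    """Return (code point, character name, UTF-8 byte count) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), len(c.encode("utf-8")))
            for c in chars]

for cp, name, n in describe(SAMPLES):
    print(f"{cp}  {name:<30} {n} byte(s)")
```

Everything from Greek through Arabic comes out at two bytes. Devanagari, Thai, CJK and Hangul each take three.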
----- Original Message -----
From: "Sarasvati" <root@unicode.org>
To: "Unicode List" <unicode@unicode.org>
Sent: Friday, March 31, 2000 9:57 AM
Subject: Largest character
> Forwarding for Samir...
>
> > Subject: Largest character
> > Date: Fri, 31 Mar 2000 10:16:33 +0530
> >
> > Hi,
> > Which are those languages whose characters require the maximum number of
> > bytes to store using UTF-8?
> >
> > - Samir Mehrotra,
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT