From: Peter Constable (petercon@microsoft.com)
Date: Wed Nov 23 2005 - 09:43:22 CST
By my calculations, both you and Ken have errors in your 4.1 statistics.
Re the BMP: Doing a hand count of Cf characters in TUS4.1, I come up with 33. Not 31, not 35. And I came up with the following counts for graphic characters in Unicode 4.1:
Alphabetics, Symbols: 12,497
Han (URO): 20,924
Han Extension A: 6,582
Han Compatibility: 467
Hangul Syllables: 11,172
Total Graphic characters: 51,642
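
A minimal arithmetic sketch for cross-checking the breakdown above. It assumes the stated total and the four other rows are right, and solves for the Han Compatibility row rather than taking it as given, so that row can be compared against the result. The variable names are illustrative only.

```python
# Figures copied from the BMP breakdown above.
total_graphic_bmp = 51_642

# Every row except Han Compatibility, which is treated as the unknown.
known_rows = {
    "Alphabetics, Symbols": 12_497,
    "Han (URO)": 20_924,
    "Han Extension A": 6_582,
    "Hangul Syllables": 11_172,
}

# The value the Han Compatibility row must have for the rows to reach the total.
implied_han_compatibility = total_graphic_bmp - sum(known_rows.values())
print("Implied Han Compatibility count:", implied_han_compatibility)
```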
Re the supplementary planes: My numbers agree with yours.
Overall, then, I believe the correct numbers for TUS4.1 are as follows:
Unicode 4.1:
51642 graphic characters assigned (BMP)
33 format control characters assigned (BMP)
65 control characters assigned (BMP)
6400 private use characters assigned (BMP)
2048 surrogate code points designated (BMP)
34 noncharacter code points designated (BMP)
5314 reserved code points (BMP)
45875 graphic characters assigned (supplementary planes)
105 format characters assigned (supplementary planes)
131068 private use characters assigned (supplementary planes)
32 noncharacter code points designated (supplementary planes)
871496 reserved code points (supplementary planes)
------------------------------------------------------------------
1114112 code points altogether
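
One sanity check any such table has to pass: the BMP rows must add up to 65,536 code points and the supplementary-plane rows to 16 x 65,536. The sketch below applies that check to the 4.1 figures listed above. The function name `check_partition` and the dictionary layout are assumptions made for illustration. Passing the check only shows that the rows partition the code space; it cannot tell whether a character was counted under the wrong row.

```python
BMP_SIZE = 0x10000                   # 65,536 code points in plane 0
SUPPLEMENTARY_SIZE = 16 * 0x10000    # planes 1 through 16

# Unicode 4.1 figures as listed above.
bmp = {
    "graphic": 51642,
    "format control": 33,
    "control": 65,
    "private use": 6400,
    "surrogate": 2048,
    "noncharacter": 34,
    "reserved": 5314,
}
supplementary = {
    "graphic": 45875,
    "format": 105,
    "private use": 131068,
    "noncharacter": 32,
    "reserved": 871496,
}

def check_partition(bmp_rows, supplementary_rows):
    """Return (bmp_sum, supplementary_sum), raising if either is not a full partition."""
    bmp_sum = sum(bmp_rows.values())
    supp_sum = sum(supplementary_rows.values())
    if bmp_sum != BMP_SIZE:
        raise ValueError(f"BMP rows sum to {bmp_sum}, expected {BMP_SIZE}")
    if supp_sum != SUPPLEMENTARY_SIZE:
        raise ValueError(f"supplementary rows sum to {supp_sum}, expected {SUPPLEMENTARY_SIZE}")
    return bmp_sum, supp_sum

bmp_sum, supp_sum = check_partition(bmp, supplementary)
print(bmp_sum + supp_sum)  # 1114112
```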
I haven't looked at 5.0 numbers; let's see if we can agree on 4.1 numbers, though.
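
Since hand counts are where the disagreement comes from, one way to settle it is to tally General Category values mechanically from the UnicodeData.txt file of the version in question. The sketch below assumes the standard UnicodeData.txt layout (semicolon-separated fields, code point in field 0, name in field 1, General Category in field 2, and large ranges given as a pair of `<..., First>` and `<..., Last>` lines). The function name `count_categories` is hypothetical, and the caller supplies the file's lines. Noncharacters and reserved code points are not listed in that file, so they would have to be derived separately.

```python
from collections import Counter

def count_categories(lines):
    """Tally code points per (region, General Category) from UnicodeData.txt-style lines.

    region is "BMP" for code points up to U+FFFF, otherwise "supplementary".
    """
    counts = Counter()
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name = fields[1]
        category = fields[2]
        if name.endswith(", First>"):
            # Remember where the range begins; it is counted when the Last line arrives.
            range_start = code_point
            continue
        if name.endswith(", Last>") and range_start is not None:
            first = range_start
        else:
            first = code_point
        range_start = None
        region = "BMP" if code_point <= 0xFFFF else "supplementary"
        counts[(region, category)] += code_point - first + 1
    return counts
```

With a local copy of the 4.1 data file, `count_categories(open("UnicodeData.txt", encoding="ascii"))` would give per-category tallies to compare against the Cf figures under discussion; the file path here is an assumption.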
Peter Constable
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Andrew West
> Sent: Wednesday, November 23, 2005 4:26 AM
> To: unicode@unicode.org
> Subject: Re: How many characters?
>
> On 22/11/05, Kenneth Whistler <kenw@sybase.com> wrote:
> >
> > Unicode 4.1:
> >
> > 51644 graphic characters assigned (BMP)
> > 31 format control characters assigned (BMP)
> > 65 control characters assigned (BMP)
> > 6400 private use characters assigned (BMP)
> > 2048 surrogate code points designated (BMP)
> > 34 noncharacter code points designated (BMP)
> > 5314 reserved code points (BMP)
> > 45980 graphic characters assigned (supplementary planes)
> > 131068 private use characters assigned (supplementary planes)
> > 32 noncharacter code points designated (supplementary planes)
> > 871496 reserved code points (supplementary planes)
> > ------------------------------------------------------------------
> > 1114112 code points altogether
> >
> > Unicode 5.0:
> >
> > 51986 graphic characters assigned (BMP)
> > 31 format control characters assigned (BMP)
> > 65 control characters assigned (BMP)
> > 6400 private use characters assigned (BMP)
> > 2048 surrogate code points designated (BMP)
> > 34 noncharacter code points designated (BMP)
> > 4972 reserved code points (BMP)
> > 47007 graphic characters assigned (supplementary planes)
> > 131068 private use characters assigned (supplementary planes)
> > 32 noncharacter code points designated (supplementary planes)
> > 870469 reserved code points (supplementary planes)
> > ------------------------------------------------------------------
> > 1114112 code points altogether
> >
>
> Ken may perhaps have forgotten that the 4.0 figures wrongly count five
> format characters as graphic characters, and so after adjusting for
> the longstanding out-by-two error the 4.1 figures for format
> characters are still out by four due to the change in GC of U+200B to
> Cf in 4.0.1. By my calculations the correct values for 4.1 are:
>
> Unicode 4.1:
>
> 51640 graphic characters assigned (BMP)
> 35 format control characters assigned (BMP)
> 65 control characters assigned (BMP)
> 6400 private use characters assigned (BMP)
> 2048 surrogate code points designated (BMP)
> 34 noncharacter code points designated (BMP)
> 5314 reserved code points (BMP)
> 45875 graphic characters assigned (supplementary planes)
> 105 format characters assigned (supplementary planes)
> 131068 private use characters assigned (supplementary planes)
> 32 noncharacter code points designated (supplementary planes)
> 871496 reserved code points (supplementary planes)
> ------------------------------------------------------------------
> 1114112 code points altogether
>
> Based on the latest publicly available version of the 5.0 UCD data, I
> get the following figures for 5.0. My figures have two fewer BMP and
> two more SMP characters than Ken's figures, but I haven't
> cross-checked with N2991 yet (N2991 states there are 1,359 new
> characters, but this must be a typo for 1,369), so I'm not sure who's
> correct.
>
> Unicode 5.0:
>
> 51980 graphic characters assigned (BMP)
> 35 format control characters assigned (BMP)
> 65 control characters assigned (BMP)
> 6400 private use characters assigned (BMP)
> 2048 surrogate code points designated (BMP)
> 34 noncharacter code points designated (BMP)
> 4974 reserved code points (BMP)
> 46904 graphic characters assigned (supplementary planes)
> 105 format characters assigned (supplementary planes)
> 131068 private use characters assigned (supplementary planes)
> 32 noncharacter code points designated (supplementary planes)
> 870467 reserved code points (supplementary planes)
> ------------------------------------------------------------------
> 1114112 code points altogether
>
> Andrew
>