From: John Cowan (cowan@mercury.ccil.org)
Date: Fri Oct 17 2003 - 07:35:53 CST
Jill Ramonsky scripsit:
> Aha. Then at least we agree on something. An 0x110000 character space is
> not big enough for everyTHING.
You persist in misunderstanding. Suppose I came along and told you
I wanted to create a Unicode codepoint for each word in every language
on Earth. Would you blithely allocate me a 24-billion-codepoint
private space? And then my friend comes along and wants to do the
same, but he can't use my encoding because he relies on binary
ordering and he needs to get the languages grouped in alphabetical
order, whereas I sort them by language family. Boom, another 24 billion
codepoints gone. Now comes someone else who figures that 64 x 64 resolution
is good enough for representing glyphs, and wants a codepoint for each
possible glyph. That's 2^(64^2) = 2^4096, or
10443888814131525066917527107166243825799642490473837803842334832839
53907971557456848826811934997558340890106714439262837987573438185793
60726323608785136527794595697654370999834036159013438371831442807001
18559462263763188393977127456723346843445866174968079087058037040712
84048740118609114467977783598029006686938976881787785946905630190260
94059957945343282346930302669644305902501597239986771421554169383555
98852914863182379144344967340878118726394964751001890413490084170616
75093668333850551032972088269550769983616369411933015213796825837188
09183365675122131849284636812555022599830041234478486259567449219461
70238065059132456108257318353800876086221028342701976982023131690176
78006675195485079921636419370285375124784014907159135459982790513399
61155179427110683113409058427288427979155484978295432353451706522326
90613949059876930021229633956877828789484406160074129456749198230505
71642377154816321380631045902916136926708342856440730447899971901781
46576347322385026725305989979599609079946920177462481771844986745565
92501783290704731194331655508075682218465717463732968849128195203174
57002440926616910874148385078411929804522981857338977648103126085903
00130241346718972667321649151113160292078173803343609024380470834040
3154190336 more codepoints gone. We aren't going to run out of
integers, of course, but we will quickly run out of money, brains,
and time.
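The arithmetic is easy to check. The Python sketch below assumes bilevel
(one bit per pixel) glyphs on a 64 x 64 grid, which is what the figure
implies; the variable names are illustrative only and not part of any
standard.

```python
# Sketch: compare the hypothetical allocations with the Unicode code space.
# Assumption: each glyph is a 64 x 64 grid of 1-bit (black/white) pixels.
CODE_SPACE = 0x110000                # code points U+0000..U+10FFFF
WORDS_PER_SCHEME = 24_000_000_000    # the "24 billion" figure from the text
GRID = 64

glyph_count = 2 ** (GRID * GRID)     # 2^(64^2) = 2^4096 distinct bitmaps

# Whole Unicode-sized code spaces consumed by a single word-list scheme.
code_spaces_per_scheme = WORDS_PER_SCHEME // CODE_SPACE

print(CODE_SPACE)                    # size of the existing code space
print(code_spaces_per_scheme)        # code spaces used up per word list
print(len(str(glyph_count)))         # decimal digits in 2^4096
print(glyph_count.bit_length())      # bits needed to store that count
```

Python integers are arbitrary precision, so the full value can be printed
with print(glyph_count) and compared digit for digit with the number above.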
Or we can say that the purpose of the Unicode Standard is to encode
characters used for computer (and a fortiori computer-moderated
human) interchange of text.
> In that case, I would argue that, in order to provide a big enough
> character space for everything, IF twenty-one bits is not enough THEN we
> should use more bits.
21 bits is plenty. Not everything that *can* be fit into that space
should be.
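As a quick check of the 21-bit figure, here is a small Python sketch. It
only uses the upper limit U+10FFFF, which follows from the 0x110000 code
space quoted above; the names are illustrative.

```python
# Sketch: how many bits the existing code space needs.
MAX_CODE_POINT = 0x10FFFF                   # highest code point in a 0x110000 space

bits_needed = MAX_CODE_POINT.bit_length()   # smallest width that holds U+10FFFF
capacity = 1 << bits_needed                 # number of values that width can hold
unused = capacity - (MAX_CODE_POINT + 1)    # values of that width above U+10FFFF

print(bits_needed, capacity, unused)
```

So the code space fits in 21 bits with room to spare, though the spare
values are not code points.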
> You could argue that that's what the private use area is for.
Exactly.
> I would
> argue that codepoints above 0x10FFFF could be considered as just another
> private use area ... only somewhat larger. So large, in fact, that you
> need never see a clash, ever.
Only if you are willing to deal with arbitrary-precision integers, as
I do above.
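For scale, the private-use space that already exists can be counted. The
sketch below assumes the three private-use ranges defined by the standard
(one in the BMP, plus planes 15 and 16 minus their last two code points)
and cross-checks them against the general category "Co" reported by
Python's unicodedata module.

```python
import unicodedata

# Private-use ranges (inclusive), listed here as an assumption to verify.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),        # BMP private use area
    (0xF0000, 0xFFFFD),      # plane 15
    (0x100000, 0x10FFFD),    # plane 16
]

# Count by summing the sizes of the ranges.
from_ranges = sum(hi - lo + 1 for lo, hi in PRIVATE_USE_RANGES)

# Count again by asking the character database for category "Co".
from_database = sum(
    1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Co"
)

print(from_ranges, from_database)
```

Both counts should agree, and both are tiny next to the figures discussed
above.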
--
John Cowan  jcowan@reutershealth.com
http://www.ccil.org/~cowan  http://www.reutershealth.com
Eric Raymond is the Margaret Mead of the Open Source movement.
        --Lloyd A. Conway, amazon.com review