I was really hoping this was a joke... it didn't hit me it was April 1...
https://en.wikipedia.org/wiki/Plane_(Unicode)
Plane     Allocated code points     Assigned characters
Totals    280,016                   136,755
almost 50% used now.
Though that table omits 655,350 code points as 'unassigned' so it's really
only about 15% (roughly 1/7) used
using at most 4 bytes of UTF-8 or two 16-bit code units of UTF-16...
and all of that is only about 20 (plus or minus a fraction of 1) bits?
so a proposal for something around the 6th power of that size, when even just 1
more bit gives another million characters....
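
A quick check of the arithmetic above, as a rough sketch. The three totals are the figures quoted in this message; the choice of denominators and the variable names are my own assumptions, and the size of the code space (U+0000..U+10FFFF) is the only number not taken from the table.

```python
import math

allocated = 280_016          # "Allocated code points" total from the table
assigned = 136_755           # "Assigned characters" total from the table
omitted = 655_350            # code points the table leaves out, as quoted here
code_space = 0x10FFFF + 1    # 1,114,112 code points in U+0000..U+10FFFF

# Share of assigned characters against three possible denominators.
share_of_allocated = assigned / allocated               # roughly 0.49
share_of_listed = assigned / (allocated + omitted)      # roughly 0.146
share_of_code_space = assigned / code_space             # roughly 0.123

# Size of the whole code space expressed in bits.
bits_exact = math.log2(code_space)               # a little over 20
bits_needed = (code_space - 1).bit_length()      # whole bits to hold U+10FFFF

print(f"{share_of_allocated:.1%} {share_of_listed:.1%} {share_of_code_space:.1%}")
print(f"{bits_exact:.3f} bits, {bits_needed} whole bits")
```

So the code space is a shade over 20 bits, and 21 whole bits are needed to hold the largest code point.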
https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
I guess if every word were encoded as a single code point... that
wouldn't be enough; there seem to be about 7,716,121 words... so 24 bits, plus 1 to
double it for good measure?
*shrug*
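
The same kind of check for the word count, again only a sketch: the 7,716,121 figure is the one quoted above, and treating "one code point per word" as a plain fixed-width integer is my simplifying assumption.

```python
word_count = 7_716_121       # total quoted in this message
code_space = 0x10FFFF + 1    # current Unicode code space, 1,114,112 code points

# Smallest whole number of bits that can number every word from 0 upward.
min_bits = (word_count - 1).bit_length()

fits_today = word_count <= code_space    # does the current code space suffice?
capacity_24 = 2 ** 24                    # code points available with 24 bits
capacity_25 = 2 ** 25                    # and with the extra "good measure" bit

print(min_bits, fits_today, capacity_24, capacity_25)
```

By this count 23 bits would already be enough, so 24 bits leaves roughly a factor of two spare before the extra bit is added.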
On Mon, Apr 2, 2018 at 11:15 AM, William_J_G Overington via Unicode <
unicode_at_unicode.org> wrote:
> Doug Ewell wrote:
>
> > Martin J. Dürst wrote:
>
> >> Please enjoy. Sorry for being late with forwarding, at least in some
> >> parts of the world.
>
> > Unfortunately, we know some folks will look past the humor and use this
> > as a springboard for the recurring theme "Yes, what *will* we do when
> > Unicode runs out of code points?"
>
> An interesting thing about the document is that it suggests a Unicode code
> point for an individual item of a particular type, what the document terms
> an imoji.
>
> This being beyond what Unicode encodes at present.
>
> I wondered if this could link in some ways to the Internet of Things.
>
> I had never heard of IPv6. Indeed I checked on the Internet to find
> whether that was real. So I have started reading and learning.
>
> It would, in fact, be quite straightforward to encode what the document
> terms 128-bit Unicode characters.
>
> For example, U+FFF8 could be used as a base character and then followed by
> a sequence of 32 tag characters, each of those 32 tag characters being from
> the range
>
> U+E0030 TAG DIGIT ZERO .. U+E0039 TAG DIGIT NINE, U+E0041 TAG LATIN
> CAPITAL LETTER A .. U+E0046 TAG LATIN CAPITAL LETTER F
>
> That is, a newly-defined character from the Specials and then 32 tag
> characters encoding a hexadecimal code point.
>
> Now, if that were called 128-bit Unicode then there could be problems of
> policy, but if it were given another name so that it sits upon a Unicode
> structure so as to provide an application platform that can be manipulated
> using Unicode tools, including existing Unicode interchange formats, and
> display formats for character glyphs, then maybe something useful can be
> produced.
>
> Thus using 128-bit binary numbers in a local computer system and using
> existing Unicode characters for interchange of information between computer
> systems, converting from the one format to the other depending upon the
> needs for local processing and for interchange of information.
>
> Of particular significance is the concept of encoding individual items
> each with its own code point.
>
> Could this be used to relate glyphs to the Internet of Things?
>
> Could things like International Standard Book Numbers be included, with a
> code point for each book edition?
>
> What about individual copies of a rare book?
>
> What about museum items?
>
> What about paintings and sculptures?
>
> Could this tie up with serial numbers used in GS1-128 Barcodes?
>
> Please note that the 128 in GS1-128 refers to the 128 characters of ASCII,
> not to 128-bits.
>
> I am wondering whether U+FFF8 plus 32 tag characters could be handled
> directly by a GSUB glyph substitution within an OpenType font.
>
> However, with such a large code space, there would need to be a way to
> access glyph information over the internet, maybe use of a one-glyph web
> font for each glyph would be possible in some way.
>
> William Overington
>
> Monday 2 April 2018
>
>
>
>
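
For what it's worth, the scheme quoted above (U+FFF8 followed by 32 tag characters from TAG DIGIT ZERO..NINE and TAG LATIN CAPITAL LETTER A..F) can be sketched in a few lines. This is only an illustration under stated assumptions: U+FFF8 is unassigned today, so its use as a base character is hypothetical, and the function names `encode_128` and `decode_128` are mine, not part of any proposal or library.

```python
TAG_OFFSET = 0xE0000   # tag characters sit at U+E0000 + the ASCII value
BASE = "\uFFF8"        # hypothetical base character (U+FFF8 is unassigned today)
HEX_DIGITS = "0123456789ABCDEF"

# Map each tag character (U+E0030..U+E0039, U+E0041..U+E0046) to its hex digit.
TAG_TO_HEX = {chr(TAG_OFFSET + ord(d)): d for d in HEX_DIGITS}


def encode_128(value: int) -> str:
    """Return BASE followed by 32 tag characters spelling value in hex."""
    if not 0 <= value < 2 ** 128:
        raise ValueError("value must fit in 128 bits")
    hex_text = format(value, "032X")  # zero-padded, uppercase, 32 digits
    return BASE + "".join(chr(TAG_OFFSET + ord(d)) for d in hex_text)


def decode_128(text: str) -> int:
    """Inverse of encode_128; rejects anything else."""
    if len(text) != 33 or text[0] != BASE:
        raise ValueError("expected U+FFF8 followed by 32 tag characters")
    try:
        digits = "".join(TAG_TO_HEX[c] for c in text[1:])
    except KeyError:
        raise ValueError("non-hex tag character in sequence") from None
    return int(digits, 16)


sample_value = (0x20010DB8 << 96) | 1   # an arbitrary 128-bit number
sample = encode_128(sample_value)
utf8_size = len(sample.encode("utf-8"))  # 3 bytes for U+FFF8 + 32 * 4 bytes
```

Each such sequence is 33 code points and, because every tag character takes 4 bytes in UTF-8, 131 bytes on the wire, compared with 16 bytes for the raw 128-bit number.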
Received on Mon Apr 02 2018 - 14:04:33 CDT