How to Add Beams to Notes
Richard Wordingham via Unicode
unicode at unicode.org
Wed May 3 02:49:49 CDT 2017
On Tue, 2 May 2017 05:08:27 +0200
Philippe Verdy via Unicode <unicode at unicode.org> wrote:
> Consider also that the BMP is almost full, the remaining few holes
> are kept for isolated characters that may be added to existing
> scripts, or permanently reserved to avoid clashes with legacy
> softwares using simple code remappings between distinct blocks, or to
> perform simple case conversions (e.g. in Greek) for internal purposes
> (these positions are not interoperable and may clash with future
> versions of the UCS and I18n tools/libraries like ICU)
> You should abstain using any currently unassigned positions in the
> existing Unicode blocks: use PUA if you have nothing else; there are
> plenty of space available, in the BMP (most common usage in fonts
> that need to map additional glyphs) or in the two last planes.
It isn't the supply of code points that is the constraint; one must also
count the glyphs that have no dedicated one-character code. For example,
U+1000 MYANMAR LETTER KA needs glyphs for:
1039 1000 (and probably at two different widths)
1039 1000 FE00 (ditto)
There are a few CJK ideographs with similar needs:
537F FE00 (= CJK COMPATIBILITY IDEOGRAPH-2F831)
537F FE01 (= CJK COMPATIBILITY IDEOGRAPH-2F832)
537F FE02 (= CJK COMPATIBILITY IDEOGRAPH-2F833)
There's also the Japanese ideographic variation sequence <U+5375
U+E0100>, which should probably have its own glyph even if it's the
same as one of the above.
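
The CJK case above can be spot-checked. What follows is only a minimal
sketch in Python using the standard unicodedata module. The pairing of
each variation selector with a compatibility ideograph is copied from the
list above, not computed, and the helper names are made up for
illustration. It shows that all three compatibility ideographs fold to
the one base U+537F under normalization, so the distinct shapes survive
only as variation sequences, each wanting its own glyph.

```python
import unicodedata

BASE = 0x537F
# Pairing copied from the list in this message (selector -> compatibility ideograph).
VARIANTS = {0xFE00: 0x2F831, 0xFE01: 0x2F832, 0xFE02: 0x2F833}


def folds_to_base(compat_cp, base_cp=BASE):
    """True if canonical normalization replaces compat_cp with base_cp."""
    return unicodedata.normalize("NFC", chr(compat_cp)) == chr(base_cp)


def glyphs_needed(base_cp=BASE, variants=VARIANTS):
    """Sequences that each want a glyph: the bare base plus one per selector."""
    return [(base_cp,)] + [(base_cp, vs) for vs in sorted(variants)]


if __name__ == "__main__":
    for vs, compat in VARIANTS.items():
        print(f"{BASE:04X} {vs:04X} = U+{compat:X}, folds to base: {folds_to_base(compat)}")
    print(len(glyphs_needed()), "glyphs for one code point")
```

The count excludes any ideographic variation sequences, which would add
further entries for the same base.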
The Arabic script (and other cursively connected scripts) has similar
expansions, even if one goes for a typewritten style.
Devanagari explodes when one considers just the conjuncts prescribed for
I think it's also necessary to avoid splitting likely grapheme
clusters between fonts. Which of the three fonts will support U+1F3F4
U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F (English flag) and
which U+261D U+1F3FF (index pointing up: dark skin tone)?
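
To make the splitting problem concrete, here is a rough sketch in Python.
The flag sequence is built the way the code points above suggest (tag
characters mirroring ASCII at an offset of U+E0000, closed by U+E007F).
The rule for assigning code points to fonts, one font per plane, is
purely hypothetical and stands in for whatever split is actually chosen.

```python
TAG_BASE = 0xE0000    # tag characters mirror ASCII at this offset
CANCEL_TAG = 0xE007F
BLACK_FLAG = 0x1F3F4


def subdivision_flag(region_code):
    """Code points of an emoji tag sequence for a code such as 'gbeng'."""
    tags = [TAG_BASE + ord(ch) for ch in region_code.lower()]
    return [BLACK_FLAG] + tags + [CANCEL_TAG]


def split_by_plane(cp):
    # Hypothetical split rule, for illustration only: one font per plane.
    return f"plane-{cp >> 16}"


def fonts_touched(sequence, font_of=split_by_plane):
    """Set of fonts a single cluster would be spread over under a given split."""
    return {font_of(cp) for cp in sequence}


if __name__ == "__main__":
    england = subdivision_flag("gbeng")
    print(" ".join(f"U+{cp:04X}" for cp in england), sorted(fonts_touched(england)))
    pointing = [0x261D, 0x1F3FF]
    print(" ".join(f"U+{cp:04X}" for cp in pointing), sorted(fonts_touched(pointing)))
```

Under this naive rule both clusters straddle two fonts, which is the
situation to be avoided.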
Now, the BMP has headroom provided by the surrogate characters and the
PUA, which will not have mappings, but I'm not sure that it's enough.
That's why I asked the question.
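
For scale, the headroom can be put in numbers. This is a back-of-envelope
sketch in Python, assuming a font that would otherwise spend one glyph on
every BMP code point; the ranges are the surrogate block and the BMP
private use area, and the function names are only illustrative. It gives
an upper bound on the spare slots, not a statement about any real font.

```python
SURROGATES = range(0xD800, 0xE000)   # U+D800..U+DFFF, never mapped
BMP_PUA = range(0xE000, 0xF900)      # U+E000..U+F8FF, assumed left unmapped


def bmp_headroom():
    """Glyph slots freed by BMP code points that get no mapping."""
    return len(SURROGATES) + len(BMP_PUA)


def fits(extra_glyphs):
    """True if the unmapped ranges alone could absorb extra_glyphs."""
    return extra_glyphs <= bmp_headroom()


if __name__ == "__main__":
    print("surrogates:", len(SURROGATES))
    print("BMP PUA:   ", len(BMP_PUA))
    print("headroom:  ", bmp_headroom())
```

Whether that many slots cover the conjuncts, contextual forms and
variation sequences of the BMP scripts is exactly the open question.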