Re: Why 17 planes?

From: Doug Ewell <doug_at_ewellic.org>
Date: Wed, 28 Nov 2012 09:06:05 -0700

Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

Date: Wed, 28 Nov 2012 09:04:41 +0100

> Yes, I know, and I was clear that this was not in scope of the
> current standard encoding policy.
>
> Which, however, still does not prevent another upward-compatible
> standard from emerging with a different encoding policy (e.g. for
> encoding glyphs or corporate logos, in an Internet-based registry,
> with registrars, but without long-term stability, or with limited
> stability and a grace period during which the new codes would still
> be reserved for a similar use by the same registrant or someone
> else, or else removed, as with domain names).

No, my point was that trying to make a new standard like this (for
logos, glyphs, images, flags, whatever) "upward-compatible" with Unicode
is approaching the problem from the wrong direction. It makes people
imagine that the new standard is an extension of Unicode, which is
potentially damaging to both the new standard and Unicode.

> And I was clear when demonstrating that the standard PUA ranges are
> also large enough to permit any extension with arbitrary number of PUA
> codepoints.

PUA code points should be used for things that can reasonably be called
"characters." The definition of that does change over time, modestly, as
glyph variants and vendor-specific images make their way into the
standard. And it also makes sense for a closed system to use PUA code
points *internally* for some types of control function unique to that
system.
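To put a number on "large enough": the three standard Private Use ranges together cover well over a hundred thousand code points. A minimal sketch (my illustration, not from the thread; the ranges themselves are as defined in the Unicode Standard):

```python
# The three standard Private Use ranges (per the Unicode Standard).
PUA_RANGES = [
    (0xE000, 0xF8FF),       # BMP Private Use Area
    (0xF0000, 0xFFFFD),     # Plane 15, Supplementary PUA-A
    (0x100000, 0x10FFFD),   # Plane 16, Supplementary PUA-B
]

def is_private_use(cp: int) -> bool:
    """True if the code point lies in one of the Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

total = sum(hi - lo + 1 for lo, hi in PUA_RANGES)
print(total)  # 137468 private-use code points in all
```

That is 6,400 code points in the BMP plus 65,534 in each of planes 15 and 16.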

Using the PUA to extend Unicode substantially beyond what a character
encoding standard is supposed to be, and (especially) expecting others
to adopt that non-character PUA usage, or expecting it to be ipso facto
a step toward formal encoding, is wrongheaded. The right approach is to
develop a new standard and, if needed, integrate Unicode into it, not
try to integrate the new standard into Unicode.

> But what is important is to maintain the current encoding policy
> (open and not restricted by IP rights; demonstrated as useful and
> used by a significant community over a significant period; with
> some common coherence for defining common and stable character
> properties across these usages) as strictly as possible; otherwise
> it will rapidly explode and the bet on the 17 planes will rapidly
> be lost to exhaustion.

Agreed.
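For context on what that "bet" amounts to, a back-of-the-envelope calculation (my illustration, using the standard code space figures):

```python
# The 17-plane code space budget, in round numbers.
planes = 17
code_points = planes * 0x10000           # 17 x 65,536 = 1,114,112
surrogates = 0xE000 - 0xD800             # 2,048, reserved for UTF-16
noncharacters = 32 + 2 * planes          # U+FDD0..U+FDEF plus xFFFE/xFFFF per plane
private_use = 6400 + 2 * 65534           # BMP PUA plus planes 15 and 16

assignable = code_points - surrogates - noncharacters - private_use
print(assignable)  # 974530 code points left for standard assignment
```

Roughly 974,000 assignable code points: a large budget, but a finite one, which is the point of keeping the encoding policy strict.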

> Note that I am not suggesting any "Unicode/10646-compatible
> framework" (or even a standard), and I have no project to define one
> myself. But there is absolutely nothing Unicode or ISO/IEC 10646 can
> do to forbid such a thing from appearing.

But it would be a mistake on the part of the person or group proposing
it, and would muddy the question of "what Unicode is," already muddy in
some people's minds.

> The only point of resistance will come when those other frameworks
> or standards want to be granted an allocation for a block of
> characters in the current standard, to support their extension in a
> standard code point range instead of an existing PUA range, which
> they can use today without asking your permission. Then the current
> policy may be challenged according to its existing rules:
> demonstrated usage, and an agreement about common properties. It
> will then be difficult to refuse them this non-PUA grant.

That's precisely why encoded elements in those other frameworks or
standards should not be thought of as Unicode characters. Define the new
standard independently of Unicode, and that point of resistance goes
away.

> Other challenges will be:
> - (1) to limit the current explosion of encoding requests...

If the requested items are characters, proposed for Unicode and 10646,
then UTC and WG2 get to decide whether to encode them or not. If they
are not characters, then they will be defined, proposed, and encoded
according to the new standard. The maintainers of that standard would be
expected to set their own rules for what does and doesn't get encoded.

> - (2) and how to limit the proliferation of encodings in CJK blocks...

It seems the IRG has an adequate handle on this.

> And even without changing anything in the existing UTFs, strings of
> characters taken from a smaller block in the supplementary planes
> may be used to implement these large extensions (after all, this is
> already what is happening with hieroglyphs).

I don't understand what this means.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Wed Nov 28 2012 - 10:10:21 CST

This archive was generated by hypermail 2.2.0 : Wed Nov 28 2012 - 10:10:26 CST