From: Michael D'Errico (mike-list@pobox.com)
Date: Tue Jan 13 2009 - 17:20:29 CST
In a discussion regarding the possibility of assigning 26 code points to
be used in pairs to encode country flags, such as <FLAG C, FLAG A> to
specify the Canadian (CA) flag, Michael Everson wrote:
> Not even MILDLY tempting as an encoding model.
It took me a while to figure out why we disagree so strongly, but I think
I finally have. While I think of Unicode more as an information
communication protocol, Michael probably thinks of it more in terms of
information display and rendering. I base this assumption on the fact
that he is heavily involved in font development.
What I like about this model is that it requires only 26 code point
assignments, yet it can represent the equivalent of the XML
<flag>CA</flag> in plain text. The code points themselves carry the
"flag-ness", so this information is available even to a plain-text
process. If two code points were not enough to specify every country or
area, as was suggested for CYM, then three or more code points could be
used to accommodate them, with no additional assignments.
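To make the 26-code-point model concrete, here is a minimal Python
sketch. It assumes a hypothetical block of 26 consecutive FLAG letters
starting at FLAG_BASE; the value 0xF0000 (a Private Use code point) is a
stand-in, not a real assignment. The names encode_flag, is_flag_letter
and decode_flags are illustrative, and reading each unbroken run of FLAG
letters as one code is my assumption about how variable-length codes
would be delimited.

```python
# ASSUMPTION: a hypothetical run of 26 consecutive code points,
# FLAG A .. FLAG Z, starting at FLAG_BASE. 0xF0000 is a Private Use
# code point used only as a stand-in; it is not a real assignment.
FLAG_BASE = 0xF0000
ALPHABET = 26


def encode_flag(country_code: str) -> str:
    """Spell a code such as "CA" or "CYM" as FLAG letters, one per letter."""
    code = country_code.upper()
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected Latin letters only, got {country_code!r}")
    return "".join(chr(FLAG_BASE + ord(c) - ord("A")) for c in code)


def is_flag_letter(ch: str) -> bool:
    """True if ch is one of the 26 hypothetical FLAG letters."""
    return FLAG_BASE <= ord(ch) < FLAG_BASE + ALPHABET


def decode_flags(text: str) -> list[str]:
    """Recover country codes from plain text.

    ASSUMPTION: each unbroken run of FLAG letters is one code, so
    codes of 2, 3 or more letters need no extra assignments.
    """
    codes: list[str] = []
    run: list[str] = []
    for ch in text:
        if is_flag_letter(ch):
            run.append(chr(ord("A") + ord(ch) - FLAG_BASE))
        elif run:
            codes.append("".join(run))
            run = []
    if run:  # the text may end in the middle of a run
        codes.append("".join(run))
    return codes
```

For example, decode_flags("Flag: " + encode_flag("CA") + " and " +
encode_flag("CYM") + ".") returns ['CA', 'CYM']. Because this sketch
reads each unbroken run as a single code, two flags written back to back
would need something between them; a fixed length of two would avoid
that, but would also rule out 3-letter codes.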
The alternative Michael prefers encodes every pair of letters: FLAG AA,
FLAG AB, ... FLAG ZZ, for a total of 676 assigned code points. Its
advantage is ease of rendering, since you can simply look up the glyph
for a given code point. There is a cost, though: it uses an extra 650
code points (676 - 26) to provide the same amount of information. Given
Whistler's Conjecture, this can be rationalized away, but the decision
should be made knowing that it is a rendering optimization. The 676 code
points are equivalent to the XML <flag-aa />, <flag-ab />, etc., so
flag-ness is also conveyed in plain text. One problem is that the model
is limited to 2-letter codes, so if more letters were needed, a
different solution would be required. I doubt anyone would suggest
assigning all 3-letter combinations (26^3 = 17,576).
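The arithmetic behind the 676-code-point model can be sketched the same
way. FLAG_AA below is again a hypothetical placeholder value, and
pair_code_point and code_from_point are illustrative names. Each
2-letter code maps to exactly one code point at offset
26 * first + second, which is what makes the single glyph lookup
possible; for instance, "CA" lands at FLAG_AA + 52 (C is letter 2,
A is letter 0).

```python
# ASSUMPTION: hypothetical precomposed code points FLAG AA .. FLAG ZZ laid
# out in alphabetical order from FLAG_AA. 0xF1000 is a Private Use stand-in.
FLAG_AA = 0xF1000
ALPHABET = 26
PAIR_COUNT = ALPHABET ** 2  # 676 code points


def pair_code_point(country_code: str) -> int:
    """Map a 2-letter code to its single FLAG code point."""
    code = country_code.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("this model covers 2-letter codes only")
    first, second = (ord(c) - ord("A") for c in code)
    return FLAG_AA + first * ALPHABET + second


def code_from_point(cp: int) -> str:
    """Inverse mapping: a FLAG code point back to its 2-letter code."""
    index = cp - FLAG_AA
    if not 0 <= index < PAIR_COUNT:
        raise ValueError(f"U+{cp:04X} is not a FLAG pair code point")
    first, second = divmod(index, ALPHABET)
    return chr(ord("A") + first) + chr(ord("A") + second)


# The cost comparison made in the text.
extra_code_points = PAIR_COUNT - ALPHABET   # 650
three_letter_count = ALPHABET ** 3          # 17,576
```

Calling pair_code_point("CYM") raises ValueError, which is the 2-letter
limit showing up directly in the mapping.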
For completeness, I should also mention Philippe's suggestion to use
HTML such as <img src="flag-CA.svg" />. There are several problems with
this approach. First, you need an HTML parser even to determine that
something special must be done to display the flag. Second, all the HTML
parser can determine is that an image is embedded, and the image could
be of anything. Third, the only way to determine that the image shows a
flag is to parse the URL or filename and hope that it follows some
convention indicating which country's flag it is. So this is clearly not
a reliable way to represent a country flag in HTML, much less in plain
text.
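The fragility of the HTML approach shows up in a quick sketch using
Python's standard html.parser module. The flag-XX.svg naming convention,
the FLAG_FILENAME pattern and the guess_flags helper are hypothetical;
nothing in HTML requires any page to follow such a convention.

```python
import re
from html.parser import HTMLParser

# ASSUMPTION: a hypothetical "flag-XX.ext" filename convention. Nothing
# in HTML guarantees that any page follows it.
FLAG_FILENAME = re.compile(r"(?:^|/)flag-([A-Za-z]{2,3})\.\w+$")


class ImgSrcCollector(HTMLParser):
    """Step 1: a full HTML parser just to find the embedded images."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    # Self-closing <img ... /> also arrives here, because the default
    # handle_startendtag() calls handle_starttag().
    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def guess_flags(html: str) -> list:
    """Steps 2 and 3: all we know is "an image", so guess from the name."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    guesses = []
    for src in parser.srcs:
        m = FLAG_FILENAME.search(src)
        guesses.append(m.group(1).upper() if m else None)
    return guesses
```

On '<img src="flag-CA.svg" /> <img src="images/canada.png">
<img src="flag-map.png">' it returns ['CA', None, 'MAP']: a Canadian
flag stored under a different filename is missed, and an image of a map
is misreported as a flag.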
In summary, I am OK with the rendering optimization Michael advocates,
though it is a special case because it is limited to 2-letter country
codes. If a similar encoding challenge arises in the future that requires
combinations of more letters, that approach would not be acceptable.
Mike