From: Brian Wilson (bountonw@gmail.com)
Date: Mon Jul 16 2007 - 08:47:35 CDT
I probably have this all wrong, but aren't there 65,536 possible characters in Unicode?
Why not have a section of 48 characters for generic bases? Encode the 10 characters that John Hudson recommends. All of the generic bases would be in one section of Unicode, and there would be plenty of room for expansion. That would save us ignorant people from wondering, "Now which 'x-like' symbol do I use for Lao again?"
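For reference on the size question: 65,536 is the number of code points in a single plane, the Basic Multilingual Plane (plane 0), which was the entire 16-bit code space of early Unicode versions. The current code space has 17 such planes. A quick check in Python, assuming Python 3.3 or later (where `sys.maxunicode` is 0x10FFFF):

```python
import sys

PLANE_SIZE = 0x10000  # 65,536 code points in one plane (e.g. the BMP)
PLANE_COUNT = 17      # planes 0 through 16

codespace = PLANE_SIZE * PLANE_COUNT
print(codespace)

# sys.maxunicode is the highest code point, so add 1 to get the count.
print(sys.maxunicode + 1 == codespace)
```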
Brian Wilson
-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of James Kass
Sent: Monday, July 16, 2007 6:21 AM
To: Asmus Freytag
Cc: 'Unicode List'
Subject: Re: Generic base characters
Asmus Freytag wrote,
> The problem with using 25CC is that it is *not* the dotted circle that
> is used as a base for combining characters in the standard. While its
> name is "DOTTED CIRCLE", it was encoded to cover a symbol that differs
> in size, weight, and details of line style, as well as perhaps
> vertical alignment, from the true dotted circle used as a generic base.
Two related issues.
1) Fallback rendering of unexpected isolated combining marks.
2) An author entering desired generic bases plus combining marks
in plain text for illustrative/informative purposes.
Fallback rendering is up to the font engine.
A listing of Unicode characters suitable for use as generic base
characters, such as John Hudson suggested, might, among other
purposes, be used as a guideline for localization of operating systems.
An unexpected isolated combining mark occurs when a font engine
encounters a sequence which it does not support. So, the font
engine needs to support the entire listing in order to avoid treating
such combinations as unexpected. The engine should not insert
undesired fallback display behavior when an author has specifically
encoded a desired form.
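A minimal sketch of the fallback behavior described above, under stated assumptions: the hypothetical engine treats any letter (general category L*) or any member of a hypothetical `GENERIC_BASES` set as a valid base, and prefixes U+25CC DOTTED CIRCLE to any combining mark (category M*) that lacks one. Real shaping engines apply per-script cluster rules that this ignores, and the set contents are illustrative only, not the listing John Hudson suggested.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

# Hypothetical listing of generic bases (illustrative only): dotted circle,
# no-break space, hyphen-minus, plus sign, multiplication sign.
GENERIC_BASES = {"\u25CC", "\u00A0", "\u002D", "\u002B", "\u00D7"}


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def accepts_marks(ch: str) -> bool:
    """Assumption: letters and listed generic bases may carry marks."""
    return unicodedata.category(ch).startswith("L") or ch in GENERIC_BASES


def render_fallback(text: str) -> str:
    """Insert U+25CC before any combining mark that has no acceptable base."""
    out = []
    base_open = False  # True while the current cluster can take more marks
    for ch in text:
        if is_mark(ch):
            if not base_open:
                out.append(DOTTED_CIRCLE)  # supply a visible fallback base
                base_open = True           # further marks stack on it
        else:
            base_open = accepts_marks(ch)
        out.append(ch)
    return "".join(out)


print(ascii(render_fallback("\u0301")))   # isolated mark
print(ascii(render_fallback("+\u0301")))  # listed base: left alone
print(ascii(render_fallback("=\u0301")))  # unlisted base: fallback added
```

Because "+" is in the set, "+\u0301" passes through untouched, while "=\u0301" gets a dotted circle inserted: the engine honors an author's chosen base only if that base is in its listing.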
For the generic base glyph which resembles a plus sign, would it be
better to have a new, dedicated character, or to choose one from
the similar signs already encoded? (examples: +˖ᐩ⁺₊⊕⊞⧾⨁⨹﬩+)
(Plus signs which already have diacritics [⨢⨣⨤⨥⨦] should probably be
excluded from consideration.)
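To keep the candidates straight, the sketch below uses only Python's standard `unicodedata` module to print the code point, general category, and name of each plus-like sign mentioned above. The `plain_enough` filter is a hypothetical heuristic, not a Unicode property: it rejects any character whose name contains "WITH", which catches the five diacritic-bearing plus signs listed.

```python
import unicodedata

# Candidate plus-like signs from the message, written as escapes.
CANDIDATES = [
    "\u002B", "\u02D6", "\u1429", "\u207A", "\u208A", "\u2295",
    "\u229E", "\u29FE", "\u2A01", "\u2A39", "\uFB29", "\uFF0B",
]
# Plus signs that already carry a diacritic, proposed for exclusion.
WITH_DIACRITICS = ["\u2A22", "\u2A23", "\u2A24", "\u2A25", "\u2A26"]


def describe(ch: str) -> str:
    """Code point, general category, and character name."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


def plain_enough(ch: str) -> bool:
    """Hypothetical heuristic: reject names mentioning an attached mark."""
    return " WITH " not in unicodedata.name(ch)


for ch in CANDIDATES + WITH_DIACRITICS:
    print(describe(ch), "candidate" if plain_enough(ch) else "excluded")
```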
John indicated that there are possibly fewer than ten attested
shapes used as generic bases. Why not encode them as characters in
their own right? After all, what's another plus sign, more or less?
Best regards,
James Kass
This archive was generated by hypermail 2.1.5 : Mon Jul 16 2007 - 08:51:14 CDT