On 3/22/2013 4:16 AM, Philippe Verdy wrote:
> 2013/3/22 Asmus Freytag <asmusf_at_ix.netcom.com>:
>> The number of conventions that can be applicable to certain punctuation
>> characters is truly staggering, and it seems unlikely that Unicode is the
>> right place to
>> a) discover all of them or
>> b) standardize an expression for them.
> My intent is certainly not to discover and encode all of them. But
> existing characters are well known for having very common distinct
> semantics which merit separate encodings.
This claim would have to be scrutinized, and, to be accepted, would
require very detailed evidence. Also, on what principles would you base
the requirement to make a distinction in encoding?
> And this includes notably their use as numeric grouping separators or decimal separators.
Well, the standard currently rules that such use does not warrant
separate encoding - and the standard has been consistent about that for
the entire 20+ years of its existence.
Further, all other character encoding standards have encoded these
characters as unified with ordinary punctuation. This is very different
from the ANO TELEIA discussion, where an argument could be made that
*before* Unicode, the character occurred only in *specific* character
sets - and that was a distinction that was lost when these character
sets were mapped to Unicode.
No such argument exists for either middle dot or raised decimal point
(except insofar as you could possibly claim that raised decimal point
had never been encoded properly before, but then you'd have to show some
evidence for that position).
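For anyone who wants to check this in the character data: the ANO TELEIA
case is visible right in the UCD, since U+0387 carries a canonical
(singleton) decomposition to U+00B7 MIDDLE DOT. A minimal illustration,
assuming nothing beyond Python's standard unicodedata module:

    import unicodedata

    # U+0387 GREEK ANO TELEIA decomposes canonically to U+00B7 MIDDLE DOT,
    # so normalization folds the two together.
    ano_teleia = "\u0387"
    print(unicodedata.decomposition(ano_teleia))                 # '00B7'
    print(unicodedata.normalize("NFC", ano_teleia) == "\u00B7")  # True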
>
> Such common semantic modifiers would be easier to support than
> encoding many new special variants of characters (that won't even be
> rendered by most applications, and thus won't be used).
That might be the case - except that they would introduce a number of
problems. Any "modifier" that has no appearance of its own can get
separated from the base character during editing.
The huge base of installed software is not prepared to handle an
entirely different *kind* of character code, whereas support for simple
character additions is something that will eventually percolate through
most systems - that fact makes disunifications a much more
straightforward process.
>
> Some examples: the invisible multiplication sign, the invisible
> function sign,
Nah, these are not modifiers. They stand on their own. Their
"invisibility" is not ideal, but not any worse than "word joiner" or
"zwsp". All of these characters are separators - with the difference
that the nature of the separator was determined to be crucial enough to
encode explicitly. (And of course, reasonable people can disagree on
each case).
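To make that concrete with the character properties themselves, here is
a small sketch, assuming only Python's standard unicodedata module:

    import unicodedata

    # Each of these is encoded as a stand-alone (invisible) format
    # character, General_Category Cf - not as a modifier attached to a base.
    for cp in (0x200B, 0x2060, 0x2061, 0x2062):
        ch = chr(cp)
        print(f"U+{cp:04X}  {unicodedata.name(ch)}  gc={unicodedata.category(ch)}")

    # U+200B  ZERO WIDTH SPACE      gc=Cf
    # U+2060  WORD JOINER           gc=Cf
    # U+2061  FUNCTION APPLICATION  gc=Cf
    # U+2062  INVISIBLE TIMES      gc=Cf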
Note that Unicode cloned several characters based on their word-break
(or non-break) behavior, which is not a novel idea (earlier character
encodings did the same with "no break space"). Already at that stage the
train of having a "word break attribute character" (what you call a
modifier) had left the station.
The only way to handle these issues, for better or for worse, is by
disunification (where that can be justified in exceptional circumstances).
> and even the Latin/Greek mathematical letter-symbols,
> which were only encoded to capture style differences that have
> occasional but rare semantic differences. For me, adding those
> variants was really pseudo-coding, breaking the fundamental encoding
> model, complicating the task for font creators and renderer designers,
> and greatly increasing the size and complexity of collation tables.
>
> Many of these character variants could have been expressed as a base
> character and some modifier (whose distinct rendering was only
> optional), allowing much easier integration and better use. Because
> of that the UCD is full of many added variants that are almost never
> used, and we have to live with encoded texts that persist in using
> ambiguous characters for the most common possible distinctions.
No, for the math alphabetics you would have had to have a modifier that
was *not* optional, breaking the variation selector model.
There was certainly discussion of a "combining bold" or "combining
italic" at the time.
One of the major reasons this was rejected was the desire to
prevent the creation of such "operators" that could be applied to
*every* character in the standard.
Another was, of course, the desire to allow ordinary software to do the right
thing in displaying these - the whole infrastructure to handle such
modifiers would have been lacking.
Further, when you use an italic "a" in math, you do not need most (or
all) software to be aware that this relates to an ordinary "a" in any
way. It doesn't, really, except in text-to-speech conversion or similar,
highly specialized tasks. So, unlike variation selectors, there would
have been no benefit in using a modifier.
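Incidentally, the relation to the ordinary letter is not absent from the
data; it is recorded as a <font> compatibility decomposition, which is
exactly what a specialized consumer (text-to-speech, searching) can fall
back on. A minimal sketch, again using only Python's unicodedata:

    import unicodedata

    italic_a = "\U0001D44E"   # MATHEMATICAL ITALIC SMALL A
    # The math alphanumerics carry a <font> compatibility mapping to the
    # ordinary letter; plain rendering code never needs to consult it.
    print(unicodedata.name(italic_a))               # MATHEMATICAL ITALIC SMALL A
    print(unicodedata.normalize("NFKC", italic_a))  # a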
A./