From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Apr 24 2008 - 19:41:25 CDT
On 4/24/2008 2:53 PM, André Szabolcs Szelp wrote:
> Oh,
> it was not a real concern, I don't work with Armenian data, nor do I
> read it, I just came across that piece of information and was
> wondering why. I thought, best place to ask was here. Who else would
> know, if not the people here? :-)
>
> Also I was puzzled, because I thought there was a guideline to create
> one-to-one mappings of pre-existing (including national) standards
>
...that aim was always limited to a certain more-or-less well-defined
set. Esp. for lesser-used standards, an attempt was made to not encode
any characters that were questionable, whereas for really widely used
standards even things that appear to be outright mistakes had a shot at
being encoded. The thinking behind that was to get the maximum coverage
of existing *data* with the smallest number of problematic characters
added to Unicode.
Over the years, experience and additional information that became
available have led to a gradual adaptation to better reach the
underlying goal. To a very limited extent, even standards created after
Unicode was initially published have been covered. That's a tricky thing
to do, because on the one hand, you don't want to exclude potentially
large user communities that got used to characters in their standard,
while at the same time, you don't want to make this a sure ticket to
force characters into Unicode by going outside the process.
> (that's why Dutch ij is included as a single character, while its
> encoding as two characters is recommended, that's why the alphabetical
> presentation forms of fi, fl etc. are included, ...)
>
Precisely.
> So this does not hold?
>
>
If the character doesn't violate a principle in the standard, there's no
reason why it couldn't be encoded; however, if its presence in the
standard is not correlated with it showing up in actual documents (for
example, because of the way systems and fonts have implemented the
standard), then there's perhaps no need to encode the character based on
its presence in a code chart.
On the other hand, perhaps the standard did base the design on a real
character. If sufficient information can be assembled to define that
character, it would open up an avenue to encode it, which would be
independent of its presence in the standard.
>> > (I indeed did not find the character in the Armenian block, but it
>> > could hide somewhere among the dingbats (but if so without an
>> > annotation saying "eternity sign")).
>>
>
> There isn't an exact match, but something in the U+274x range can
> serve as a good approximation.
>
> Leo
>
>
>
If the standard is in use and if there's an indication that people are
using this particular character, then the last thing we would want to do
is to map it "approximately", especially not to something in the U+274x
range. That range, by design, was supposed to have somewhat less
variability in glyph design than other blocks. But even without the
special nature of this range, the damage of having mapped characters
"approximately" (esp. ASCII characters) is still with us today.
If this thing is real and someone can prove it, encode it; if not, wait
for users of the standard to speak up that they need it for compatibility.
A./
This archive was generated by hypermail 2.1.5 : Thu Apr 24 2008 - 19:44:38 CDT