From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Apr 17 2009 - 14:38:07 CDT
On 4/17/2009 8:33 AM, Johannes Bergerhausen wrote:
>
> I would like to say that these symbols (the first version is from
> 1993) are used worldwide on the same line of text, mixed with other
> characters like Latin.
>
> Another example attesting that there is a need to put international
> symbols, like public signage, into the UCS.
>
>
Johannes,
if you can show that a symbol is being used inline, that would satisfy
*one* of several criteria that should be met in order to encode it as a
character in Unicode.
There is some disagreement in the character coding community about which
other criteria need to be satisfied before proceeding to encode any new
symbol.
If the symbol is part of a recognized *notation* then there seems to be
widespread agreement that it should be encoded.
If the symbol is already encoded in another character set, then it
should be encoded *as long* as there's agreement that the other
character set needs to be supported for compatibility.
Beyond that, things are more difficult.
One group of people firmly believes that if a symbol has been used with
special fonts in rich text, that's proof that anyone needing the symbol
already has a means to use it, and there's no need to encode it. As
you can easily surmise, this position, if taken to an extreme, disallows
the encoding of any such symbols.
Other people disagree - they feel it should be possible to search for
symbols (as well as to use symbols in plain text). If you apply the
second position to its fullest, you might want to encode all symbols
ever used inline.
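(A small Python sketch of what this second camp has in mind; the
sentence is invented, but U+26A0 WARNING SIGN is a character that is
already encoded. Once a symbol is a character, ordinary plain-text
search finds it; a symbol faked with a special font stores only the
underlying letter, so searching for the symbol itself never finds it.)

    # An encoded symbol is just another character in the string:
    text = "Mind the gap \u26a0 when boarding."
    print("\u26a0" in text)        # True - plain substring search works
    print(text.index("\u26a0"))    # 13 - and locating it works too

    # A symbol rendered via a special font stores only the base letter
    # (say, "J"), which a search for the symbol can never match.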
My position is somewhat in the middle: I think that there are some
symbols (more than are currently encoded) that occur rather frequently. I
call them "common" symbols. They usually have a well-defined
appearance, which makes them highly recognizable, but they can be used
in a variety of contexts and with a variety of meanings, each supplied
by the context.
Because of their nature, they are highly versatile and useful - I would
not hesitate to predict that they would end up being used more often
than many of the rare or historic characters among the scripts in
Unicode. That potential for widespread usage (as symbols go) makes them
attractive candidates for standardization, in my view.
There are many sets of symbols of a specialized nature, some extremely
rigidly defined, for example the ISO set of warning signs (occupational
hazards and the like). Despite their precise definitions, such sets (as
a whole) would make poor targets for standardization as characters. The
reason is that, by their nature, most of the symbols in these sets are
highly specialized, and therefore occur rarely, if at all, in inline
text. However, many such specialized sets contain one or two, or a few,
widely known and widely used symbols.
To standardize anything represents a cost. For rare characters, that
cost is out of proportion to the benefits - just as Unicode started out
by encoding the widely used scripts first, widely used symbols should
be encoded first, even if that means one has to provide a somewhat
arbitrary cutoff that separates the common from the uncommon symbols
*within* each category or set of symbols.
Example: most traffic symbols, like DEER CROSSING or SPEED LIMIT 30,
should probably not be encoded as characters. The STOP sign or the
European CAUTION sign, however, are examples of common symbols that
deserve status as characters. You find them as part of texts where they
retain their customary shape but don't refer to traffic; they are used
in a generalized sense. Hence, they have become _common_ symbols.
Having encoded the _common symbols_ from a set of symbols, it's a
fallacy to think that this then requires also encoding all the other
symbols from that set, no matter how specialized. That's different from
encoding scripts, where completeness matters because every letter is
needed to write text in the language.
The current Japanese-oriented additions (ARIB, Emoji) have added or will
add many such common symbols. We've since learned that the technology
that makes those symbols available for inline messages is spreading
outside Japan.
Therefore, what would be most useful in looking to "attest" symbol
characters, as you call it, would be to categorize the missing _common_
symbols that relate to European (and other non-Japanese) usage.
It's not sufficient to just point at sets of symbols for that - you also
need to isolate which ones are _common_ symbols in each set, according
to the definition of this concept that I've proposed here.
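(For what it's worth, here is one back-of-the-envelope way such a
survey might start, sketched in Python - the corpus file and the
candidate list are placeholders of my choosing, not part of any
proposal: count how often each candidate symbol actually occurs inline,
and draw the common/uncommon cutoff from the counts.)

    from collections import Counter

    # Hypothetical candidates drawn from one specialized set:
    candidates = {"\u26a0", "\u2622", "\u2623"}  # warning, radioactive, biohazard

    counts = Counter()
    with open("corpus.txt", encoding="utf-8") as f:  # placeholder corpus
        for line in f:
            counts.update(ch for ch in line if ch in candidates)

    # The frequent symbols are the "common" candidates for encoding:
    for ch, n in counts.most_common():
        print(f"U+{ord(ch):04X} occurred {n} times")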
I keep hoping that someone with the resources, time and interest will
take on that project.
A./