Re: String name and Character Name

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Apr 28 2005 - 11:44:52 CST

Next message: Jukka K. Korpela: "Re: Code Point -- What is the integer?"

Previous message: Marion Gunn: "Re: Code Point -- What is the integer?"
In reply to: Hans Aberg: "Re: String name and Character Name"
Next in thread: Marcin 'Qrczak' Kowalczyk: "Re: String name and Character Name"
Reply: Marcin 'Qrczak' Kowalczyk: "Re: String name and Character Name"

On Thu, 28 Apr 2005, Hans Aberg wrote:

> Glancing briefly at the many local names of the "@" symbol, it
> suggests that Unicode should supply such localized descriptions.

I think your message, which I quote only in part, nicely summarizes an
idea that many people have been thinking about - with the proviso that
"Unicode" in this context means the Consortium rather than the Standard.
Or, more appropriately, the CLDR work that the Consortium coordinates.

Defining localized names for characters (or other things) is primarily
the business of each language community, so it should take place within
such communities.

Mentioning the "@" character opens a can of worms, though. It's one of the
characters, along with "~" for example, that have a wide range of names in
many languages, sometimes even heavily debated. Sometimes we can even
isolate cultural environments (subcultures) that favor one or another word
for a character. Although it would be possible, within the general idea of
CLDR, to define locales corresponding to such environments, doing so is
probably not feasible.

Normally each character has at most one name in a particular language, and
in many contexts _a_ name is needed. For example, when a speech
synthesizer cannot deal with a character in any better way, it should
probably say its name, and we don't want it to speak a dozen aliases.
In a character selection menu, on the other hand, multiple names can be
useful if they help the user identify the character.

Thus, the best format for definitions of localized names for characters
would probably be an ordered list of names, with aliases welcomed but
without names that might be seriously misleading (even if they are in
use). An application could then use the first name, all the names, or the
first n names for some value of n.
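
As a rough sketch of how an application might consume such an ordered
list (Python used just for illustration; the table contents and the
function names names_for and spoken_name are made up here and are not
taken from CLDR or any existing data):

```python
# Hypothetical data: ordered lists of localized names, preferred name first.
# The entries are illustrative only, not taken from CLDR or any real dataset.
LOCALIZED_NAMES = {
    ("en", "@"): ["commercial at", "at sign"],
    ("en", "~"): ["tilde"],
}


def names_for(locale, char, limit=None):
    """Return up to `limit` names for `char` in `locale`, preferred name first.

    With limit=None all names are returned; an unknown character gives [].
    """
    names = LOCALIZED_NAMES.get((locale, char), [])
    return list(names) if limit is None else names[:limit]


def spoken_name(locale, char):
    """Return the single preferred name, e.g. for a speech synthesizer."""
    names = names_for(locale, char, limit=1)
    return names[0] if names else None
```

A speech synthesizer would call spoken_name and get one name only, while
a character selection menu could call names_for with no limit, or with a
small n, and show the aliases as well.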

There's perhaps a more urgent localization need: names of Unicode blocks,
and maybe names of character collections as well. Such names already
appear in localized software such as Character Map, and many of the names
are seriously misleading - in addition to differing needlessly between
applications. There are far fewer blocks than characters, so this would
be just hard work, not huge work.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

