Re: "Universal Character Set"

From: Asmus Freytag
Date: Sat Feb 17 2007 - 15:54:19 CST

Next message: Jon Hanna: "Re: "Universal Character Set""

Previous message: Jukka K. Korpela: "Re: "Universal Character Set""
In reply to: Don Osborn: ""Universal Character Set""
Next in thread: Mark Davis: "Re: "Universal Character Set""
Reply: Mark Davis: "Re: "Universal Character Set""

On 2/17/2007 9:58 AM, Don Osborn wrote:
>
> Does anyone currently use the term “Universal Character Set” (UCS) to
> refer to Unicode/ISO-10646? I guess it is technically correct, but I
> rarely see it. It seems that folks generally use “Unicode” as the
> catch-all term, or maybe I’m missing a wider use of UCS?
>
I believe your observation about "Unicode" being the common label is to
the point. A bit of research is illuminating and might explain some of
the reasons why that term has caught on.

There are about 33 million pages indexed on Google that can be retrieved
by a search for "Unicode" and about 111,000 by a search for "Universal
character set". If you exclude from the latter search all pages that
also mention 10646, Unicode, or UCS, the number drops to about 1/10th.
If you similarly exclude the other terms from the search for Unicode,
there's hardly a reduction in number.

What that means is that "universal character set" is probably most often
used as a descriptor, as in "Unicode is a universal character set", and
not as a label. The common label is clearly "Unicode". That's not
surprising, because Unicode as a label has the advantage of being
shorter and clearly referring to a specific character set.

In the case of UCS as a label, you run into the problem that the letters
UCS are not unique. Google will pull up the Union of Concerned
Scientists, UCS Inc., University College School and a number of others
on the first screen (and also helpfully suggest that you really meant
USC). Giving up distinctiveness in exchange for brevity is apparently
not a clear win; UCS, counting all of its meanings, gets barely 1/6th as
many hits as Unicode. If you search for UCS together with 10646 or
Unicode, to sift out the cases where UCS might have been used in the
context of character sets, you find only about 800K links, which only
underscores the problem of UCS having multiple meanings.

10646 by itself gives about 4.5 million hits, of which fully 1/3 don't
mention ISO and instead refer to part numbers or are otherwise false
positives. Based on that, you can conclude that 10646 is used as a
designator of the character set about 1/10th as often as Unicode.
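For anyone who wants to check that estimate, here is a back-of-the-envelope
sketch. It uses only the approximate 2007 hit counts quoted above; the
variable names are illustrative and nothing here queries Google.

```python
# Approximate Google hit counts quoted in the message (February 2007).
UNICODE_HITS = 33_000_000
ISO10646_HITS = 4_500_000

# Roughly 1/3 of the "10646" hits were judged false positives
# (part numbers and the like), so about 2/3 refer to the standard.
relevant_10646 = ISO10646_HITS * 2 / 3

# How often 10646 is used as a designator, relative to "Unicode".
ratio = relevant_10646 / UNICODE_HITS

print(f"10646 hits referring to the standard: {relevant_10646:,.0f}")
print(f"Ratio to 'Unicode' hits: {ratio:.3f} (about 1 in {1 / ratio:.0f})")
```

That works out to about 3 million relevant hits, roughly 1 in 11 of the
Unicode count, which is what "about 1/10th" rounds to.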

There are instances where referring to Unicode is the only correct
choice: for example, when referring to the Unicode Normalization Forms,
the Unicode Bidi Algorithm, Unicode Line Breaking, and the myriad other
specifications that the Unicode Consortium has developed, or is
developing, around the character set and its collection of character
properties.

A./

This archive was generated by hypermail 2.1.5 : Sat Feb 17 2007 - 15:57:07 CST