Longest Names (was: Re: Unicode trivia)

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue May 09 2000 - 13:58:33 EDT


John asked:

>
> On Sat, 6 May 2000 08:09:49 -0800 (GMT-0800), Doug Ewell wrote:
> > Recently while writing a C program that reads UnicodeData.txt, I needed
> > to determine the longest character name. The winner (83 characters):
> >
> > U+FBF9 ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF
> > MAKSURA ISOLATED FORM
>
> Out of curiosity, and perhaps of more importance than the current
> longest name, is there a specified length limit which names are
> guaranteed not to exceed?

There are no guarantees, but no one in the UTC or WG2 is competing to
create a longer name. There are no specified length limits in the naming
rules that WG2 follows, but the committee has watchdogs who try to keep
the names shorter, when possible. Furthermore, since the longest existing
names are compatibility Arabic ligatures, and since WG2 has vowed not to
encode any more Arabic ligatures, we are unlikely to see longer names.

My own rule of thumb for processing UnicodeData.txt is to use 128 bytes
for transient buffers for names -- which leaves me 99.999% confident
that future versions of the data file will never break it.
But for persistent storage I use variable-length arrays anyway, since the
average name length is so much shorter than the longest name length.
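
A minimal sketch of that approach in C, assuming the usual
semicolon-separated layout of UnicodeData.txt with the name in the second
field. The names NAME_BUF_SIZE, extract_name, and store_name are invented
here for illustration and are not from any existing tool:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUF_SIZE 128  /* transient buffer size (hypothetical macro) */

/*
 * Copy the name field (second semicolon-separated field) of one
 * UnicodeData.txt line into buf. Returns the name length, or -1 if the
 * line is malformed or the name does not fit in bufsize bytes.
 */
int extract_name(const char *line, char *buf, size_t bufsize)
{
    const char *start = strchr(line, ';');
    if (start == NULL)
        return -1;
    start++;                              /* skip the first ';' */

    const char *end = strchr(start, ';');
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start);
    if (len >= bufsize)
        return -1;                        /* too long: report, don't truncate */

    memcpy(buf, start, len);
    buf[len] = '\0';
    return (int)len;
}

/* Persistent copy sized to the name itself; the caller frees it. */
char *store_name(const char *name, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, name, len + 1);      /* includes the terminating NUL */
    return copy;
}
```

A caller would read each line with fgets, pass it to extract_name with a
char buf[NAME_BUF_SIZE], and hand the result to store_name. Returning -1
instead of truncating means that a future data file with a longer name
shows up as an error rather than as silently clipped names.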

--Ken


