Re: Is U+0140 (l with middle dot) ever used?

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 12 2002 - 15:06:16 EDT


Keld responded:

> On Fri, Aug 09, 2002 at 11:44:40PM +0100, Anto'nio Martins-Tuva'lkin wrote:
> >
> > Hm. But middle dot is not only a letter symbol. It's also used as a
> > bullet, a tab filling, even a box-drawing char. Shouldn't Unicode
> > provide a way to separate this duality?
>
> · has traditionally been used, e.g., in word processors to visually display
> a blank character. But it was originally intended in ISO 8859-1 and
> other places for the Catalan language, which uses it in words such
> as paral·lel.

However, one cannot ignore the rest of the manifest history of
this character. It also has long occurred in Code Page 437 and myriad
other IBM and Microsoft Code Pages (IBM GCGID SD630000) with a long
history of ambiguous usage as punctuation and many other things.

> I think · is now listed in Unicode as a separator, and not
> as alphabetical.

It is actually listed with General Category Po (Punctuation, Other),
and not as one of the separator classes.

But it also has the diacritic property and the extender property,
which most punctuation characters do not.

Property-based implementations can take advantage of other properties
of U+00B7 to distinguish it from most punctuation.
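
As a rough illustration of the General Category point, here is a minimal
sketch using Python's standard unicodedata module. The helper name
is_separator is hypothetical. Note that unicodedata does not expose the
Diacritic or Extender properties (those come from PropList.txt), so only
the General Category is checked here.

```python
import unicodedata

MIDDLE_DOT = "\u00B7"

def is_separator(ch):
    """Hypothetical helper: True if ch is in a separator class (Zs, Zl, Zp)."""
    return unicodedata.category(ch).startswith("Z")

# U+00B7 is Punctuation, Other; it is not in any separator class.
print(unicodedata.name(MIDDLE_DOT), unicodedata.category(MIDDLE_DOT))
print("separator?", is_separator(MIDDLE_DOT))
```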

> I think that is an error. How can we correct it?

Changing it out of the General Category Po would disturb what by
now is already a long legacy practice for many implementations. It
would cause way more problems than the putative problem it is
supposed to fix for Catalan. (This despite the fact that unlike the
Catalan usage, which actually is more reminiscent of the delimiter
usage of a middle dot, as in dictionary syl·la·bi·fi·ca·tion, there
are actually quite a number of technically-based orthographies,
in the Americas, at least, which use a middle dot simply as a long
vowel diacritic.)

Word delimitation depends on more than merely the General Category
value, anyway, so appropriate word boundary determination can be
developed for Catalan and other languages regardless of the
General Category Po value for U+00B7. (See DUTR #29 on this.)
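
To make that concrete, here is a minimal sketch of a tailored word
tokenizer, assuming a simple regex-based approach rather than the full
algorithm in DUTR #29. The function name words and the rule "a middle dot
between word characters does not break the word" are illustrative
assumptions, not the specification's wording.

```python
import re

# Plain \w+ splits at U+00B7 because it is punctuation (Po), not a letter.
NAIVE = re.compile(r"\w+")

# Tailored rule (an assumption for illustration): a middle dot that sits
# between word characters is kept inside the word.
TAILORED = re.compile(r"\w+(?:\u00B7\w+)*")

def words(text, tailored=True):
    """Return the word tokens of text under the chosen rule."""
    pattern = TAILORED if tailored else NAIVE
    return pattern.findall(text)

sample = "un cas paral\u00B7lel \u00B7 fi"
print(words(sample, tailored=False))
print(words(sample, tailored=True))
```

A free-standing middle dot (used as a bullet or delimiter) is still
dropped under the tailored rule, since it has no word characters on
both sides.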

And for identifiers, it is up to particular implementations to
determine whether inclusion or exclusion of U+00B7 makes sense
for their identifier syntax. What is gained for Catalan by
including U+00B7 in identifiers may be offset by confusion that
can set in against the usage of U+00B7 as a delimiter punctuation,
or as a representation of middle dot operators in mathematical
expressions.
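
The tradeoff can be sketched as a switch in a hypothetical identifier
checker. The grammar below (a letter or underscore, then word characters,
with an optional middle dot allowed only between word characters) is an
assumption made up for illustration; it is not any particular language's
identifier syntax.

```python
import re

# Hypothetical identifier grammars, for illustration only.
STRICT = re.compile(r"[^\W\d]\w*")
WITH_MIDDLE_DOT = re.compile(r"[^\W\d]\w*(?:\u00B7\w+)*")

def is_identifier(s, allow_middle_dot=False):
    """True if s is an identifier under the chosen (hypothetical) syntax."""
    pattern = WITH_MIDDLE_DOT if allow_middle_dot else STRICT
    return pattern.fullmatch(s) is not None

# Good for Catalan: paral·lel becomes a single identifier.
print(is_identifier("paral\u00B7lel", allow_middle_dot=True))
# The cost: a·b, which a reader might take as a product, is now one name.
print(is_identifier("a\u00B7b", allow_middle_dot=True))
```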

--Ken

>
> Kind regards
> Keld
>
>


