RE: APL Under-bar Characters

From: <alexweiner_at_alexweiner.com>
Date: Sun, 16 Aug 2015 09:31:25 -0700
Khaled,
Thank you for the link. The normalization methods were already discussed, specifically here:

http://lists.gnu.org/archive/html/bug-apl/2015-08/msg00047.html


That message discusses the problem of "how big" ä is. The answer given there is that it counts as one symbol, because the Unicode Consortium also encoded it as its own standalone character. From the thread:

I'll give you an example. What would you want ⍴,'ä' to be?

Right now, that could return either 1 or 2 depending on whether the ä was using the precomposed character (U+00E4) or the combining mark (U+0061, U+0308). Visually, these are identical, and generally you'd expect them to compare equal.

In Unicode, the comparison of equivalent strings (made up of different characters) is done by performing a normalisation step prior to comparison. There are 4 different types of normalisation, with different behaviour.

Now, the ä character has a precomposed form in Unicode, and if you couple that with the NFC normalisation form, you'd get the above expression to return 1.
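
To make the quoted point concrete, here is a minimal Python sketch (Python rather than APL, purely for illustration) of the two spellings of ä and what NFC and NFD do to them. It uses only the standard `unicodedata` module; the variable names are my own.

```python
import unicodedata

precomposed = "\u00e4"   # ä as the single code point U+00E4
decomposed = "a\u0308"   # 'a' (U+0061) followed by COMBINING DIAERESIS (U+0308)

# Without normalisation the two spellings differ in length and do not compare equal.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# NFC composes to the precomposed form; NFD decomposes to base + mark.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(len(nfc), len(nfd))                  # 1 2
print(nfc == precomposed)                  # True
```

So the answer of 1 depends on both the existence of U+00E4 and the choice of NFC as the normalisation form.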


So I'm not sure why the allowance was made for ä and certain other characters, but not for other things (under-bar characters) that face similar representation issues.
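
The asymmetry can be shown with a short Python sketch. It assumes the under-bar is modelled as U+0332 COMBINING LOW LINE, which is my assumption and not something settled in either thread; `nfc_length` is a hypothetical helper name.

```python
import unicodedata

# Assumption: the under-bar is U+0332 COMBINING LOW LINE applied to a capital letter.
underbarred_a = "A\u0332"
a_umlaut = "a\u0308"

def nfc_length(text):
    """Number of code points left after NFC normalisation."""
    return len(unicodedata.normalize("NFC", text))

print(nfc_length(a_umlaut))       # 1: NFC composes the pair into U+00E4
print(nfc_length(underbarred_a))  # 2: nothing to compose into, so the pair remains
```

Under that assumption, normalisation alone cannot make an under-barred letter count as one element, because there is no precomposed code point for NFC to produce.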

-------- Original Message --------
Subject: Re: APL Under-bar Characters
From: Khaled Hosny <khaledhosny@eglug.org>
Date: Sun, August 16, 2015 8:17 am
To: alexweiner@alexweiner.com
Cc: unicode@unicode.org

On Sun, Aug 16, 2015 at 07:35:17AM -0700, alexweiner@alexweiner.com wrote:
> Hello Unicode Mailing List,
>
> There is significant discussion about the problems of adding capital letters
> with individual under-bars in this mailing list for GNU APL.
>
> http://lists.gnu.org/archive/html/bug-apl/2015-08/msg00050.html
>
> Pretty much it adds up to the following problem:
>
> The string length functionality would view an 'A' code point combined with an
> '_' code point as an item that has two elements, while something that looks
> like 'A' should be atomic, and return a length of one.

I think what you need is better “character” counting [1], rather than
new precomposed characters.

Regards,
Khaled

1. http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
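
As a rough illustration of the counting approach suggested above, the Python sketch below counts a base character together with any following combining marks as one unit. It is a deliberately simplified approximation, not an implementation of the grapheme cluster rules in the linked report: it ignores Hangul syllable sequences, CR LF pairs, emoji sequences and other cases those rules cover. The function name is hypothetical.

```python
import unicodedata

def count_user_characters(text):
    """Approximate count of user-perceived characters.

    Simplification: every code point whose general category is a mark
    (Mn, Mc, Me) is attached to the preceding character. A mark at the
    very start of the string is counted as a unit of its own.
    """
    count = 0
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark or count == 0:
            count += 1
    return count

print(count_user_characters("A\u0332"))          # 1 (A + combining low line)
print(count_user_characters("a\u0308"))          # 1 (a + combining diaeresis)
print(count_user_characters("\u00e4"))           # 1 (precomposed ä)
print(count_user_characters("A\u0332B\u0332C"))  # 3
```

With counting done this way, the precomposed and combining spellings give the same length without any new code points being encoded.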
Received on Sun Aug 16 2015 - 11:32:50 CDT
