On 2/23/2012 2:44 PM, António Martins-Tuválkin wrote:
> On 2012/2/23 Matt Ma <matt.ma.umail_at_gmail.com> wrote:
>
>> It is defined as
>> "33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;"
>> in UnicodeData.txt, but it is shown as "pH" in code chart. Should it be
>> "0070 0048" or "PH"?
> It should certainly be "pH", i.e., "<square>0070 0048</square>",
> because that's the peculiar casing in widespread (universal, really)
> use for this basic Chemistry concept (AFAIK it means "power of
> hydrogen"). See <http://en.wikipedia.org/wiki/pH#History>.
>
> While there's no surprise at "PH" Unicode names being all caps, I’m
> surprised that the decomposition mapping is wrongly set to 0050 0048
> instead of to 0070 0048.

The early fonts and code tables showed this in all caps.
Unfortunately, decomposition mappings are frozen - including mistakes.

This is one of the many reasons not to use NF"K"D or NF"K"C for
transforming data - these transformations should be limited to dealing
with identifiers, where practically all of the problematic characters
are already disallowed.
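
To make the effect concrete, here is a small sketch in Python (my choice of
language for illustration; nothing in this thread depends on it). It uses
only the standard unicodedata module and shows that the compatibility
forms turn U+33D7 into upper-case "PH", exactly as the frozen mapping in
UnicodeData.txt dictates.

```python
import unicodedata

SQUARE_PH = "\u33D7"  # U+33D7 SQUARE PH

# Character name and raw decomposition field from the Unicode database.
print(unicodedata.name(SQUARE_PH))           # SQUARE PH
print(unicodedata.decomposition(SQUARE_PH))  # <square> 0050 0048

# Both compatibility normalization forms apply the frozen mapping.
nfkc = unicodedata.normalize("NFKC", SQUARE_PH)
nfkd = unicodedata.normalize("NFKD", SQUARE_PH)
print(nfkc, nfkd)                            # PH PH

# The canonical forms leave the character untouched.
nfc = unicodedata.normalize("NFC", SQUARE_PH)
print(nfc == SQUARE_PH)                      # True
```

Data passed through NFKC or NFKD therefore ends up with "PH" rather than
the "pH" that the code chart glyph shows.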
If your intent is to sort or search a document using "fuzzy"
equivalences, then you are not required to limit yourself to the NF"K"
C/D transformations in any way, because you would not be claiming to be
"normalizing" the text in the sense of a Unicode Normalization Form.
A./
Received on Thu Feb 23 2012 - 19:28:36 CST