From: karl williamson (public@khwilliamson.com)
Date: Fri Aug 07 2009 - 14:50:52 CDT
I forgot to include the public list as a cc to this, which I am now
doing. Perhaps that is for the better, as I realize that I'm confused
about what "reserved" means. I thought from NamesList.txt that reserved
characters were unassigned ones that were never going to be assigned
because of some constraint on them, such as being placeholders, like
the following:
    1D51D <reserved>
            x (black-letter capital z - 2128)
where the code points around it are assigned, but this one is skipped
because it would essentially duplicate U+2128.
But looking at extracted/DerivedGeneralCategory.txt, it appears that
reserved is any Cn code point that isn't a noncharacter.
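To make that reading concrete, here is a minimal sketch of "reserved = General_Category Cn minus the noncharacters". It assumes Python's bundled unicodedata module, whose Unicode version is newer than the 5.2 data discussed here, and the helper names is_noncharacter and is_reserved are hypothetical. The noncharacter test uses the fixed set U+FDD0..U+FDEF plus the last two code points of every plane.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 permanent noncharacters: U+FDD0..U+FDEF plus the
    last two code points (xxFFFE, xxFFFF) of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_reserved(cp: int) -> bool:
    """Hypothetical helper: reserved = unassigned (Cn) and not a
    noncharacter. Uses Python's Unicode database, not UCD 5.2."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)


# U+1D51D is the Fraktur hole; U+FFFF is Cn but a noncharacter.
for cp in (0x1D51D, 0x2128, 0xFFFF, 0xE000):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), is_reserved(cp))
```

Note that noncharacters also report category Cn, which is why the second test is needed at all.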
karl williamson wrote:
> Kenneth Whistler wrote:
>> Karl Williamson wrote:
>>
>>> ... I thought I should add some things I've been thinking about to
>>> make sure I understand. Feel free to correct me.
>>>
>>> Each Unicode property is defined on a subset of the Unicode code
>>> points. Many are defined on the complete set, but some are not,
>>> such as Name; for example, surrogates and private use code points
>>> have no name.
>>
>> Actually Name *is* defined on the complete set. The values for
>> the Name property are strings, and for reserved code points
>> (and some other code point types), the value of the Name property
>> is the null string.
>>
>> Since this has been confusing to a lot of people, the Unicode 5.2
>> text about Unicode character names has been substantially updated
>> to clarify this. See Section 4.8 Name--Normative in the Chapter 4
>> pdf posted for review. (Accessible from the Unicode 5.2 beta
>> page.)
>>
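The point that Name is defined everywhere, with the null string as the value for unnamed code points, can be illustrated with Python's unicodedata module (an assumption here; it is not part of the UCD discussion). Its name() function raises ValueError for code points without a name unless a default is passed, so supplying "" yields exactly the null-string convention. The wrapper name_property is hypothetical.

```python
import unicodedata


def name_property(cp: int) -> str:
    """Hypothetical wrapper: the Name property, with "" for no name.

    unicodedata.name() raises ValueError for reserved, control,
    private-use, surrogate, and noncharacter code points; the ""
    default maps all of those to the null string instead.
    """
    return unicodedata.name(chr(cp), "")


for cp in (0x0041, 0x2128, 0x1D51D, 0xE000, 0x000A):
    print(f"U+{cp:04X} -> {name_property(cp)!r}")
```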
>
> It was helpful looking at the 5.2 draft, but it brought up another
> question. I don't see anywhere in the UCD (except in NamesList.txt) any
> mention of reserved code points. I don't see any way to distinguish
> between these and code points that are otherwise unassigned but are not
> permanent noncharacters. Perhaps that information is thought not to be
> relevant, but the draft mentions "reserved-NNNN" as a possible
> identifying string for such a code point. Again, perhaps it is assumed
> that only in the text of the standard would anyone wish to make this
> distinction.
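A string like "reserved-NNNN" is a code point label: a lowercase prefix for the code point type, a hyphen, and the code point in hex with at least four digits. As far as I know the other prefixes are control, noncharacter, private-use, and surrogate. The sketch below derives a label from the General_Category alone, which is all the UCD itself offers for telling these apart. It assumes Python's unicodedata, and code_point_label is a hypothetical name.

```python
import unicodedata
from typing import Optional

# Label prefix for each General_Category whose code points have no Name.
# Noncharacters are also Cn, so they are split off from "reserved" below.
_LABEL_PREFIX = {"Cc": "control", "Co": "private-use", "Cs": "surrogate"}


def is_noncharacter(cp: int) -> bool:
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_label(cp: int) -> Optional[str]:
    """Hypothetical helper: a label such as 'reserved-1D51D', or None
    when the code point is an assigned character that has a Name."""
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        prefix = "noncharacter" if is_noncharacter(cp) else "reserved"
    elif category in _LABEL_PREFIX:
        prefix = _LABEL_PREFIX[category]
    else:
        return None
    # Uppercase hex, padded to at least four digits.
    return f"{prefix}-{cp:04X}"
```

For example, code_point_label(0x1D51D) produces "reserved-1D51D", while code_point_label(0xFFFF) produces "noncharacter-FFFF".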
>
>>> It's unclear to me what the definition was, if any, in releases
>>> before the Unknown value was added to the Script property, for code
>>> points that didn't have any of the other Script property values
>>> (and similarly for a number of other catalog properties).
>>
>> The issue of default values is explained now in more detail
>> in Section 4.2.8 Default Values in UAX #44. See the Unicode 5.2
>> proposed update:
>>
>> http://www.unicode.org/reports/tr44/tr44-3.html#Default_Values
>>
>> As far as the default value of the Script property is concerned,
>> before Script=Unknown was introduced, the Scripts.txt file itself
>> defined Script=Common as the default value.
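How a default value works in practice: a file like Scripts.txt lists only the code points that have an explicit value, and a lookup falls back to the default for everything else. This is a minimal sketch assuming the usual "range ; value # comment" data-line format; the SAMPLE lines are a tiny illustrative excerpt, not the real file, and ScriptTable is a hypothetical class. The default parameter lets the same parser model either convention mentioned above (Unknown now, Common in older files).

```python
import bisect
from typing import List, Tuple

# Illustrative excerpt in the Scripts.txt data-line format.
SAMPLE = """\
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
0391..03A1    ; Greek # L&  [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
"""


class ScriptTable:
    """Hypothetical lookup table built from Scripts.txt-style lines."""

    def __init__(self, text: str, default: str = "Unknown") -> None:
        self.default = default  # value for any code point not listed
        self.ranges: List[Tuple[int, int, str]] = []
        for line in text.splitlines():
            data = line.split("#", 1)[0].strip()  # drop trailing comment
            if not data:
                continue
            field, script = (part.strip() for part in data.split(";"))
            lo, _, hi = field.partition("..")  # single code point: hi == ""
            self.ranges.append((int(lo, 16), int(hi or lo, 16), script))
        self.ranges.sort()
        self.starts = [r[0] for r in self.ranges]

    def script(self, cp: int) -> str:
        # Find the last range starting at or before cp, then check its end.
        i = bisect.bisect_right(self.starts, cp) - 1
        if i >= 0 and self.ranges[i][0] <= cp <= self.ranges[i][1]:
            return self.ranges[i][2]
        return self.default
```

With this excerpt, ScriptTable(SAMPLE).script(0x03A2) falls through to "Unknown", and ScriptTable(SAMPLE, default="Common") returns "Common" for the same gap.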
>
> I had overlooked this. But there are other cases in which no default
> value was given at one time but one is now, such as NaN for
> Numeric_Value. Was the default the null string in earlier releases, or
> was it simply undefined?
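For comparison, a NaN default for Numeric_Value can be modeled with Python's unicodedata.numeric(), which raises ValueError for non-numeric characters unless a default is supplied. This is only a sketch of the current convention, assuming Python's Unicode data; numeric_value is a hypothetical wrapper, and it says nothing about what earlier UCD releases intended.

```python
import math
import unicodedata


def numeric_value(cp: int) -> float:
    """Hypothetical wrapper: Numeric_Value with NaN as the default for
    code points that have no numeric value."""
    return unicodedata.numeric(chr(cp), math.nan)


for cp in (0x0035, 0x00BD, 0x2164, 0x0041):
    print(f"U+{cp:04X} -> {numeric_value(cp)}")
```

One practical consequence of choosing NaN: it never compares equal to anything, itself included, so callers have to test for the default with math.isnan() rather than ==.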
>
>> [snip]
>
This archive was generated by hypermail 2.1.5 : Fri Aug 07 2009 - 14:54:07 CDT