L2/11-406
Re: Script Extensions as a Unicode Property
To: UTC
From: Mark Davis
Date: 2011-11-25
Script extensions currently exist only as a data file, not as a formal property. That makes them clumsier to cite and use. See, for example, the feedback from Karl Williamson on UTS #18, April 30.
We already have many multivalued Unicode character properties in Unihan, so there is no formal difficulty in adding Script_Extensions as a provisional property. Here is a proposed description.
The Script_Extensions (scx) property has as its value a set of one or more Script property values. The Script_Extensions value for a given character C is defined based on the UCD data files as follows:
When used in an expression to denote a set of characters, such as in the regular expression \p{scx=Arab}, the value of that expression is the set of all code points whose Script_Extensions value contains the given script. Thus:
The PropertyAliases.txt line would be:
SCX ; Script_Extensions
We would also add the following to the data file:
# @missing: 0000..10FFFF; Script_Extensions; <script>
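
As a rough illustration of the semantics described above, the following Python sketch models a lookup of the scx value and the set denoted by an expression such as \p{scx=Arab}. It assumes that the `<script>` default in the @missing line means "use the character's Script value when no explicit entry exists". The tables SCRIPT and SCRIPT_EXTENSIONS and the functions script_extensions and scx_set are hypothetical names, and the table entries are toy data rather than real UCD values.

```python
from __future__ import annotations

# Toy stand-ins for the UCD data files; entries are illustrative only,
# not copied from the real Scripts.txt or ScriptExtensions.txt.
SCRIPT = {0x0041: "Latn", 0x0627: "Arab", 0x0640: "Zyyy", 0x0710: "Syrc"}
SCRIPT_EXTENSIONS = {0x0640: frozenset({"Arab", "Syrc"})}


def script_extensions(cp: int) -> frozenset[str]:
    """Return the scx value of a code point as a set of script codes."""
    explicit = SCRIPT_EXTENSIONS.get(cp)
    if explicit is not None:
        return explicit
    # Assumption: the "<script>" default falls back to the Script value.
    # "Zzzz" (Unknown) is used here for code points missing from the toy table.
    return frozenset({SCRIPT.get(cp, "Zzzz")})


def scx_set(script: str, code_points) -> set[int]:
    """Return the code points whose scx value contains the given script."""
    return {cp for cp in code_points if script in script_extensions(cp)}


arab = scx_set("Arab", SCRIPT)
```

With this toy data, `arab` contains U+0627 (through the Script fallback) and U+0640 (through its explicit entry), while `scx_set("Zyyy", SCRIPT)` is empty because the explicit entry for U+0640 replaces its Script value rather than adding to it.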