L2/13-236
Re: Proposal for change to UTS #46 Data
Date: 2013 Dec 12
From: Mark Davis
Proposal:
Consider adding an informative data field for currently-invalid IDNA2008 characters:
XV8 - is not valid in IDNA2008 for the corresponding version of Unicode, but was valid in some previous version of IDNA2008.
Background:
The current definition of the field is:
3 IDNA2008 status: NV8. Only present if the status is valid but the character is excluded by IDNA2008 from all domain names for all versions of Unicode. This is not a normative field.
And according to that definition, it is correct that there is no value on:
19DA ; valid # 5.2 NEW TAI LUE THAM DIGIT ONE
That is because that character is valid in at least one version of IDNA2008. However, that field is easily misunderstood to mean "invalid in the current version", as in Michel's error report (on unicore):
19DA ; valid # 5.2 NEW TAI LUE THAM DIGIT ONE
1.3. U+19DA NEW TAI LUE THAM DIGIT ONE
The GeneralCategory for this character changes from Nd to No. This implies that the derived property value changes from PVALID to DISALLOWED
Accordingly, the entry for 19DA in IdnaMappingTable.txt should have either ‘disallowed’ or some migration functionality (‘deviation’ ?) so that the document could be used to create a IDNA2008 6.3 compatible process.
(I found that gem when comparing the IANA IDNA2008 table for 6.3 and Unicode equivalent. That was the only difference I found). Just to show that GC changes are not w/o consequences.
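To make the file format in the report above concrete, here is a minimal Python sketch of reading one data line, including the optional IDNA2008 status in field 3. The names `IdnaEntry` and `parse_line` are hypothetical, and the field layout (code points ; status ; mapping ; IDNA2008 status) is assumed from the data file's header, so treat it as an illustration rather than a reference parser.

```python
from typing import NamedTuple, Optional


class IdnaEntry(NamedTuple):
    """One parsed data line (hypothetical structure for illustration)."""
    first: int                      # first code point of the range
    last: int                       # last code point (same as first for a single)
    status: str                     # e.g. "valid", "disallowed", "mapped"
    mapping: str                    # mapping field, "" when absent
    idna2008_status: Optional[str]  # field 3, e.g. "NV8", or None when absent


def parse_line(line: str) -> Optional[IdnaEntry]:
    """Parse one line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    bounds = fields[0].split("..")        # "19DA" or "00A1..00A7"
    first = int(bounds[0], 16)
    last = int(bounds[-1], 16)
    mapping = fields[2] if len(fields) > 2 else ""
    flag = fields[3] if len(fields) > 3 and fields[3] else None
    return IdnaEntry(first, last, fields[1], mapping, flag)


entry = parse_line("19DA ; valid # 5.2 NEW TAI LUE THAM DIGIT ONE")
```

For the 19DA line, `entry.idna2008_status` is `None`, which is exactly the "no value" case that the error report questions.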
Because IDNA2008 does not guarantee the stability of valid characters, if people don't read the documentation carefully, there are two possible, reasonable meanings for NV8.
Where UV is the corresponding version of Unicode:
A. A character is valid in UTS 46 (transitional) but is not valid according to IDNA2008 in UV,
OR
B. A character is valid in UTS 46 (transitional) but has never been valid according to IDNA2008 for any version of Unicode up to and including UV.
The data we have for NV8 is for B, not A.
- B is the best data to use if your implementation needs to guarantee stability for IDNA2008, while
- A is the best data to use if your implementation wants to conform precisely to IDNA2008 for UV.
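The relationship between meanings A and B and the proposed XV8 value can be sketched in Python. This is only an illustration under stated assumptions: the function `classify`, the version tuples, and the tiny validity table are hypothetical, and the table is not real IANA or Unicode data. With both fields present, meaning B is NV8 alone, and meaning A is NV8 or XV8.

```python
from typing import Dict, Optional, Set, Tuple

Version = Tuple[int, int]  # e.g. (6, 3) for Unicode 6.3


def classify(
    cp: int,
    uts46_valid: Set[int],
    idna2008_valid: Dict[Version, Set[int]],
    uv: Version,
) -> Optional[str]:
    """Return "NV8", "XV8", or None for code point cp at Unicode version uv."""
    if cp not in uts46_valid:
        return None  # the field is only present when the UTS 46 status is valid
    if cp in idna2008_valid[uv]:
        return None  # valid in IDNA2008 for UV: no flag
    # Was cp valid in IDNA2008 for any version up to and including UV?
    ever_valid = any(
        cp in chars for version, chars in idna2008_valid.items() if version <= uv
    )
    # XV8: invalid now but valid earlier. NV8 (meaning B): never valid.
    return "XV8" if ever_valid else "NV8"


# Illustrative data only: U+19DA is treated as valid for 5.2 and not for 6.3,
# following the error report; U+00A1 stands in for a never-valid character.
uts46_valid = {0x0061, 0x00A1, 0x19DA}
idna2008_valid = {
    (5, 2): {0x0061, 0x19DA},
    (6, 3): {0x0061},
}

flag_19da = classify(0x19DA, uts46_valid, idna2008_valid, (6, 3))
flag_00a1 = classify(0x00A1, uts46_valid, idna2008_valid, (6, 3))
```

An implementation that needs stability would reject only characters flagged NV8, while one that wants strict conformance to IDNA2008 for UV would reject characters flagged either NV8 or XV8.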
There is one possible hiccough with providing A. We do not have a guarantee that the IETF will not retroactively change the characters valid in IDNA2008 for a specific version of Unicode. (If they want to grandfather a character in, they have to proactively propose a change to the spec, which takes some time.) But this is also an issue for B.
I suggest that we document that in such cases we will issue a dot-dot release, such as Version 6.3.1, with modified XV8 and NV8 field values.