L2/06-325 - Pre-Preliminary Motions of the UTC 109 / L2 206 Joint Meeting

San Jose, CA -- November 7 - 10, 2006
Hosted by Adobe
UTC #109 Agenda
November 29, 2006

[109-C1] Consensus: Accept the 8 Telugu fractions and signs at U+0C78 - U+0C7F for encoding in a future version of the standard.

[109-C2] Consensus: Accept U+0F6B Tibetan letter KKA and U+0F6C Tibetan letter RRA for encoding in a future version of the standard.

[109-C3] Consensus: Accept the 4 draught characters, 100 domino characters, and 44 Mahjong game symbol characters, with a new block Mahjong Tiles 1F000 - 1F02F and a new block Domino Tiles 1F030 - 1F09F, for encoding in a future version of the standard. (Further details in L2/06-351.)

[109-C4] Consensus: Change the block name for Old Church Slavonic characters 2DE0 - 2DFF to Cyrillic Extended-A.

[109-C5] Consensus: Accept the 13 Latin Insular letters in ranges U+A779..U+A77F and U+A782..U+A787 for encoding in a future version of the standard. (See L2/06-266.)

[109-C6] Consensus: Accept the 16 letters for Karen and Kayah for encoding at U+1065..U+1074 in a future version of the standard, as documented in L2/06-303.

[109-C7] Consensus: Accept 23 Myanmar characters at U+1022, plus the range U+1075..U+108A for Shan, for encoding in a future version of the standard, as documented in L2/06-304.
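The character counts quoted in items 109-C1 through 109-C7 can be cross-checked against the quoted ranges with simple arithmetic. The following is a minimal sketch; the dictionary labels and the helper names `span` and `total` are illustrative and do not come from the minutes.

```python
# Inclusive code point ranges as quoted in consensus items 109-C1..109-C7.
# Labels are illustrative shorthand, not official block or character names.
RANGES = {
    "Telugu fractions and signs": [(0x0C78, 0x0C7F)],
    "Mahjong Tiles block": [(0x1F000, 0x1F02F)],
    "Domino Tiles block": [(0x1F030, 0x1F09F)],
    "Latin Insular letters": [(0xA779, 0xA77F), (0xA782, 0xA787)],
    "Karen and Kayah letters": [(0x1065, 0x1074)],
    "Myanmar and Shan characters": [(0x1022, 0x1022), (0x1075, 0x108A)],
}


def span(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


def total(label: str) -> int:
    """Total number of code points across all ranges listed for a label."""
    return sum(span(first, last) for first, last in RANGES[label])


if __name__ == "__main__":
    for label in RANGES:
        print(f"{label}: {total(label)} code points")
```

Under these ranges the Mahjong Tiles block spans 48 code points for the 44 accepted Mahjong characters, and the Domino Tiles block spans 112 code points for the 100 accepted domino characters, so each block has room beyond the accepted repertoire.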

[109-C8] Consensus: Keep data files related to Unicode Technical Standards in versioned directories separated for each technical standard underneath the "Public" directory on the Unicode website, allowing for an additional level for sub versions as needed.

[109-C9] Consensus: Unicode technical standards with data files will be revised at every repertoire change, but will only be revised for UCD updates when necessary.

[109-C10] Consensus: Put up new UDHR page and directory on unicode.org with explanation and pointer to existing data. Unicode is interested in hosting this data.

[109-C11] Consensus: Produce a proposed update for UAX #15 that defines a "normalization process for stable strings" (NPSS). Table 11 is not part of the process, but the use of Table 11 with some pre-4.1 implementations is defined as part of a "restricted NPSS".

[109-C12] Consensus: Post a public review issue proposing to give character U+00B7 MIDDLE DOT the Other_ID_Continue property. (See L2/06-378 Scarborough).

[109-C13] Consensus: Make derived numeric values preserve canonical equivalence by adding property values where necessary.

[109-C14] Consensus: Authorize proposed updates of UTR #36 and UTS #39 incorporating editorial changes including an updated pointer to LDAP and draft data update.

[109-C15] Consensus: Change the bidi category of characters U+0600, U+0601, U+0602, U+0603, and U+06DD from "AL" to "AN" in the next version of the standard.
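An implementer who wants to see which bidi category a given runtime reports for the five characters named in 109-C15 could use a check along the following lines. This is only a sketch: it uses Python's standard `unicodedata` module, the names `AFFECTED` and `bidi_classes` are illustrative, and the values reported depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The five code points named in consensus 109-C15.
AFFECTED = [0x0600, 0x0601, 0x0602, 0x0603, 0x06DD]


def bidi_classes(code_points):
    """Map each code point (as 'U+XXXX') to the bidi class the runtime reports."""
    return {
        f"U+{cp:04X}": unicodedata.bidirectional(chr(cp))
        for cp in code_points
    }


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for label, bidi_class in bidi_classes(AFFECTED).items():
        print(label, bidi_class)
```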

[109-C16] Consensus: Give U+FA30 - U+FA6A the ideographic property, and fix the wordbreak property.

[109-C17] Consensus: Add U+6F06 and U+9621 to our list of accounting numbers in the Unihan database.

[109-C18] Consensus: Update the UCD to add numeric values to the 8 Compatibility Ideographs in L2/06-309R (to match their canonical characters).
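The requirement in 109-C13 and 109-C18, that a character and its canonical equivalent carry the same numeric value, can be sketched as a consistency check. The code below is a minimal illustration using Python's standard `unicodedata` module. The function names are hypothetical, the default scan range (the CJK Compatibility Ideographs block, U+F900..U+FAFF) is an assumption chosen for illustration, and the 8 characters of L2/06-309R are not reproduced here.

```python
import unicodedata


def numeric_mismatch(ch):
    """Return (own_value, target_value) when ch has a singleton canonical
    decomposition whose numeric value differs from its own, else None."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions carry a <tag>; canonical ones do not.
    if not decomp or decomp.startswith("<"):
        return None
    parts = decomp.split()
    if len(parts) != 1:
        return None  # not a singleton decomposition
    target = chr(int(parts[0], 16))
    own = unicodedata.numeric(ch, None)
    other = unicodedata.numeric(target, None)
    return None if own == other else (own, other)


def find_mismatches(first=0xF900, last=0xFAFF):
    """Scan an inclusive range and collect every numeric mismatch found."""
    found = {}
    for cp in range(first, last + 1):
        mismatch = numeric_mismatch(chr(cp))
        if mismatch is not None:
            found[f"U+{cp:04X}"] = mismatch
    return found


if __name__ == "__main__":
    print(find_mismatches())
```

What the scan reports depends on the Unicode data version of the interpreter; an empty result would indicate that numeric values are consistent across the scanned singleton decompositions in that version.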

[109-C19] Consensus: Issue a PRI regarding use of ZWJ/ZWNJ in identifiers based on L2/06-353.

[109-C20] Consensus: Include the following bullet points in a liaison document to the IETF:

The approach in (L2/06-341) draft-alvestrand-idna-bidi-00.txt identifies the problem correctly; there are some issues with the proposed solution. We recommend as an alternative ignoring trailing combining marks when performing the exclusion.
A field should not start with a combining mark.
UTC doesn't see a problem with decomposing U+05F0..U+05F2 in stringprep.
UTC is in favor of the approach taken in L2/06-340 section 2.1.4 in allowing Unicode 5.0 and future characters to be in IDNs.
UTC would like to see the protocol allow for the use of Unicode characters for future versions with as quick a turn-around as possible. This can be done by basing the protocol on backwards-compatible Unicode properties, such as the definition of Unicode identifiers. UTC reaffirms willingness to add and maintain new properties for guaranteeing stability.
UTC disagrees with the approach in section 4 of the document. To break backwards compatibility with the protocol there must be a compelling reason. Any such changes must be shown to solve a large majority of well-documented problems.
Section 4 of the document is a major break in backwards compatibility. A compelling case must be made, and this case has not been made.
UTC disagrees with the statement in section 2.1.3. The problem is that IDNA 2003 used an ad-hoc approach and should have been based on Unicode character properties.

[end of consensus 109-C20]
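Two of the bullet points above (ignoring trailing combining marks, and not starting a field with a combining mark) can be illustrated with a short sketch. It assumes that "combining mark" means any character whose General_Category begins with M, which is an interpretation and not something the minutes specify. The function names are hypothetical, and the code does not reproduce the exclusion rule of the IETF draft itself.

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    """Assumption: a combining mark is any character with General_Category M*."""
    return unicodedata.category(ch).startswith("M")


def starts_with_combining_mark(field: str) -> bool:
    """True when a non-empty field begins with a combining mark."""
    return bool(field) and is_combining_mark(field[0])


def strip_trailing_marks(field: str) -> str:
    """Return the field with any run of trailing combining marks removed,
    so that a later check can examine the last base character."""
    end = len(field)
    while end > 0 and is_combining_mark(field[end - 1]):
        end -= 1
    return field[:end]


if __name__ == "__main__":
    sample = "\u05D0\u05B8"  # HEBREW LETTER ALEF followed by a point
    print(starts_with_combining_mark(sample))
    print(len(strip_trailing_marks(sample)))
```

A bidi check of the kind discussed in the draft would then be applied to the stripped field, so that a trailing mark does not hide the directionality of the final base character.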

[109-C21] Consensus: Include the following bullet point in a liaison document to the IETF:

Call to the attention of the authors of L2/06-342 draft-faltstrom-idnabis-tables-00.txt that the Unicode Consortium already recommends a restriction of codepoints for the purpose of identifiers in UAX #31, plus further additions and restrictions in UTS #39 under [IMOD] and discussed in Section 3.1.

[end of consensus 109-C21]

[109-C22] Consensus: Accept the recommendations of the scripts subcommittee which met on Tuesday November 7, 2006, resulting in the following UTC consensus items 109-C23 through 109-C33 and their associated action items.

[109-C23] Consensus: UTC approves names and codepoints for four Orthographic and Modifier Characters as given in L2/06-259R for encoding in a future version of the standard, at codepoints U+A789 .. U+A78C.

[109-C24] Consensus: UTC approves for encoding BOPOMOFO LETTER IH with codepoint U+312D as given in L2/06-338, in a future version of the standard.

[109-C25] Consensus: UTC accepts the U+214F SYMBOL FOR SAMARITAN SOURCE for encoding in a future version of the standard.

[109-C26] Consensus: Post a public review issue for the proposal L2/06-268, encoding of an "external link sign".

[109-C27] Consensus: UTC accepts the change in names of 11 recently accepted Arabic characters in FPDAM 3, as described in L2/06-328, to wit:

U+0773 ARABIC LETTER ALEF WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE
U+0774 ARABIC LETTER ALEF WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE
U+0775 ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE
U+0776 ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE
U+0777 ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT FOUR BELOW
U+0778 ARABIC LETTER WAW WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE
U+0779 ARABIC LETTER WAW WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE
U+077A ARABIC LETTER YEH BARREE WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE
U+077B ARABIC LETTER YEH BARREE WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE
U+077C ARABIC LETTER HAH WITH EXTENDED ARABIC-INDIC DIGIT FOUR BELOW
U+077D ARABIC LETTER SEEN WITH EXTENDED ARABIC-INDIC DIGIT FOUR ABOVE

[109-C28] Consensus: UTC accepts the encoding of 8 Arabic characters for Persian and Azerbaijani as described in document L2/06-345 with names and code positions as given but changing the name of U+0616 to "ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH".

0616 ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
063B ARABIC LETTER SEEN WITH INVERTED V
063C ARABIC LETTER KAF WITH TWO DOTS ABOVE
063D ARABIC LETTER KEHEH WITH TWO DOTS ABOVE
063E ARABIC LETTER KEHEH WITH THREE DOTS BELOW
063F ARABIC LETTER FARSI YEH WITH INVERTED V
077E ARABIC LETTER FARSI YEH WITH TWO DOTS ABOVE
077F ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE

[109-C29] Consensus: UTC approves the encoding of four Qur'anic Arabic characters with names and codepoints as in L2/06-358, U+0617 - U+061A. They should be classified as confusables for domain names. The four characters are:

0617 ARABIC SMALL HIGH ZAIN
0618 ARABIC SMALL FATHA
0619 ARABIC SMALL DAMMA
061A ARABIC SMALL KASRA

[109-C30] Consensus: UTC accepts solution 2a in document L2/06-357 (i.e., to fix the glyphs and add annotations noting that the character name is a misnomer for "beautiful omega"). The two broad omegas are covered in a separate proposal.

[109-C31] Consensus: UTC accepts solution 3a of document L2/06-357 without the addition of Cyrillic capital letter OU (see page 5) for Cyrillic "uk". This amounts to a recommendation to add two monograph Uk characters. (See also further proposal below.)

[109-C32] Consensus: UTC agrees in principle to encode the first 2 columns of characters in L2/06-359 without change. The author should resubmit this proposal with the deletion of characters xx20 and xx22, and a new code position for xx25 (in an area with other non-spacing marks). Also remove xx27 and xx28 from the proposal before resubmitting.

[109-C33] Consensus: UTC accepts for encoding 11 characters from L2/06-269 (with some name changes) at the following codepoints:

A7FB LATIN EPIGRAPHIC LETTER REVERSED F
A7FC LATIN EPIGRAPHIC LETTER REVERSED P
A7FD LATIN EPIGRAPHIC LETTER INVERTED M
A7FE LATIN EPIGRAPHIC LETTER I LONGA
A7FF LATIN EPIGRAPHIC LETTER ARCHAIC M
2185 ROMAN NUMERAL SIX LATE FORM
2186 ROMAN NUMERAL FIFTY EARLY FORM
2187 ROMAN NUMERAL FIFTY THOUSAND
2188 ROMAN NUMERAL ONE HUNDRED THOUSAND
1019B ROMAN CENTURIAL SIGN
2E19 PALM BRANCH