This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Wed Feb 25 16:37:12 CST 2015
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Indic Syllabic Categories
I've reviewed the application of the revised categories as set forth in L2/14-126 (http://www.unicode.org/L2/L2014/14126r-indic-properties.pdf) as applied to the Thai, Lao and Tai Tham scripts, and noted a few other characters, and come up with the following proposed changes of syllabic category. I have also taken into account the proposals of Roozbeh Pournader of 24 February 2015 related to work on the Universal Shaping Engine.

I've come up with 3 new characters of category Bindu:

0303 ; Bindu # Mn COMBINING TILDE
0310 ; Bindu # Mn COMBINING CANDRABINDU
1A74 ; Bindu # Mn TAI THAM SIGN MAI KANG (currently Vowel_Dependent)

Note that both U+0ECD LAO NIGGAHITA and U+1A74 function both as Bindu and as Vowel_Dependent. U+0303 is used in Patani Malay in the Thai script - see UTC document L2/10-451. U+0310 is used for Sanskrit in Tamil script, according to Indic list email 'Re: Tamil Punctuation', 27/7/12 9:24 +0530 from Shriramana Sharma.

I've found 4 new characters of category Visarga:

0E30 ; Visarga # Lo THAI CHARACTER SARA A
0EB0 ; Visarga # Lo LAO VOWEL SIGN A
1A61 ; Visarga # Mc TAI THAM VOWEL SIGN A
19B0 ; Visarga # Mc (to be Lo) NEW TAI LUE VOWEL SIGN VOWEL SHORTENER

Note that the tone (or voice modulation) character U+1038 MYANMAR SIGN VISARGA is currently classified as Visarga. U+0E30 is used as visarga in Sanskrit, e.g. in the Royal Institute Dictionary. The typical sound of the four visargas above is /ʔ/ rather than /h/, and, through a feature of Tai (SW Tai?) phonology, they all have the additional function of shortening a vowel. As a vowel shortener, U+1A61 and U+19B0 may follow a final consonant.

These 4 characters are currently classified as Vowel_Dependent. Except for the Lao script, that usage can easily be interpreted as a modification of the implicit vowel. Modern Lao does not acknowledge the existence of an implicit vowel, so that interpretation may be harder to accept.
(Vowel_Dependent U+0EB1 LAO VOWEL SIGN MAI KAN is also a vowel shortener; in the 19th century it was denied that Vowel_Dependent U+0E31 THAI CHARACTER MAI HAN-AKAT was a vowel in Thai.)

U+1A61 occasionally has the sound /k/, especially when used in conjunction with U+1A62 TAI THAM VOWEL SIGN MAI SAT. I think we should regard this as just one of the uses of visarga.

I've found 3 new nuktas, at least, so long as the application of nukta is not restricted to *foreign* consonants.

0331 ; Nukta # Mn COMBINING MACRON BELOW
0359 ; Nukta # Mn COMBINING ASTERISK BELOW
1A7F ; Nukta # Mn TAI THAM COMBINING CRYPTOGRAMMIC DOT

U+0331 is used in Patani Malay in the Thai script - see L2/10-451 and the consonant chart on p16 of http://mlenetwork.org/sites/default/files/Patani%20Malay%20Presentation%20-%20Part%202.pdf. U+0331 and U+0359 have been used in English-Thai dictionaries to represent English sounds, very much a nukta role. They were previously classified as 'Other', though there is a proposal to make U+1A7F 'Syllable_Modifier'. U+0EC8 LAO TONE MAI EK functions as Nukta in Khmu as well as performing its principal rôle of Tone_Mark in Lao. U+0E3A THAI CHARACTER PHINTHU is used both as Nukta and as Pure_Killer; the latter is its traditional rôle.

I've found 4 new pure killers, all currently classified as 'Other', though there is a proposal to classify U+0E4C (along with U+17CD) as 'Consonant_Killer'. They are:

0E4C ; Pure_Killer # Mn THAI CHARACTER THANTHAKHAT
0ECC ; Pure_Killer # Mn LAO CANCELLATION MARK
1A7C ; Pure_Killer # Mn TAI THAM SIGN KHUEN-LUE KARAN
1A7A ; Pure_Killer # Mn TAI THAM SIGN RA HAAM

U+0E4C THAI CHARACTER THANTHAKHAT and U+0E4E THAI CHARACTER YAMAKKAN once divided the role of vowel killing - U+0E4E formed clusters and U+0E4C removed final vowels. The use of U+0E4C came to be largely restricted to vowels associated with clusters of consonants.
Removing the vowel made the final consonant of the cluster silent (spoken Thai does not permit final consonant clusters), and from this effect it has been reinterpreted as a consonant-killer. U+0ECC probably had the same behaviour as U+0E4C. I don't know if it is still used in Laos - foreign loanwords often don't follow the rules.

The Tai Tham marks are still at the transitional stage - they are sometimes found on final unsubscripted consonants to indicate that they have no vowel. There is an unfortunate overlap with the final consonant mark for <r> (pronunciation necessarily /n/). The Khuen and Lue form of the final consonant symbol has the same shape as the Thai and Lao form of the pure killer. Consequently U+1A7A serves as Consonant_Final in Tai Khuen and Tai Lue. In Tai Khuen, at least, the use as a final consonant seems to have recently fallen into disfavour, so it seems most appropriate to classify U+1A7A as 'Pure_Killer'.

I noted above that the 'Pure_Killer' U+0E3A THAI CHARACTER PHINTHU also serves as a nukta. I have a vague recollection that U+0E4C THAI CHARACTER THANTHAKHAT serves as a register mark in an orthography for the Chong language, so that would count as an auxiliary rôle as Tone_Mark.

If 'Consonant_Killer' is to be separated from 'Pure_Killer', then we need a separate category 'Dual_Mode_Killer' for U+1A7A and U+1A7C.

It should be noted that U+1A62 TAI THAM VOWEL SIGN MAI SAT serves not only as Vowel_Dependent but also as Consonant_Final. This seems to be chiefly relevant to anyone attempting to deduce the pronunciation from the spelling.

There are 4 characters currently categorised as 'Consonant' which I think are better categorised as 'Vowel':

0E24 ; Vowel # Lo THAI CHARACTER RU
0E26 ; Vowel # Lo THAI CHARACTER LU
1A42 ; Vowel # Lo TAI THAM LETTER RUE
1A44 ; Vowel # Lo TAI THAM LETTER LUE

They serve both as independent and dependent vowels.
Note that U+0E24 and U+0E26 may be followed by the length mark U+0E45 THAI CHARACTER LAKKHANGYAO, which is categorised as 'Vowel_Dependent'. I am not aware of any usage of U+0E45 as a true vowel.

The sequence <U+1AAD TAI THAM SIGN CAANG, U+1A63 TAI THAM VOWEL SIGN AA> occurs with the same meaning, 'elephant', as U+1AAD. I don't know whether this justifies changing U+1AAD from 'Other' to 'Consonant_Placeholder'.

I've found two new Consonants:

0EBD ; Consonant # Lo LAO SEMIVOWEL SIGN NYO (was Consonant_Medial)
0EDE ; Consonant # Lo LAO LETTER KHMU GO (was Other)

U+0EBD is used as an initial consonant in Khmu, so U+0EBD has been used in all rôles in the Lao script, like U+0EA7 LAO LETTER WO, which is of category Consonant. For information on Khmu usage, see UTC document L2/10-335 (http://www.unicode.org/L2/L2010/10335r-n3893r-lao-hosken.pdf). The Khmu alphabet chart included backs up the text. (It also shows U+0EC8 LAO TONE MAI EK acting as a Nukta!)

If 'repha' can be used as a general category, including for example Myanmar script kinzi, then there are two arguable new examples, currently categorised as Consonant_Final:

1A58 ; Consonant_Preceding_Repha? # Mn TAI THAM SIGN MAI KANG LAI
1A5A ; Consonant_Succeeding_Repha? # Mn TAI THAM CONSONANT SIGN LOW PA

There are significant issues with U+1A58; while traditionally it behaves as repha/kinzi, some modern styles are better served by treating it as Consonant_Final. It takes some juggling for a single OTL-style rendering engine to be able to render either style depending on the lookups while oblivious to the difference, but it can be done.
I've found 5 new instances of Consonant_Subjoined:

1A57 ; Consonant_Subjoined # Mc TAI THAM CONSONANT SIGN LA TANG LAI
1A5B ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA
1A5C ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN MA
1A5D ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN BA
1A5E ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN SA

They were all previously categorised as Consonant_Final.

Note that U+1A57 is an abbreviation. It is derived by the addition of a stroke to the subscript form <U+1A60 TAI THAM SIGN SAKOT, U+1A43 TAI THAM LETTER LA>. Abbreviations of the word _tanglaai_ 'all' using U+1A57 normally include at least <U+1A57, U+1A63 TAI THAM VOWEL SIGN AA>, so U+1A57 is not Consonant_Final. An example, apparently spelt <U+1A26 TAI THAM LETTER NGA, U+1A57, U+1A76 TAI THAM SIGN TONE-2, U+1A63 TAI THAM VOWEL SIGN AA>, is given in Table 16 at http://www.seasite.niu.edu/tai/TaiLue/graphic%20blends.htm.

The word ᨶᩥᨻᩛᩤᨶ <U+1A36 TAI THAM LETTER NA, U+1A65 TAI THAM VOWEL SIGN I, U+1A3B TAI THAM LETTER LOW PA, U+1A5B, U+1A64 TAI THAM VOWEL SIGN TALL AA, U+1A36> _nippa:na_ 'nirvana' immediately demonstrates that U+1A5B is not a final consonant.

U+1A5C occurs in Pali proper names ending -mmo <U+1A3E TAI THAM LETTER MA, U+1A5C, U+1A6E TAI THAM VOWEL SIGN E, U+1A63 TAI THAM VOWEL SIGN AA>, so is clearly not a final consonant.

U+1A5D occurs in Northern Thai principally in one word, whose pronunciation is roughly /kɔbɔː/. U+1A5D is not Consonant_Final in its phonetic effect. The word is a compound word (or perhaps just a visual compound), formed by chaining two syllables and striking out the duplicated characters.
I have a text in which the constituents are to be encoded <U+1A20 TAI THAM LETTER HIGH KA, U+1A74 TAI THAM SIGN MAI KANG> and <U+1A37 TAI THAM LETTER BA, U+1A74, U+1A75 TAI THAM SIGN TONE-1>, so the chained word may reasonably be encoded <U+1A20, U+1A74, U+1A5D, U+1A75> or <U+1A20, U+1A5D, U+1A74, U+1A75>.

While all my examples of U+1A5E are word final, it seems to differ from <U+1A60, U+1A48 TAI THAM LETTER HIGH SA> on the basis of the room available for it. Both forms are used as a word final consonant. The only Pali consonant cluster ending in /s/ is /ss/, and that is written using U+1A54 TAI THAM LETTER GREAT SA, so a non-final <s> will be rare. (I'm finding /ks/ written with U+1A47 TAI THAM LETTER HIGH SSA due to the application of RUKI.) However, I feel it would be rash to presume that every example of U+1A5E will be a final consonant.

I have one new Consonant_Final:

0EDF ; Consonant_Final # Lo LAO LETTER KHMU NYO (was Consonant)

See UTC document L2/10-335 for evidence.

I have one possible new Consonant_Subjoined:

1A7B ; Consonant_Subjoined # Mn TAI THAM SIGN MAI SAM

The value of its Indic_Matra_Category, if relevant, should be recorded as Top. U+1A7B is principally a repetition mark, indicating the repetition of a word. As extensions of this role, it can also do at least the following:

(1) Indicate a repeated (not geminate) consonant
(2) Indicate an omitted implicit vowel (one omits an implicit vowel by replacing it with U+1A60)
(3) Indicate an epenthetic vowel (extension of Role 2).

In rôle (1), it serves as a subjoined consonant. In rôles (2) and (3), it serves as a dependent vowel. For a shaper that does not constrain appearance, such as the Universal Shaping Engine, the best categorisation is probably 'Consonant_Subjoined'.

Although U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA are named as medial consonants, too much should not be read into such a description.
Both are, very occasionally, immediately preceded by vowels, and both may be followed by <U+1A60 TAI THAM SIGN SAKOT, U+1A40 TAI THAM LETTER HIGH YA> and <U+1A60, U+1A45 TAI THAM LETTER WA>. While the latter two sequences most commonly represent vowels, the strictly consonantal cluster <U+1A49 TAI THAM LETTER HIGH HA, U+1A56, U+1A60, U+1A45> starts a few words beginning with the cluster /lw/. This is a behaviour the Universal Shaping Engine of Microsoft currently disallows for medial consonants. We should therefore have:

1A55 ; Consonant_Subjoined # Mc TAI THAM CONSONANT SIGN MEDIAL RA
1A56 ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN MEDIAL LA

I actually see no benefits for rendering engines in distinguishing Consonant_Medial and Consonant_Subjoined, though the contrast may help in locating phonetic syllable boundaries.
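The proposed changes in this report are written in a data-file-like form, `code point ; category # general category NAME`. As a rough illustration only, the following Python sketch parses such lines. The regular expression, the `ProposedEntry` type and the `parse_entry` function are hypothetical helpers written for this example, not part of any Unicode tool, and they assume only the line shapes seen in this report (including the optional trailing `?` on a category).

```python
import re
from typing import NamedTuple, Optional


class ProposedEntry(NamedTuple):
    """One proposed category assignment (hypothetical type for this sketch)."""
    code_point: int
    category: str
    general_category: Optional[str]
    name: Optional[str]


# Assumed line shape, based only on the lines quoted in the report, e.g.
# "1A55 ; Consonant_Subjoined #Mc TAI THAM CONSONANT SIGN MEDIAL RA".
LINE = re.compile(
    r"^\s*(?P<cp>[0-9A-Fa-f]{4,6})\s*;\s*(?P<cat>\w+)\??"
    r"(?:\s*#\s*(?P<gc>[A-Z][a-z])\s+(?P<name>.+?))?\s*$"
)


def parse_entry(line: str) -> ProposedEntry:
    """Split a proposal line into code point, category, gc and name."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"not a category line: {line!r}")
    # Any remark after the general category (e.g. "(was Other)") stays in the name.
    return ProposedEntry(int(m["cp"], 16), m["cat"], m["gc"], m["name"])


if __name__ == "__main__":
    for text in (
        "0303 ;Bindu # Mn COMBINING TILDE",
        "1A55 ; Consonant_Subjoined #Mc TAI THAM CONSONANT SIGN MEDIAL RA",
        "1A58 ; Consonant_Preceding_Repha? # Mn TAI THAM SIGN MAI KANG LAI",
    ):
        print(parse_entry(text))
```

The pattern deliberately tolerates the spacing variations that occur in the report (`;Bindu`, `#Mc`), since the lines were typed by hand rather than generated.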
Date/Time: Tue Mar 10 21:55:16 CDT 2015
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #297 typo
NOTE: This was addressed by the editor on March 11, 2015 and will be in the next available draft.
The names list note for U+1DA8B SIGNWRITING PARENTHESIS mentions U+1DAA5 SIGNWRITING ROTATION MODIFIER-5. However, U+1DAA5 is SIGNWRITING ROTATION MODIFIER-6.
Date/Time: Fri Mar 13 12:57:27 CDT 2015
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #297: No! No! No!
The comments in IndicSyllabicCategory-8.0.0d3.txt claim that the general category of {SUPER,SUB}SCRIPT {TWO,THREE,FOUR} is Mn, but it is No.
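The claim is easy to cross-check against any Unicode character database. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `general_categories` is made up for this example, and the result reflects whichever Unicode version the local Python bundles.

```python
import unicodedata

# Character names expanded from "{SUPER,SUB}SCRIPT {TWO,THREE,FOUR}".
NAMES = [
    f"{position}SCRIPT {digit}"
    for position in ("SUPER", "SUB")
    for digit in ("TWO", "THREE", "FOUR")
]


def general_categories(names):
    """Map each character name to its General_Category value."""
    return {n: unicodedata.category(unicodedata.lookup(n)) for n in names}


if __name__ == "__main__":
    for name, gc in general_categories(NAMES).items():
        cp = ord(unicodedata.lookup(name))
        print(f"U+{cp:04X} {name}: {gc}")
```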
Date/Time: Sat Mar 28 14:53:23 CDT 2015
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: Missing entries in 8.0 PropertyValueAliases.txt
Note: This issue was fixed in the data on March 12:
Indic_Syllabic_Category is missing these two values: Consonant_With_Stacker and Consonant_Prefixed
Date/Time: Tue Mar 31 16:14:37 CDT 2015
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Unicode 8.00 Beta - InSC
While the current candidates for category Consonant_Succeeding_Repha may descend from repha, only one of them, U+17CC KHMER SIGN ROBAT, is clearly still a repha. Reading the script descriptions in TUS makes it abundantly clear that U+1B03 BALINESE SIGN SURANG and U+A982 JAVANESE SIGN LAYAR are actually final consonants. The TUS also states that U+1B81 SUNDANESE SIGN PANGLAYAR is a final consonant, but without going into any details.
Date/Time: Thu Apr 9 00:09:31 CDT 2015
Name: R.S. Wihananto
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297
Indic Syllabic and Positional Category of U+A9BD JAVANESE CONSONANT SIGN KERET

In the Indic Syllabic Category data, U+A9BD JAVANESE CONSONANT SIGN KERET is categorized as 'Consonant_Subjoined'. This is incorrect. U+A9BD is not a subjoined form of any Javanese consonant. Historically, U+A9BD is a dependent vowel sign for vocalic r. Its counterpart in Balinese script is U+1B3A BALINESE VOWEL SIGN RA REPA, and U+1B3A is categorized as 'Vowel_Dependent'.

In modern Javanese, U+A9BD is used as a replacement for U+A9BF JAVANESE CONSONANT SIGN CAKRA (medial ra) if followed by U+A9BC JAVANESE VOWEL SIGN PEPET (vowel sign ĕ). So in modern Javanese U+A9BD is treated like a medial consonant sign. In books teaching the Javanese script, the three characters are always grouped together: medial ya (U+A9BE JAVANESE CONSONANT SIGN PENGKAL), medial ra (U+A9BF JAVANESE CONSONANT SIGN CAKRA), and medial rĕ (U+A9BD JAVANESE CONSONANT SIGN KERET). So U+A9BD should be recategorized in the Indic Syllabic Category data as 'Consonant_Medial', like U+A9BE and U+A9BF. However, unlike U+A9BE and U+A9BF, U+A9BD can't be followed by vowel signs because it already has an inherent ĕ vowel.

Also, the Unicode character category of U+A9BD is incorrect. It should not be categorized as 'Mc' (Mark, Spacing Combining), but as 'Mn' (Mark, Nonspacing). This character is nonspacing, and its behavior in combining with other characters and forming ligatures is similar to that of the nonspacing vowel signs u (U+A9B8) and uu (U+A9B9). Its Balinese counterpart U+1B3A also has the 'Mn' character category. So the Indic Positional Category of this character should also be corrected from 'Right' to 'Bottom'.
Date/Time: Thu Apr 9 00:22:32 CDT 2015
Name: R.S. Wihananto
Report Type: Error Report
Opt Subject: Public Review Issue #297
Positional Category of U+A9BE JAVANESE CONSONANT SIGN PENGKAL and U+A9BF JAVANESE CONSONANT SIGN CAKRA

The positional category of U+A9BE JAVANESE CONSONANT SIGN PENGKAL should be corrected from 'Right' to 'Bottom_And_Right'.

The positional category of U+A9BF JAVANESE CONSONANT SIGN CAKRA should be corrected from 'Right' to 'Bottom_And_Left'; but I can't find this category in the Indic Positional Category data. This character is similar to U+103C MYANMAR CONSONANT SIGN MEDIAL RA. U+103C is not found/categorized in the Indic Positional Category data.
Date/Time: Thu Apr 9 01:14:22 CDT 2015
Name: R.S. Wihananto
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297
Indic Syllabic Category of U+1B03, U+1B81, and U+A982

I agree with Mr. Richard Wordingham's feedback. U+1B03 BALINESE SIGN SURANG, U+1B81 SUNDANESE SIGN PANGLAYAR, and U+A982 JAVANESE SIGN LAYAR were all historically repha, but in modern writing these characters are final -r consonant signs. Because Balinese, Sundanese, and Javanese characters are encoded in logical order, I think categorizing these characters as visually ordered repha is wrong. For consistency with other Indic scripts that only use repha in older texts (such as Telugu), old repha in Balinese, Sundanese, and Javanese should also be encoded with RA + VIRAMA + ZWJ.
Date/Time: Mon Apr 13 20:44:21 CDT 2015
Name: Shreevatsa R
Report Type: Error Report
Opt Subject: Ambiguity in Chapter 12.1 Devanagari, section Encoding Principles
The section says (http://www.unicode.org/versions/Unicode7.0.0/ch12.pdf): "The orthographic syllable is built up of alphabetic pieces, the actual letters of the Devanagari script. These pieces consist of three distinct character types: consonant letters, independent vowels, and dependent vowel signs. In a text sequence, these characters are stored in logical (phonetic) order. Consonant letters by themselves constitute a CV unit, where the V is an inherent vowel, whose exact phonetic value may vary by writing system. Independent vowels also constitute a CV unit, where the C is considered to be null. A dependent vowel sign is used to represent a V in CV units where V is not the inherent vowel." To be clear, the last sentence should read: A dependent vowel sign is used to represent a V in CV units where V is not the inherent vowel **and C is not null**. Because otherwise, it's confusing to say in one sentence that an independent vowel is a CV unit, and in the next sentence say that in CV units a dependent vowel sign is used. Obviously, in independent vowels (which are CV units) no dependent vowel sign is used.
Date/Time: Mon Apr 13 20:49:36 CDT 2015
Name: Shreevatsa R
Report Type: Error Report
Opt Subject: Ambiguity in Chapter 12.1 Devanagari, section Principles of the Devanagari script
The Unicode standard says (http://www.unicode.org/versions/Unicode7.0.0/ch12.pdf): "Consonant letters may also be rendered as half-forms, which are presentation forms used to depict the initial consonant in consonant clusters" -- here, "initial consonant" should be "non-final consonant" (or "consonants other than the last one").
Date/Time: Tue Apr 21 16:39:07 CDT 2015
Name: Markus Scherer
Report Type: Error Report
Opt Subject: U+9730 霰 pinyin is not "sǎn"
Note: This report has already been sent to the Unihan experts for evaluation.
We received a bug report about the pinyin sort order for 霰. In the CLDR Chinese pinyin order, which is based on Unihan data, it sorts with "S" but should sort with "X".

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E9%9C%B0 shows:

Readings
Data type       Value
kCantonese      sin3
kDefinition     hail, sleet
kHangul         산
kHanyuPinyin    64076.140:xiàn
kJapaneseKun    ARARE
kJapaneseOn     SAN SEN
kKorean         SEN SAN
kMandarin       sǎn
kTang           sèn
kXHC1983        0986.010:sǎn 1250.050:xiàn

The CLDR data is generated by a tool that prefers kMandarin=sǎn over kHanyuPinyin=xiàn, because kMandarin is "The most customary pinyin reading for this character." (http://www.unicode.org/reports/tr38/index.html#kMandarin)

Feedback from native Chinese colleagues indicates that if they recognize the character, they know it is unambiguously "xiàn". If they do not recognize it, they guess "sǎn" based on the more common 散.

If I understand correctly, this means that kMandarin=sǎn is incorrect. Please fix, and let me and Mark know the resolution.

References:
Xinhua Dictionary
http://zh.wikipedia.org/wiki/%E9%9C%B0
http://www.zdic.net/z/27/js/9730.htm
http://zidian.odict.net/862078860/
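To make the effect of the field preference concrete, here is a small Python sketch. It is not the CLDR tool: `first_reading` and `pick_reading` are hypothetical helpers, and the only data used are the three field values for U+9730 quoted in this report. It assumes that a value may carry a `location:` prefix and that multiple entries are space-separated, as in the quoted kXHC1983 value.

```python
from typing import Mapping, Optional, Sequence

# Field values for U+9730 exactly as quoted in the report.
U9730 = {
    "kHanyuPinyin": "64076.140:xiàn",
    "kMandarin": "sǎn",
    "kXHC1983": "0986.010:sǎn 1250.050:xiàn",
}


def first_reading(value: str) -> str:
    """Return the first reading in a field value, without any 'location:' prefix."""
    first_entry = value.split()[0]             # entries are space-separated
    readings = first_entry.split(":")[-1]      # drop the dictionary location, if any
    return readings.split(",")[0]              # keep the first of several readings


def pick_reading(fields: Mapping[str, str], preference: Sequence[str]) -> Optional[str]:
    """Return the reading from the first preferred field that is present."""
    for field in preference:
        if field in fields:
            return first_reading(fields[field])
    return None


if __name__ == "__main__":
    # Preference order described in the report: kMandarin wins.
    print(pick_reading(U9730, ("kMandarin", "kHanyuPinyin")))
    # Reversing the preference selects the other reading.
    print(pick_reading(U9730, ("kHanyuPinyin", "kMandarin")))
```

With the preference order described above, the sort key starts with "s"; any fix therefore has to come either from the kMandarin value itself or from the preference order.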
Date/Time: Thu Apr 23 19:12:14 CDT 2015
Name: Nick Lawson
Report Type: Public Review Issue
Opt Subject: Cheese and Bacon Emojis
Dear Unicode Consortium, First off, I'd like to congratulate you on the release of Unicode 7.0. I love the diverse skin tones available for characters and hand signals, as well as the variety of new characters that were added. I noticed that 'Cheese Wedge' is a proposed Emoji for the Unicode 8.0 update, and I could not be more excited. I have, on several occasions, wished that there was a cheese-related Emoji available. I would like to strongly advocate for its inclusion, and thank you for your consideration of the community's suggestions for characters. I was disappointed, however, to see that a 'Bacon' Emoji was not slated for the 8.0 update. While I understand that Unicode is universal and bacon is not necessarily ubiquitous across cultures, bacon has become a trend in the culture of the western world. Bacon is rising in popularity in fast food items as well as high-end dining, and its popularity extends beyond cuisine. There are bacon-themed clothes, bacon-scented toiletries, and bacon-related housewares that are quite popular (although I can't say they're quite my taste). It seems fitting that Emojis, which are also becoming a hot cultural trend, would include a bacon unicode character. The inclusion of 'Cheese Wedge' and 'Bacon' unicode characters in the 8.0 update would make me absolutely ecstatic. If there is a specific party or group I should contact with these suggestions, please let me know. And if you would like suggestions or examples of art work, I would be happy to contribute those with no expectations of credit. Thank you very much for your kind consideration, and I look forward to hearing from you soon. Sincerely, Nick Lawson
Date/Time: Sun Apr 26 22:58:22 CDT 2015
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Confusion about meaning of White_Space property
There is confusion in the standard and the FAQ about the meaning of the White_Space property. As I read the standard, the only official definition is given in UAX #44:

"White_Space: [...] Spaces, separator characters and other control characters which should be treated by programming languages as "white space" for the purpose of parsing elements."

Basically, no relation to display. But the Core Spec, on page 250, says:

"Line separation characters, such as the carriage return, do not clearly exhibit their advance width, because they always occur at the end of a line, but most implementations give them a visible advance width when they are selected. Hence, they are classed together with space characters; both are given the White_Space property. Whitespace characters are not considered to be ignored for display."

This contradicts the definition in UAX #44, which says the property is not about displaying things, but about parsing things. The FAQ also confuses the property's meaning. At http://unicode.org/faq/unsup_char.html#2, it says:

Q: Which characters should be displayed as a visible but blank space?
A: This is the easy one: all the characters that have the White_Space property, also generically known as “whitespace characters”. This set includes SPACE, of course, but also such characters as the tab control character, NO-BREAK SPACE, LINE SEPARATOR, and so on. For the full list, see the White_Space values in PropList.txt.
Date/Time: Mon Apr 27 13:49:24 CDT 2015
Name: Sebastian Kempgen
Report Type: Public Review Issue
Opt Subject: LATIN SMALL LETTER SAKHA YAT
Note: This has been fixed in the master data file.
Hello, in the beta code chart of UC 8.0, there is a note below character AB60 that points to A657 "Cyrillic small letter iotified a" as the source for this letter. This is not correct. The new letter AB60 is simply a glyph variant of 0463, Cyrillic small letter yat. (And indeed, the code chart for 8.0 has the correct back-reference to AB60 at 0463.) Because AB60 and 0463 look completely different, one bit of background might help: the glyph at AB60 is simply the *upright* variant of a glyph variation more commonly found for the *cursive* form of 0463. Even some of today's fonts have that glyph variation, which in its cursive form looks like a Latin cursive "n" with a Cyrillic soft sign tacked onto its right side. Best regards, Sebastian Kempgen
Date/Time: Wed May 6 14:13:03 CDT 2015
Name: Tim Larson
Report Type: Public Review Issue
Opt Subject: 8.0 beta - menorah addition
The beta code chart for U+1F54E is missing the note "Hanukiah" that had been previously added some time prior to Nov 19 2014. Please re-add it.
Date/Time: Wed Mar 25 12:52:19 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297
Bidi-mirroring of mathematical symbols

Hi, I would like to send you my previous suggestion again with my new email address, to replace the former post, which was also poorly written; I'm sorry.

There may be a problem with bidi-mirroring of mathematical symbols. U+2260 and U+2262 are bidi-mirrored, U+226D is not. So probably U+226D should be. But it seems implementers don’t care about bidi-mirroring of mathematical symbols, except common ones such as U+003C and U+003E. When these are negated (U+226E and U+226F), there is no more bidi-mirroring in Windows NotePad. This may be why U+226D is not bidi-mirrored in Unicode. But it seems inconsistent.

Consistently with U+2215 / U+29F5, bidi-mirroring of U+2260 and U+2262 means a backslash for negation when the script is right-to-left. Making it a rule, mathematical slashes, even U+2298, convert to backslashes when bidi-mirrored (I hope I’m right).

More generally, there is a need to inform the readers of the Code Charts which characters are bidi-mirrored and which ones are not. For that purpose, bidi-mirroring should be indicated systematically. This is as important as casing and glyph shape information. Given the amount of information made available in the Code Charts and NamesList, bidi-mirroring should not be confined to an implementation issue, and therefore the bidi-mirroring information should be available inside the Code Charts, not only in UnicodeData.txt and additional files.

Because NamesList translators are free to add comments, I’m adding bidi-mirroring comment lines at each character or subhead that is concerned with this issue. This is properly a French NamesList translation issue.

Not showing bidi-mirroring in the Code Charts might be interpreted as a lack of respect for right-to-left scripts, which may worry users of those scripts. Bidi-mirroring is so important that it should not have to be searched for in UnicodeData.txt, BidiBrackets.txt and BidiMirroring.txt, but shown straight in the Code Charts.
Best regards, Marcel Schneider
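For readers who want to check the Bidi_Mirrored values of the characters mentioned in this report, a minimal Python sketch follows. It relies on the standard `unicodedata.mirrored()` function; the helper names `is_bidi_mirrored` and `report` are invented for this example, and the values printed come from the Unicode version bundled with the local Python (see `unicodedata.unidata_version`), which may differ from the 8.0 beta under review.

```python
import unicodedata

# Code points mentioned in the report above.
CODE_POINTS = [0x003C, 0x003E, 0x2260, 0x2262, 0x226D, 0x226E, 0x226F, 0x2298, 0x2205]


def is_bidi_mirrored(cp: int) -> bool:
    """True if the character has Bidi_Mirrored=Y in the bundled database."""
    return unicodedata.mirrored(chr(cp)) == 1


def report(code_points):
    """Return one 'U+XXXX NAME Bidi_Mirrored=Y/N' line per code point."""
    lines = []
    for cp in code_points:
        name = unicodedata.name(chr(cp), "<unnamed>")
        flag = "Y" if is_bidi_mirrored(cp) else "N"
        lines.append(f"U+{cp:04X} {name} Bidi_Mirrored={flag}")
    return lines


if __name__ == "__main__":
    print(f"Unicode data version: {unicodedata.unidata_version}")
    print("\n".join(report(CODE_POINTS)))
```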
Date/Time: Wed Mar 25 12:54:45 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297
Notice line vs Comment line

Hi, there are “idle” “@+” lines in the NamesList, which in the Code Charts draft have no effect on markup or highlighting in any way, but which disturbed the results of a sorting (spreadsheet) formula applied to the NamesList. (In fact, I used the French translation, where many idle “@+” are missing, and added them following NamesList-8.0.0d6.) There seems to be no difference between such a NOTICE_LINE with a bullet applying to a character, and a COMMENT_LINE. Therefore, I suggest converting these NOTICE_LINEs to COMMENT_LINEs.

The list below shows the instances in the NamesList where an “idle @+” occurs: U+0140 U+0149 U+01A6 U+0268 U+0269 U+0277 U+027C U+029E U+0307 U+01E7 U+1E5B U+2301 U+234A U+237B U+237D U+237E U+237F U+2425 U+2426 U+16F27 U+16F32 U+16F52 U+16F53

Best regards, Marcel Schneider
Date/Time: Wed Apr 1 09:43:56 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Code points with bookmarks in the All-in-one Code Charts PDF

Hello, I would suggest adding the first code point before the Blockheads that display in the side pane of Adobe Reader when viewing the All Code Charts PDF. The Blockheads are very useful, but often a given code point must be searched for. Showing the start point together with the Blockhead would then help in finding glyphs quickly. Alternatively, it would be useful to show the start points in parentheses after the Blockheads, if the goal is to avoid puzzling users with figures. That is presumably why, at present, readers searching for a code point have to click at an estimated (or learned) position and then check the range in the page header.

Using the single code charts instead presents some disadvantages:
- Most archive managing software cannot sort on hex values, so the code charts in a folder are unordered.
- For convenience, there is neither endpoint nor blockhead in the single charts' filenames (they are opened with code point searching software).

Best regards, Marcel Schneider
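The sorting problem with the single chart files can be illustrated with a short Python sketch. The filenames below are assumed to follow a `U<hex start point>.pdf` pattern, and `chart_sort_key` is a hypothetical helper written for this example; a plain string sort places `U10000.pdf` before `UA640.pdf`, whereas sorting on the parsed hexadecimal value restores code point order.

```python
import re


def chart_sort_key(filename: str) -> int:
    """Parse the hexadecimal start point out of a name like 'U0080.pdf'."""
    m = re.match(r"U([0-9A-Fa-f]+)\.pdf$", filename)
    if m is None:
        raise ValueError(f"unexpected chart filename: {filename!r}")
    return int(m.group(1), 16)


if __name__ == "__main__":
    names = ["U10000.pdf", "U0080.pdf", "U0000.pdf", "UA640.pdf"]
    print("string order:    ", sorted(names))
    print("code point order:", sorted(names, key=chart_sort_key))
```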
Date/Time: Sat Apr 11 09:09:00 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Head numbering in the Standard, refs to the Standard in the Code Charts

Hi, it is sometimes hard to quote and to retrieve a passage in the Standard because there are only two numbering levels: chapters and sections. For example, the long section 6.2 is structured with many unnumbered heads. Therefore I suggest adding a third numbering level. With these additional numbers, quoting would be facilitated, and when they appear in the navigation pane, retrieving would be too. Numbering levels have presumably been restricted to two in order to avoid puzzling readers with too many figures.

IMO the more the Standard is quoted, the more it will become popular and well-known. This is why I suggest that comments in the Code Charts should point to the Standard where appropriate, showing “§9.9.9” refs. For example, at U+0029:

“* see discussion on semantics of paired bracketing characters”

might become:

“* see discussion on semantics of paired bracketing characters, chapter 6 of the Standard, section 2.10”

or just:

“* see discussion on semantics of paired bracketing characters, §6.2.10”.

Likewise, bidi-mirroring may be referred to as “bidi-mirrored (see §4.7)” at a significant number of instances in the Code Charts (and the NamesList). UAXes, other UTNs and UTRs might be quoted too, where this would be helpful to get started with Unicode.

Best regards, Marcel Schneider
Date/Time: Mon Apr 13 10:05:09 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Bidi-mirrored Code Charts

Hi, given that bidi-mirroring of symbols is a very complex issue (see U+2260 and U+2262 as well as U+2298, which are bidi-mirrored, vs U+226D and U+2205, which currently are not), it might be useful to have bidi-mirrored Code Charts. They would facilitate the implementation of Unicode for right-to-left scripts. Today, users who aim at getting started with Unicode must guess what bidi-mirrored symbols look like whenever there is no Bidi_Mirroring_Glyph for a ready-to-use bidi-mirroring emulation.

Therefore I suggest adding bidi-mirrored Code Charts where appropriate. Every block’s charts that contain bidi-mirrored characters would be followed by a set of Code Charts where these characters are bidi-mirrored and highlighted in some way, perhaps with the abbreviation “BM” in the upper left corner.

Best regards.
Date/Time: Mon Apr 13 10:05:31 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
A new extended datafile to speed up implementation

Hi, since a lot of data has been added alongside UnicodeData to form a complete character property repertoire, it might be helpful to create a new overall datafile on the pattern of UnicodeData, including more fields such as FormalAlias, BidiMirrored, and some spare fields, in order to enhance transparency and promote more complete implementations than are often seen even today.

The question to be asked is whether the hints Unicode gives implementers are well understood. For example, Unicode deprecates parsing the NamesList for machine-readable information, and nevertheless it is the NamesList that is parsed, for example, by keyboard creating software, while often no notice is taken of the FormalAliases.

I take note of the policy of adding new files rather than new fields to UnicodeData. But perhaps it is now time to add a new comprehensive base file, including much of the information of the NamesList and all the other files of the UCD.

Best regards.
Date/Time: Mon Apr 13 10:07:22 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Chapter 6 of the Standard Hi, I would like to suggest adding some information to the second paragraph of the “Apostrophes” subsection in section 6.2 “General Punctuation” of the Standard, on page 272. The paragraph is pasted below, and the suggested information is added within, highlighted with underscores. The first statement is based on 87 klc sources of Latin keyboard layouts shipped with Windows 7 Starter. The docx is attached below. Please note that the klc source of the Canadian multilingual standard keyboard layout is incomplete because Kana is not recognized by the software. It counts as the eleventh keyboard with U+2019 among a total of 87 keyboards, so the exact figure is 87% of locale Latin Windows keyboards without U+2019. Although the US Standard keyboard layout does not contain U+2019, the US International keyboard layout does, along with U+2018. The second statement summarizes an article found with a search engine at the following URL: http://www.newrepublic.com/article/113101/smart-quotes-are-killing-apostrophe The third statement results from observation using a small panel of free ornamental fonts, mostly handwriting fonts. Best regards. P.S.: I will send the docx by mail after this form __________________________ When text is set, U+2019 right single quotation mark is preferred as apostrophe, but _it is missing on about 90% of most current locale Latin keyboards, where_ only U+0027 is present__. Software commonly offers a facility for automatically converting the U+0027 apostrophe to a contextually selected curly quotation glyph. _This facility tends to fail when U+0027 represents a leading apostrophe rather than an opening quotation mark._ In these systems, a U+0027 in the data stream is always represented as a straight vertical line and can never represent a curly apostrophe or a right quotation mark. _However, many ornamental fonts associate the same curly glyph to U+0027 as to U+2019._ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
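The failure with leading apostrophes can be illustrated with a deliberately naive converter. This is only a sketch in Python of the kind of context rule such a facility might use; the function name naive_smart_quotes and the rule itself are assumptions for illustration, not the behavior of any particular product.

```python
def naive_smart_quotes(text: str) -> str:
    """Replace U+0027 by a curly quote chosen from the preceding character only.

    Assumed rule: after whitespace, an opening bracket, or at the start of
    the text, emit U+2018; everywhere else emit U+2019.
    """
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        opening = prev.isspace() or prev in "([{"
        out.append("\u2018" if opening else "\u2019")
    return "".join(out)

if __name__ == "__main__":
    print(naive_smart_quotes("don't"))  # medial apostrophe becomes U+2019
    print(naive_smart_quotes("'tis"))   # leading apostrophe wrongly becomes U+2018
```

Because the rule looks only at the preceding character, it cannot tell a leading apostrophe from an opening quotation mark, which is the failure the suggested sentence describes.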
Date/Time: Mon Apr 13 10:07:49 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Chapter 6 of the Standard Hi, there is some additional information about U+2044 FRACTION SLASH that I would suggest adding to the “Fraction Slash” paragraphs in the “Other Punctuation” subsection of §6.2, page 273 of the Standard, as well as in the Code Charts’ Fractions subheader before U+2150. That U+2044 FRACTION SLASH works together with superscripts and subscripts is so obvious that no discussion is needed. On the other hand, as fraction formatting needs at least desktop publishing software, it is usually not a part of office automation. It therefore seems useful to show the plain text entry method for fractions with a slanted fraction slash like the default glyph of U+2044. The Number Forms block’s Fractions subhead may therefore be followed by a NOTICE_LINE like this one: ‘@+’ [TAB] [TAB] ‘Fractions may be composed in plain text on a [superscripts] 2044 [subscripts] pattern.’ Meanwhile, the Fraction Slash notice in the Standard might contain the information below (including what is already provided in the Standard). Best regards. ___________________________ Fraction Slash. U+2044 FRACTION SLASH is used between digits to form numeric fractions. It kerns, for use with superscripts and subscripts to compose plain text fractions such as ²⁄₃ and ³⁄₉. The pattern of a plain text fraction built using the fraction slash is defined as follows: any sequence of one or more superscript digits (U+00B9, U+00B2, U+00B3, U+2074 - U+2079, U+2070), followed by the fraction slash, followed by any sequence of one or more subscript digits (U+2080 - U+2089). U+2044 FRACTION SLASH may also act as a formatting command for use with decimal digits, and it may be used instead of U+002F SOLIDUS prior to applying fraction formatting. The standard form of a fraction designed for formatting is defined as follows: any sequence of one or more decimal digits (General Category = Nd), followed by the fraction slash, followed by any sequence of one or more decimal digits. 
If the fraction is to be separated from a previous number, then a space can be used, choosing the appropriate width (normal, thin, zero width, and so on). For example, 1 + thin space + 3 + fraction slash + 4 can be displayed as 1¾. Whether they are plain text or formatted, fractions should be displayed as a unit, such as ¾ or {unavailable glyph}. The precise choice of display can depend on additional formatting information. If the displaying software is incapable of mapping the fraction to a unit, then it can also be displayed as a simple linear sequence as a fallback (for example, 3/4). For fallback display, U+002F SOLIDUS is preferred, because the fraction slash kerns. ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
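The two fraction patterns defined in the suggested text can be written down as regular expressions. The following is a sketch in Python; the constant and function names are hypothetical, and it relies on Python's \d matching any character of General Category Nd in str patterns.

```python
import re

# Superscript digits as listed in the suggested text:
# U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079
SUPERSCRIPT_DIGITS = "\u2070\u00B9\u00B2\u00B3\u2074-\u2079"
# Subscript digits U+2080..U+2089
SUBSCRIPT_DIGITS = "\u2080-\u2089"
FRACTION_SLASH = "\u2044"

# Plain text fraction: superscripts, fraction slash, subscripts.
PLAIN_TEXT_FRACTION = re.compile(
    f"[{SUPERSCRIPT_DIGITS}]+{FRACTION_SLASH}[{SUBSCRIPT_DIGITS}]+"
)
# Fraction meant for formatting: decimal digits (Nd) on both sides.
FORMATTED_FRACTION = re.compile(r"\d+\u2044\d+")

def is_plain_text_fraction(s: str) -> bool:
    return PLAIN_TEXT_FRACTION.fullmatch(s) is not None

def is_formatted_fraction(s: str) -> bool:
    return FORMATTED_FRACTION.fullmatch(s) is not None

if __name__ == "__main__":
    for sample in ["\u00B2\u2044\u2083", "3\u20444", "\u00B2\u20443"]:
        print(repr(sample), is_plain_text_fraction(sample), is_formatted_fraction(sample))
```

A mixed sequence such as a superscript numerator with an ordinary decimal denominator matches neither pattern, which follows directly from the two definitions.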
Date/Time: Tue Apr 14 08:14:20 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Chapter 6 of the Standard Hello, in section 6.2, on page 268 of the Standard, Quotation Marks and Brackets, I suggest moving the last sentence of the second paragraph to the end of the first paragraph. The result would look as quoted below, where the move is highlighted with underscores. Best regards. ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Quotation Marks and Brackets. Like brackets, quotation marks occur in pairs, with some overlap in usage and semantics between these two types of punctuation marks. For example, some of the CJK quotation marks resemble brackets in appearance, and they are often used when brackets would be used in non-CJK text. Similarly, both single and double guillemets may be treated more like brackets than quotation marks. __Unlike brackets, quotation marks are not mirrored in a bidirectional context.__ Some of the editing marks used in annotated editions of scholarly texts exhibit features of both quotation marks and brackets. The particular convention employed by the editors determines whether editing marks are used in pairs, which editing marks form a pair, and which is the opening character.____ ____________________________
Date/Time: Tue Apr 14 08:15:17 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Chapter 6 of the Standard Hello, in section 6.2, on page 268, Language-Based Usage of Quotation Marks, it may be possible to add some useful content to the two first paragraphs. They are quoted below (the second is split), and changes are highlighted with underscores. The first modification is needed to support the fact that U+0022 is optionally converted to chevrons (guillemets). The meaning of the “warning” quotation marks is to prevent the reader from taking the expression in plain sense. Following French usage they may be called “irony quotes”, but often there is no irony at all, just the meaning of “so-called”. Best regards. ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Language-Based Usage of Quotation Marks U+0022 QUOTATION MARK is the most commonly used character for quotation mark. However, it has ambiguous semantics and direction. Most keyboard layouts support only U+0022 QUOTATION MARK, but software commonly offers a facility for automatically converting the U+0022 QUOTATION MARK to a contextually selected _quotation mark_ glyph. European Usage. The use of quotation marks differs systematically by language and by medium. In European typography, it is common to use _angle quotation marks (guillemets, chevrons) in publishing_ and, except for some languages, curly quotation marks in office automation. Single guillemets may be used _to clarify the presence of nested quotations_. _Many authors use angle and curly quotation marks in the same text to distinguish between quoting and warning._ The following description does not attempt to be complete, but intends to document a range of known usages of quotation mark characters. Some of these usages are also illustrated in Figure 6-3. In this section, the words single and double are omitted from character names where there is no conflict or both are meant. ___________________________
Date/Time: Tue Apr 14 08:17:32 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
NamesList, U+0022 and U+0027 Hello, in the NamesList and the Code Charts, at U+0022, the COMMENT_LINE “* neutral (vertical), used as opening or closing quotation mark” should be replaced with its counterpart at U+0027: “* neutral (vertical) glyph with mixed usage”, because U+0022 is also used for “seconds”, “double prime” and “ditto”. Furthermore, it is doubtful whether locale language-specific support is appropriate here. Therefore, I suggest replacing “* preferred characters in English for paired quotation marks are 201C & 201D”, because on one side, this is true for other languages (e.g. French), and on the other side, there are ways to support far more languages here, with something like “* some preferred characters for paired double quotation marks are found at 201C-201E”. The next step would eventually be to remove “* 05F4 is preferred for gershayim when writing Hebrew”, and to rely on the CROSS_REF “x (hebrew punctuation gershayim - 05F4)”. I've learned some Hebrew and I like it very much, also the Jewish nation, of course, but I fear this COMMENT_LINE brings some risk of conflict. IMO Hebrew language support is ensured thanks to the already provided CROSS_REF “x (hebrew punctuation gershayim - 05F4)”. If the above statements are right, they would identically apply to U+0027, third and fourth COMMENT_LINE. In any case, as shown in UTN #24, an English translation is needed (the English don’t call a slash a SOLIDUS, while they call a FULL-STOP a period, too). In this English translation (ideally for use in both the United States and the United Kingdom and all English speaking or using countries), the COMMENT_LINE “* preferred characters in English for paired quotation marks are 201C & 201D” will be really appropriate. Best regards.
Date/Time: Wed Apr 15 10:55:39 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297
NamesList: U+05F3, U+05F4 Hello, joint to my proposal for removing the Hebrew support COMMENT_LINEs at U+0022 and U+0027, leaving the CROSS_REFs only, I suggest adding appropriate COMMENT_LINEs in the Hebrew block, at U+05F3 and U+05F4, taking model on those at U+2018 and U+201C: @ Additional punctuation 05F3 HEBREW PUNCTUATION GERESH * this is the preferred character (as opposed to 0027) x (apostrophe - 0027) 05F4 HEBREW PUNCTUATION GERSHAYIM * this is the preferred character (as opposed to 0022) x (quotation mark - 0022) Best regards.
Date/Time: Fri Apr 17 11:45:39 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Chapter 6 of the Standard Hi, in section 6.2, in the subsection “Dashes and Hyphens” on page 265, I suggest enhancing the information about U+2010 HYPHEN. In most fonts where it is present, it does not differ in appearance from U+002D HYPHEN-MINUS. Hence, in many fonts, whether current or ornamental, it is missing. There is at least one good font where the statements in the Standard apply, because U+002D is rendered with a tiny wide glyph that is not convenient as a hyphen. Therefore, entering U+2010 as the default hyphen in raw text does not seem to be appropriate. When it is preferred in a given layout, a routine can put it in place of U+002D, with U+2212 MINUS SIGN being entered expressly to disambiguate the two semantic values of hyphenation and minus. In practice, U+2010 seems to be seldom used. Even in the Standard, typeset in Minion-Regular, the hyphen character is U+002D, as found in the sample “left-to-right” in this paragraph. I’ve tried to put U+2010 on the keyboard as the default hyphen in typesetting mode, but the problems led me to reset the character to U+002D. I make U+2010 available in the dead key registry (acute, hyphen), mainly to facilitate the search-and-replace settings. As a result, the sentence “It is rendered with a narrow width.” should be supplemented in some way, because in nearly all fonts, U+002D is rendered with a narrow width too. (I would not mention MS Gothic, where U+2010 displays with extra space around it!) Best regards.
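The replacement routine mentioned above could be as small as the following Python sketch. It assumes, as the feedback does, that every minus sign has already been entered explicitly as U+2212, so that each remaining U+002D is a hyphen; the function name prefer_hyphen is hypothetical.

```python
HYPHEN_MINUS = "\u002D"
HYPHEN = "\u2010"
MINUS_SIGN = "\u2212"

def prefer_hyphen(text: str) -> str:
    """Replace every U+002D by U+2010; U+2212 is left untouched."""
    return text.replace(HYPHEN_MINUS, HYPHEN)

if __name__ == "__main__":
    sample = "left-to-right, 3 " + MINUS_SIGN + " 2"
    print([f"U+{ord(c):04X}" for c in prefer_hyphen(sample) if c in (HYPHEN, MINUS_SIGN)])
```

If the assumption does not hold for a given text, the routine would turn intended minus signs into hyphens, so it is only safe as a final layout step.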
Date/Time: Fri Apr 17 12:16:45 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
NamesList and the Code Charts Hi, there seems to be a mistake with character names. In fact they are designations, and they are handled as such. The goal of a character’s name is to give an accurate idea of what the character is, and to facilitate referring to it in natural language. As an immutable identifier there is the code point. Systems handle code points, not character names. Software does not need any other identifier. This is why freezing character names is an abuse, especially when they have proved to be wrong. There is a very strong desire to design the most accurate names, which led to passionate discussions at the merger of ISO/IEC 10646 with Unicode. But the renaming of U+00C6/U+00E6 to its original letter status surprisingly produced a name-update prohibition act, a Stability Policy that extends over names instead of ensuring code point stability only. Suddenly, character names were called by ISO “convenient identifiers”, not more. And not less. Fortunately Unicode found a workaround, giving characters that are completely misnamed a Formal Alias, thanks to which Formal Alias aware software is able to display a true designation in most cases. Unfortunately, the remedy is not applied to characters such as U+002F SOLIDUS, a slash that bears the scholarly name of the fraction slash (U+2044 FRACTION SLASH may be called with some reason a solidus). And even more unfortunately, there would be far too many Formal Aliases if all the abusive lateralization of bidi-mirrored paired punctuation were corrected. Even outside a bidirectional context, the “LEFT” qualifier is unfitting for U+2018 and U+201C in a Universal Character Set. UnicodeData shows clearly where most of the awkward names are from. Or, more accurately, where they are NOT from. By misnaming characters in an ethnocentric way, ISO acted against its mission as an international standards body. It is obvious that an international organization for standardization must respect its members’ wishes. 
And when one of the countries complains about misnaming, it must correct and apologize, not rage and protest. Nor prohibit further updates. Therefore I suggest doing some general overhaul. Beginning with the Stability Policy. As to avoid lateralization where it is undue, LEFT and RIGHT may be replaced with the original OPENING and CLOSING where it is unambiguous, or with BACKWARD-POINTING and FORWARD-POINTING. Best regards.
Date/Time: Mon Apr 20 03:02:12 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
U+00BF INVERTED QUESTION MARK U+00A1 INVERTED EXCLAMATION MARK The turned question mark (INVERTED QUESTION MARK) seems not to be used in Catalan. Therefore it is mentioned for Castilian in a translation. Would it be accurate to replace “* Spanish” with “* Castilian”? The same would then probably apply to U+00A1, with as already shown in the NamesList, Asturian and Galician (perhaps also at U+00BF?). Best regards, Marcel Schneider
Date/Time: Mon Apr 20 03:08:15 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
U+00D0, U+00F0 LATIN SMALL LETTER ETH; U+0110, U+0111 LATIN SMALL LETTER D WITH STROKE As in this block uppercase and lowercase are 32 code points apart, there would normally be no need to mention the case pairing. The Icelandic LATIN LETTER ETH may differ from the rule because of the risk of confusing it with U+0110/U+0111 LATIN LETTER D WITH STROKE (which led early standards bodies to encode the capital once only!). But ordinarily the NamesList uses COMMENT_LINEs, not CROSS_REFs, for casing information. Therefore I suggest replacing x (latin small letter eth - 00F0) and x (latin capital letter eth - 00D0) with * lowercase is 00F0 and * uppercase is 00D0 The same would then apply to the LATIN LETTER D WITH STROKE, because in this block lowercase follows uppercase, and it was only the risk of confusion with U+00D0/U+00F0 that led Unicode to add the CROSS_REFs x (latin small letter d with stroke - 0111) and x (latin capital letter d with stroke - 0110) However, as shown above, * lowercase is 0111 and * uppercase is 0110 COMMENT_LINEs seem to fit this particular context better, even if in this instance they would only ensure that all concerned languages are treated equally. Best regards, Marcel Schneider
Date/Time: Mon Apr 20 03:09:02 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
NamesList and the Code Charts If there is no way of reengineering character name stability, that is, if the convenience of the relatively small number of involved specialists takes precedence over the billions of end-users who will be dealing with code names for a long time while nearly no translations are provided, then I suggest making extensive use of the correcting facility Unicode created with FormalAliases. To achieve this, at least all characters that are bidi-mirrored and whose names are ethnocentric should be given a FormalAlias. For example: U+00AB LEFT DOUBLE ANGLE QUOTATION MARK, which Unicode first named LEFT POINTING GUILLEMET, may be given the FormalAlias % BACKWARD-POINTING GUILLEMET or BACKWARD-POINTING DOUBLE ANGLE QUOTATION MARK. U+00BB RIGHT DOUBLE ANGLE QUOTATION MARK, which Unicode first named RIGHT POINTING GUILLEMET, may be given the FormalAlias % FORWARD-POINTING GUILLEMET or FORWARD-POINTING DOUBLE ANGLE QUOTATION MARK. A warning should be placed in the file header to point out that all character names that are followed by a FormalAlias should be discarded in use, and that using the FormalAlias instead is strongly recommended. The next question is whether it would not be sufficient, if there is a Formal Alias, to show the code name on a CODENAME_LINE. This would allow shifting the Formal Aliases to the NAME_LINE. Subsequently, the (relatively) few specialists who are dealing with standardization may be asked to refer always to the CodeName wherever there is one. Even simpler, the roles of CharacterName and FormalAlias may be inverted in these instances, giving the Formal Alias a Code Name status (and the Character Name a True Designation status). Then it would be enough to disable the FormalAlias-awareness algorithms (which most likely do not even exist yet in end-user software, at least not in some relatively widespread free keyboard layout creating software for end-users). Best regards, Marcel Schneider
Date/Time: Mon Apr 20 03:09:33 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
U+002D HYPHEN-MINUS As the ALIAS_LINE of this character nearly coincides with its COMMENT_LINE, it would be nice to spread the aliases over several lines, as is done at U+223C TILDE OPERATOR, in order to separate the two very different semantic values. Instead of a “hyphen or minus sign” alias, there would be two, like: = hyphen = minus sign Furthermore, after the COMMENT_LINE * used for either hyphen or minus sign it would be useful to add another one, because U+002D matches neither figures nor the other operators such as U+002B PLUS SIGN, so using it as a minus sign is very bad typography. This COMMENT_LINE might show the following information: * 2212 is preferred for minus sign It follows the one that already exists for U+0027 APOSTROPHE (“* 2019 is preferred for apostrophe”). Indeed, IMO the two cases are roughly similar. Best regards, Marcel Schneider
Date/Time: Mon Apr 20 03:10:12 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Numeric Separators In the Standard, §6.2, page 275, a brief mention of Numeric Separators is found. It consists merely in an enumeration of five characters or code points and refers to the existence of locale and user customizations. While one of them is the ARABIC THOUSANDS SEPARATOR U+066C, no mention is made in the ASCII block of such a semantic value for period or comma. Nevertheless, the French translation adds “thousands separator” among the aliases of U+002E FULL STOP. IMO the related semantic values are so important these characters should be shifted into the subset of well commented characters in the Code Charts, and the related subject might become more extensively covered in the Standard. Therefore adding some information might be useful, perhaps as suggested below. Best regards, Marcel Schneider 1) In the NamesList: 0027 APOSTROPHE Add a third ALIAS_LINE: = prime, minutes, feet, thousands separator Among the CROSS_REFs: add x (arabic thousands separator - 066C) 002C COMMA On the ALIAS_LINE: add “thousands separator” after “= decimal separator”. Among the CROSS_REFs: add x (arabic decimal separator - 066B) 002E FULL STOP Split the ALIAS_LINE and add “thousands separator” after “= decimal point”, like this: = period, dot = decimal point, thousands separator Add a COMMENT_LINE as this: * used as a thousands separator when 002C is decimal separator, and conversely and place it first. 2019 RIGHT SINGLE QUOTATION MARK Add a second COMMENT_LINE: * may be used as a thousands separator Among the CROSS_REFs: add x (arabic thousands separator - 066C) 2) In the Standard, on page 275, Numeric Separators, a supplemental text like the following might be added at the end of the paragraph: ___________________________ In latin usage for example, U+002C COMMA and U+002E FULL STOP may both take the semantic of a decimal separator. The one that is not given this value is then currently used as a thousands separator. 
Alternatively, space characters or raised separators like U+0027 APOSTROPHE may act in this way. The latter is current Arabic usage, where U+066C is a dedicated ARABIC THOUSANDS SEPARATOR, while 066B is a special ARABIC DECIMAL SEPARATOR, even if 060C ARABIC COMMA is used for the same purpose. ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
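The separator conventions described in the suggested text can be illustrated with a small formatting helper. This is a sketch in Python; the function name format_number and its parameters are hypothetical, and real applications would normally take the separators from locale data rather than pass them by hand.

```python
def format_number(value: float, thousands: str, decimal: str, places: int = 2) -> str:
    """Format value with the given thousands and decimal separators."""
    # Start from Python's fixed convention: comma groups, period decimal.
    s = f"{value:,.{places}f}"
    # Swap through a placeholder so that "," and "." can be exchanged safely.
    return s.replace(",", "\0").replace(".", decimal).replace("\0", thousands)

if __name__ == "__main__":
    n = 1234567.891
    print(format_number(n, ",", "."))            # comma thousands, period decimal
    print(format_number(n, ".", ","))            # the converse convention
    print(format_number(n, "'", "."))            # raised separator U+0027
    print(format_number(n, "\u066C", "\u066B"))  # Arabic separators
```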
Date/Time: Mon Apr 20 11:06:54 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
0598 HEBREW ACCENT ZARQA 05AE HEBREW ACCENT ZINOR I cannot measure the impact of this old and well-documented (UTN #27) problem, and there are surely some good reasons to keep Formal Aliases minimal. Personally I would prefer that there were some here, so I do not refrain from making the following suggestion: 0598 HEBREW ACCENT ZARQA % HEBREW ACCENT TSINORIT * the Tsinorit is also used alternatively to place Zarqa or Tsinor above, following a printing preference * character name is a misnomer 05AE HEBREW ACCENT ZINOR % HEBREW ACCENT TSINOR * this character is used to place Zarqa or Tsinor regularly above left These FormalAliases would have three advantages: (1) They are unique (U+05AE cannot be given the FormalAlias ZARQA because this is already taken). (2) They are homogeneous (both are named following the usage in the book of Psalms and the other poetic books). (3) There is a strong coherence between names (the one is a diminutive of the other) and practice (the one may be used alternatively instead of the other, following a printing preference). Best regards, Marcel Schneider
Date/Time: Mon Apr 20 11:13:36 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
00DF LATIN SMALL LETTER SHARP S 1E9E LATIN CAPITAL LETTER SHARP S While users are glad this letter has not been called a ligature (because it isn’t), there is some reason to complain that this letter has been called a sharp s, because it isn’t that either (even if it was already called so in early standards). The German sharp s really does exist as a phoneme, but it is represented by a double s, not an ß. Worse, in Germany the original version of the Standard is used, with no translation, so native users really do complain. With respect to font designers, Unicode has already developed the glyph comment well. Now it would be nice to add FormalAliases to these characters. Furthermore, the comment “* uppercase is "SS"” should be avoided in this form because it recalls an abbreviation that was used in Germany before and during WWII. There is also an ongoing change that leads to a preference for U+1E9E as uppercase. The lines, as I suggest them, would therefore end up as follows: 00DF LATIN SMALL LETTER SHARP S % LATIN SMALL LETTER SZ = Eszett * German * character name is an old misnomer * uppercase is two times U+0053, but tends to be 1E9E [...] 1E9E LATIN CAPITAL LETTER SHARP S % LATIN CAPITAL LETTER SZ * character name results from a misnomer * lowercase is 00DF And I would remove in both cases the crossrefs “x (latin capital letter sharp s - 1E9E)” and “x (latin small letter sharp s - 00DF)”, according to the rule that in the Code Charts, special casing information is provided with comments only. Best regards, Marcel Schneider
Date/Time: Mon Apr 20 11:15:57 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
00FF LATIN SMALL LETTER Y WITH DIAERESIS 0178 LATIN CAPITAL LETTER Y WITH DIAERESIS As already reported for U+00D0/U+00F0, U+0110/U+0111, and U+00DF/U+1E9E, the crossrefs for casing at U+00FF/U+0178 may consistently be replaced with comments: * uppercase is 0178 * lowercase is 00FF There seem to be no more instances where this suggestion would apply. Best regards, Marcel Schneider
Date/Time: Tue Apr 21 10:16:42 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 LATIN LETTER AE
The versioning of the alias names at U+00C6 and U+00E6 should correctly be "(1.1)". This results from comparing NamesList-8.0.0d6.txt with UnicodeData-8.0.0d8.txt. The NamesList shows: 00C6 LATIN CAPITAL LETTER AE = latin capital ligature ae (1.0) 00E6 LATIN SMALL LETTER AE = latin small ligature ae (1.0) Meanwhile, UnicodeData shows: 00C6;LATIN CAPITAL LETTER AE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER A E;;;00E6; 00E6;LATIN SMALL LETTER AE;Ll;0;L;;;;;N;LATIN SMALL LETTER A E;;00C6;;00C6 Field 11 in UnicodeData is the Unicode 1.0 name. For these instances, Unicode is reported to have been forced by ISO to abandon the 1.0 names and to put ISO names in their place, prior to issuing the 1.1 version of the Standard, that is, the merged Unicode + ISO 10646 Standard. Therefore, the ALIAS_LINEs should be corrected to the following: 00C6 LATIN CAPITAL LETTER AE = latin capital ligature ae (1.1) 00E6 LATIN SMALL LETTER AE = latin small ligature ae (1.1) This particularly puzzling versioning might, perhaps, be explained in a COMMENT_LINE: * this character has been renamed again in version 2.0 Whether such a comment should be added, or not, is another question. IMO it may, to show Unicode has been willing to update names but is prevented from doing so. But as such a comment is most likely to wake up old acrimony, it finally should rather not. By contrast, it surely wouldn't make much sense to add a second alias, like: 00C6 LATIN CAPITAL LETTER AE = latin capital ligature ae (1.1) = latin capital letter a e (1.0) 00E6 LATIN SMALL LETTER AE = latin small ligature ae (1.1) = latin small letter a e (1.0)
Date/Time: Wed Apr 22 11:24:56 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 NamesList syntax
NamesList syntax To complete my suggestion about adding a CODENAME_LINE, I suggest choosing the ampersand and replacing it by a lock. The complete syntax would then be as follows: CODENAME_LINE: TAB "&" SP NAME LF // Replace & by U+1F512, output line as code name The lock symbol is consistent with the fact that an identifier name, once published, must never be changed, and indicates clearly which of the two names is immutable. As a code character for the lock, I’d prefer the pound sign, but it isn’t a part of the US Standard layout, only of the US International keyboard layout and half of the Latin Windows locale keyboard layouts. U+00A3 POUND SIGN would be a reference to the locale that determined code names in the nineties. Similarly to the dollar sign as used in spreadsheet software, it would signify stability due to locking of properties (no matter how accurately they once were defined). Nevertheless it might be inappropriate to associate a currency symbol. By contrast, the ampersand is neutral and has a matching meaning. Another advantage is that it currently occurs only three times in the NamesList, of which the third is an HTML code: 0022 QUOTATION MARK * preferred characters in English for paired quotation marks are 201C & 201D 0027 APOSTROPHE * preferred characters in English for paired quotation marks are 2018 & 2019 29DC INCOMPLETE INFINITY = ISOtech entity &iinfin; There is no need for change, since the percent sign (marking up a FORMALALIAS_LINE) equally occurs in other contexts in the NamesList. About the principle of giving the true identifiers a place on the NAME_LINE, demoting the stable but false identifiers, it must be said that many implementations are so incomplete that there is scarcely any trace even of FormalAliases to be found. Among users, there is a preference for true character names. Users who prefer stable identifiers may refer to the next line in those cases. 
With this change, instead of a wrong Name and a true FormalAlias, there will be a true Name and a wrong CodeName. This will resolve all the problems brought up by incomplete UCD parsers that are currently misused as UIs, or that are designed as UIs but do not implement the complete range of UCD datafiles. Best regards, Marcel Schneider
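The proposed CODENAME_LINE syntax (TAB "&" SP NAME LF) can be sketched as a small parser. The following Python is purely illustrative: CODENAME_LINE is a proposal in this feedback and not part of the published NamesList format, the function name render_codename_line is hypothetical, and the character set accepted for NAME is an assumption.

```python
import re

# Proposed syntax: TAB "&" SP NAME LF
# Assumption: NAME uses only uppercase letters, digits, space and hyphen.
CODENAME_LINE = re.compile(r"^\t& (?P<name>[A-Z0-9 -]+)$")
LOCK = "\U0001F512"  # U+1F512 LOCK, the proposed display replacement for "&"

def render_codename_line(line: str):
    """Return the display form of a CODENAME_LINE, or None if the line is not one."""
    m = CODENAME_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return f"\t{LOCK} {m.group('name')}"

if __name__ == "__main__":
    print(render_codename_line("\t& SOLIDUS\n"))
    # An existing comment line containing "&" is not mistaken for a code name:
    print(render_codename_line("\t* preferred characters are 201C & 201D\n"))
```

Anchoring the ampersand directly after the leading TAB is what keeps the three existing occurrences of "&" in comment and alias lines from being misread.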
Date/Time: Wed Apr 22 11:26:44 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 Fraction slash
2044 FRACTION SLASH In addition to my previous feedback, I would suggest adding the hint about how to compose arbitrary fractions in plain text in another place as well. This could be the entry of the fraction slash U+2044 and, more precisely, the end of the existing COMMENT_LINE, after a comma: 2044 FRACTION SLASH = solidus (in typography) * for composing arbitrary fractions, in plain text with superscripts and subscripts. A demo file that opens in a word processor, typeset in the Arial Unicode MS typeface, is available at bit.ly/1DNPtf0 To view it in PDF, there is another file at bit.ly/1JutBGK Best regards, Marcel Schneider
Date/Time: Fri Apr 24 11:46:48 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 Corrigendum
There is at least one error in my post on Tue Apr 21 10:16:42 CDT 2015. The UniData “1.0 Name” field number is 10, not 11. Sorry. Best regards, Marcel Schneider
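The field number can be checked by splitting the UnicodeData line for U+00C6 quoted in the earlier post on semicolons. This is a minimal sketch in Python; the variable names are hypothetical, and the field numbering is zero-based, starting from the code point.

```python
# Line for U+00C6 as quoted from UnicodeData-8.0.0d8.txt in the earlier post.
LINE = "00C6;LATIN CAPITAL LETTER AE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER A E;;;00E6;"

fields = LINE.split(";")

code_point = fields[0]        # field 0: code point
name = fields[1]              # field 1: character name
unicode_1_name = fields[10]   # field 10: Unicode 1.0 name
simple_lowercase = fields[13] # field 13: simple lowercase mapping

if __name__ == "__main__":
    print(len(fields), "fields")
    print("field 10:", unicode_1_name)
```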
Date/Time: Fri Apr 24 11:50:51 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 The Standard
To avoid globally wasting time and resources, the Unicode Standard must reach the maximum of usefulness. As a Universal Character Set, it is designed for use by end-users as well, not merely standardizers, implementers and developers. Or at least, it _should_ be, as I’m pointing out. This is why a small number of specialists who need name *stability* are faced with a huge number of users who prioritize name *accuracy*. But the latter don’t have the means to rewrite or even adapt the Standard. As experience tends to show, this is true for most developers, too, who rely on the Standard and don’t aim at correcting it. No matter what Unicode names are defined as, they are taken as designations, as if they were scientific names. Scientists correct a name as soon as it proves to be inaccurate. And scientists manage several names per item very well while underscoring the most accurate one. This very useful system may give a paradigm of the way Unicode could deal with character names. ISO and Unicode having made a joint decision not to do so, it may be permitted to ask whether that decision was right or wrong. Take The Unicode Standard. In the text it uses current names, which are mostly identical with the Unicode 1.0 names, while the identifier of many characters differs from their current name. That may be called a useless and counterproductive complication, which may appear impolite to readers who are not native speakers. The issue, with a big part of standardization already done, is to transform a body-centered standard into a user-centered standard. A user-centered UCS is directly useful without needing a lot of precautions. By contrast, the current concept is based upon the externalization of the care for accessibility, which wastes time and money outside. Best regards, Marcel Schneider
Date/Time: Fri Apr 24 11:52:43 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 MICR U+2446 sq
Typo and Formal Aliases in MICR In the subheader notice line of the magnetic ink character recognition symbols, the first word of the second sentence is “The”, but as it is followed by a verb, it probably should be “They”. Apart from this little typo, I’d expect the first two characters to be given a Formal Alias too, as the other two have been. That is part of my concern about making the Standard more useful by raising interest in the Formal Aliases and convincing implementers and developers to write some additional parsing algorithms that would make all software “FormalAlias-aware”. With only eighteen Formal Aliases at present, developers don’t seem to be thinking seriously about the issue. Best regards, Marcel Schneider
Date/Time: Fri Apr 24 12:51:50 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 U+1F9C0, Translations
U+1F9C0 CHEESE WEDGE and Translations of the Code Charts Dear Unicode Consortium, I'm pleased to read the Feedback from Mr Lawson and would like to add my congratulations to his. The Cheese Wedge symbol he highlights reminds me that the new sets have already been translated into French, and ongoing efforts aim at providing the most accurate rendering of the Standard's whole current content in French. This shows that my concern is not that translations should be avoided. I'm just anxious about all the other languages, among which some widely spoken ones admittedly don't have any translation of Unicode, everybody referring exclusively to the English original files, as I've read. More precisely about the Cheese Wedge, I'm glad to see that bloodless, no-slaughter food is now strongly promoted and is given a fabulous opportunity of becoming a widespread cultural phenomenon. Best regards, Marcel Schneider
Date/Time: Mon Apr 27 01:06:24 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Corrigenda
There are at least two mistakes in my post on Mon Apr 20 03:09:02 CDT 2015. In the second paragraph, “U+00AB LEFT DOUBLE ANGLE QUOTATION MARK” and “U+00BB RIGHT DOUBLE ANGLE QUOTATION MARK” should read “U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK” and “U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK”. I’m sorry about all these mistakes in my posts. These ISO names support correctly all *latin* scripts, by contrast with the abusively lateralized ISO names for single and double quotation marks U+2018 & U+2019 and U+201C & U+201D. However, this has no impact on my concern related to names supporting *all* scripts, which can be properly ensured by giving such bidi-mirrored characters a Formal Alias where LEFT is replaced with OPENING or BACKWARD, and RIGHT with CLOSING or FORWARD, or following other patterns. (It may be noted that the names’ complication induced by these modifications is incomparably less cumbersome than the one that was due to the replacement of GUILLEMET with DOUBLE ANGLE QUOTATION MARK.) For a *Universal* Character Set (as well as for an *International* Standards Organization), this care for a practicable universality is a real need. Best regards, Marcel Schneider ___________________________ P.S.: There is a typo in my post on Fri Apr 17 12:16:45 CDT 2015. In paragraph 3, “fare” should read “far”. ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Date/Time: Mon Apr 27 01:07:36 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+0670 ARABIC LETTER SUPERSCRIPT ALEF
U+0670 ARABIC LETTER SUPERSCRIPT ALEF needs a formal alias and a consistent subhead

All sources I could look up confirm that U+0670 is a vowel sign (and a combining mark). Since its name is a misnomer, it needs a Formal Alias, which I guess would be approximately “ARABIC VOWEL SIGN SUPERSCRIPT ALEF”. This follows from the other vowel sign instances inside the block and elsewhere. This character is listed after a subhead “Point”. In the Syriac block U+0700, a “Syriac points” subhead is indeed found (U+0730), which is completed with the parenthesized “(vowels)” explanation. By contrast, in the Arabic block U+0600, most of the vowel signs are listed under “Other combining marks”, while the Arabic Extended-A block U+08A0 contains several “Extended vowel signs” subheads. Would it therefore be possible to rewrite the entry

@	Point
0670	ARABIC LETTER SUPERSCRIPT ALEF
	* actually a vowel sign, despite the name

in a way like this:

@	Point (vowel)
0670	ARABIC LETTER SUPERSCRIPT ALEF
	% ARABIC VOWEL SIGN SUPERSCRIPT ALEF
	* this diacritical mark is a vowel sign, character name is a misnomer

Alternatively the existing comment could be kept, but a formal alias seems to be unavoidable. Best regards, Marcel Schneider
Date/Time: Mon Apr 27 01:08:22 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+047C/U+047D CYRILLIC LETTER BEAUTIFUL OMEGA, UTN #27
In UTN #27, the two misnomers U+047C and U+047D are missing. They could be added in UTN #27, as well as U+0709, and given formal aliases too. The related NamesList entries might then change from:

047C	CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
	= Cyrillic "beautiful omega"
	* despite its name, this character does not have a titlo, nor is it composed of an omega plus a diacritic
	x (cyrillic capital letter broad omega - A64C)
047D	CYRILLIC SMALL LETTER OMEGA WITH TITLO

to:

047C	CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
	% CYRILLIC CAPITAL LETTER BEAUTIFUL OMEGA
047D	CYRILLIC SMALL LETTER OMEGA WITH TITLO
	% CYRILLIC SMALL LETTER BEAUTIFUL OMEGA
	* the Cyrillic beautiful omega does not have a titlo, nor is it composed of an omega plus a diacritic
	* character name is a misnomer
	x (cyrillic small letter broad omega - A64D)

Best regards, Marcel Schneider
Date/Time: Mon Apr 27 01:09:11 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Hiragana and Katakana ligatures (U+309F, U+30FF)
As Andrew West states in “Unicode Character Names Part 1”, U+309F HIRAGANA DIGRAPH YORI and U+30FF KATAKANA DIGRAPH KOTO are ligatures, not digraphs. Would it therefore be appropriate to correct the two entries with formal aliases and a matching subhead? Probably they would look like:

@	Hiragana ligature
309F	HIRAGANA DIGRAPH YORI
	% HIRAGANA LIGATURE YORI
	* historically used in vertical contexts, but now found also in horizontal layout
	# <vertical> 3088 308A
@	Katakana ligature
30FF	KATAKANA DIGRAPH KOTO
	% KATAKANA LIGATURE KOTO
	* historically used in vertical contexts, but now found also in horizontal layout
	# <vertical> 30B3 30C8

Best regards, Marcel Schneider
Date/Time: Mon Apr 27 01:10:17 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Spacing diacritics
Spacing diacritics in Latin-1 block: U+005E, U+0060, U+00B8; U+005F LOW LINE

The two spacing accents U+005E CIRCUMFLEX ACCENT and U+0060 GRAVE ACCENT as well as U+00B8 CEDILLA should be given an ALIAS_LINE or, even better, a FORMALALIAS_LINE, containing their Unicode 1.0 names SPACING CIRCUMFLEX, SPACING GRAVE and SPACING CEDILLA respectively. This follows from their usage as graphic characters (^, `) without any relation to their value as accents. This value is relevant only in their usage as dead-key characters. Precision and unambiguousness, a main issue in a standard of a UCS, turn out to be missing here. This is true as well for U+005F LOW LINE. “Low line” is inconsistent with “overline”, it lacks the precision (spacing or combining) that is needed in a standard, and is overall less used than “underline” and far less than “underscore”. Three sample instances could therefore look as follows. The first series is the current state, the second is with additional aliases:

005E	CIRCUMFLEX ACCENT
	* this is a spacing character
	[...]
005F	LOW LINE
	= spacing underscore (1.0)
	* this is a spacing character
	[...]
0060	GRAVE ACCENT
	* this is a spacing character
	[...]
__________________________
005E	CIRCUMFLEX ACCENT
	= spacing circumflex (1.0)
	* this is a spacing character
	[...]
005F	LOW LINE
	= spacing underscore (1.0)
	* this is a spacing character
	[...]
0060	GRAVE ACCENT
	= spacing grave (1.0)
	* this is a spacing character
	[...]

However, the goal might actually have been to avoid showing too clearly how accurate the Unicode 1.0 names were. This may lead to dropping the versioning, while stressing the importance of the aliases by raising them to FormalAlias status. This would allow the COMMENT_LINEs to be removed, because they become redundant, like this:

005E	CIRCUMFLEX ACCENT
	% SPACING CIRCUMFLEX
	[...]
005F	LOW LINE
	% SPACING UNDERSCORE
	[...]
0060	GRAVE ACCENT
	% SPACING GRAVE
	[...]

For a more advanced solution, please refer to my next post. Best regards, Marcel Schneider
Date/Time: Mon Apr 27 01:11:09 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+1D13A MUSICAL SYMBOL MULTI REST
The U+1D13A character name being a misleading misnomer, it can be allowed a formal alias taken from the aliases already provided, as for example “MUSICAL SYMBOL DOUBLE WHOLE-REST” (with or without hyphen?). Perhaps a “* character name is a misnomer” comment may be added too. Although, “two” are already considered as “several”, therefore “multi” is not entirely wrong. Best regards, Marcel Schneider
Date/Time: Mon Apr 27 01:11:42 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: The Greek letter LAMBDA
According to H. G. Liddell and R. Scott, A Greek English Lexicon, the spelling LAMBDA is incorrect, and the real spelling of the related Greek letter is LABDA. By contrast, there is no mention of a spelling LAMDA. Nevertheless, the ISO/IEC 10646 chief redactor, who was a compatriot of H. G. Liddell and R. Scott, forced the Unicode Consortium to abandon the current spelling LAMBDA it had adopted, and to replace all instances with the non-existent spelling LAMDA. I therefore suggest creating as many formal aliases as there are misspelled instances:

039B	GREEK CAPITAL LETTER LAMDA
	% GREEK CAPITAL LETTER LAMBDA
03BB	GREEK SMALL LETTER LAMDA
	% GREEK SMALL LETTER LAMBDA
1D27	GREEK LETTER SMALL CAPITAL LAMDA
	% GREEK LETTER SMALL CAPITAL LAMBDA
1038D	UGARITIC LETTER LAMDA
	% UGARITIC LETTER LAMBDA
1D6B2	MATHEMATICAL BOLD CAPITAL LAMDA
	% MATHEMATICAL BOLD CAPITAL LAMBDA
1D6CC	MATHEMATICAL BOLD SMALL LAMDA
	% MATHEMATICAL BOLD SMALL LAMBDA
1D6EC	MATHEMATICAL ITALIC CAPITAL LAMDA
	% MATHEMATICAL ITALIC CAPITAL LAMBDA
1D706	MATHEMATICAL ITALIC SMALL LAMDA
	% MATHEMATICAL ITALIC SMALL LAMBDA
1D726	MATHEMATICAL BOLD ITALIC CAPITAL LAMDA
	% MATHEMATICAL BOLD ITALIC CAPITAL LAMBDA
1D740	MATHEMATICAL BOLD ITALIC SMALL LAMDA
	% MATHEMATICAL BOLD ITALIC SMALL LAMBDA
1D760	MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMBDA
1D77A	MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD SMALL LAMBDA
1D79A	MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMBDA
1D7B4	MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMBDA

For personal use this is easily done in a spreadsheet. Nevertheless it is inefficient if millions of developers and users must each run the process for themselves. Therefore, if Unicode could do the job once, that would be more economical. Regarding the ISO, it has no right to purposely lower the cultural content of the Standard.
Best regards, Marcel Schneider
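The alias list in the LAMDA post above is mechanical enough to be generated rather than typed. The following is a minimal sketch of that spreadsheet-style step, assuming the input is a list of (code point, name) pairs and that the formal alias line uses the TAB "%" SP NAME layout; the function name `lambda_alias_lines` is hypothetical.

```python
def lambda_alias_lines(entries):
    """Build name lines plus formal-alias lines for names containing LAMDA.

    `entries` is an iterable of (code_point_hex, character_name) pairs.
    Names without the word LAMDA are skipped.
    """
    lines = []
    for code, name in entries:
        words = name.split()
        if "LAMDA" not in words:
            continue
        # Replace only the whole word, never a substring of another word.
        alias = " ".join("LAMBDA" if w == "LAMDA" else w for w in words)
        lines.append(f"{code}\t{name}")
        lines.append(f"\t% {alias}")
    return lines


if __name__ == "__main__":
    sample = [
        ("039B", "GREEK CAPITAL LETTER LAMDA"),
        ("0391", "GREEK CAPITAL LETTER ALPHA"),
        ("1038D", "UGARITIC LETTER LAMDA"),
    ]
    print("\n".join(lambda_alias_lines(sample)))
```

Matching on whole words keeps the replacement from touching any name in which the letters merely occur inside a longer word.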
Date/Time: Mon Apr 27 01:13:55 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 NamesList syntax, FormalAliases, UnicodeData
A Solution Combining Usefulness And Respectfulness Towards Stability Policy Dear Unicode Technical Committee, as we are taught by the principle which the ancient Romans handed down to us: Pacta tenenda sunt. This is why the current Formal Aliases must never become Character Names [as opposed to my opinion posted on Mon Apr 20 03:09:02 CDT 2015, and on Wed Apr 22 11:24:56 CDT 2015], regardless of how severely users are misled, and however crestfallen implementers and developers might be when faced with a Standard that makes them waste time and money by asking to be translated to English. This is why the draft of a solution that Unicode has implemented since version 5.0 may be developed further. A Formal Alias is a kind of “second chance” for misnamed characters. Given that today’s computing resources make it possible to manage two names per item, there is no longer any need to keep Formal Aliases minimal. Furthermore, the NamesList syntax is not subject to the Stability Policy, so Formal Aliases (including the "%" / U+203B "※" symbol) can be raised to the first line. The order “NAME_LINE, FORMALALIAS_LINE” is merely conventional and is not enforced programmatically. The challenge is to modify the syntax of the FORMALALIAS_LINE (adding CHAR), and to define an alternative syntax for the NAME_LINE (without CHAR). The related NamesList syntax would then look as follows (I’m quoting the NamesList.html page, changes being highlighted with double asterisks): CHAR_ENTRY**_N**: NAME_LINE | RESERVED_LINE | CHAR_ENTRY ALIAS_LINE **** | CHAR_ENTRY COMMENT_LINE [...] CHAR_ENTRY**_FA**: **FORMALALIAS_LINE** **NAME_LINE** | CHAR_ENTRY ALIAS_LINE **** | CHAR_ENTRY COMMENT_LINE [...] The explanations which follow this syntax notation would then probably be completed with a sentence like: **A FORMALALIAS_LINE must be directly followed by a NAME_LINE.** Then in the paragraph “Directly following either a NAME_LINE or a RESERVED_LINE”, the “FORMALALIAS_LINE” may be removed. 
Fortunately, already today the NAME_LINE may occur elsewhere than in the first place, as it is stated: “The conventional order of elements in a char entry: NAME_LINE, FORMALALIAS_LINE, [...] is not enforced by the layout program”. The software issue would be that Unibook processes also the new forms of NAME_LINE and FORMALALIAS_LINE. The related NamesList File Elements might then show this way (the first is a quotation): NAME_LINE: CHAR TAB NAME LF // The CHAR and the corresponding image are echoed, // followed by the name as given in NAME **| TAB NAME LF // If the character has a formal alias** FORMALALIAS_LINE: **CHAR** TAB "%" SP NAME LF **// The CHAR and the corresponding image are echoed,** // followed by U+203B replacing %, // then output NAME as formal alias When these syntax changes are defined, this would be the first char entries having a formal alias today: 01A2 % LATIN CAPITAL LETTER GHA LATIN CAPITAL LETTER OI 01A3 % LATIN SMALL LETTER GHA LATIN SMALL LETTER OI * Pan-Turkic Latin alphabets This proves that the name and the formal alias remain unchanged across these enhancements. Thus, stability stays guaranteed. The great advantage of this array is, that software that does not care about formal aliases and nevertheless seems to be designed to inform end-users, will automatically show the true name if it parses the NamesList. The problem is with UnicodeData parsers, because they parse a file where no formal aliases are shown (which were added when the file format was already defined). Therefore IMO it would be useful to add some fields in UnicodeData, one of which will contain the Formal Aliases (see also my post on Mon Apr 13 10:05:31 CDT 2015). In fact there are scarcely any backwards-compatibility problems with new fields, because out-of-date software is expected to simply ignore them. The missing visibility of Formal Aliases seems to be an effect of their not being in UnicodeData. 
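To illustrate the claim that the name and the formal alias stay the same under both layouts, here is a minimal sketch of a reader that accepts either order. It assumes tab-separated lines as in the quoted file elements, and it handles only name lines and formal-alias lines; the function `parse_entries` and its dictionary layout are hypothetical, not part of Unibook or any existing parser.

```python
import re

# A line starting with 4 to 6 uppercase hex digits opens a char entry.
CHAR_RE = re.compile(r"^[0-9A-F]{4,6}$")

# Prefixes of other indented lines (alias, comment, cross-reference,
# decomposition, compatibility mapping); these are never taken as names.
OTHER_PREFIXES = ("= ", "* ", "x ", ": ", "# ")


def parse_entries(text):
    """Map code point -> {'name': ..., 'formal_alias': ...} for either layout."""
    entries = {}
    current = None
    for line in text.splitlines():
        head, sep, rest = line.partition("\t")
        if not sep:
            continue  # not a tab-structured line
        if CHAR_RE.match(head):
            current = entries.setdefault(head, {"name": None, "formal_alias": None})
        elif head != "" or current is None:
            continue  # headers, subheads, or stray lines
        if rest.startswith("% "):
            current["formal_alias"] = rest[2:]
        elif current["name"] is None and not rest.startswith(OTHER_PREFIXES):
            current["name"] = rest
    return entries


CURRENT_LAYOUT = (
    "01A2\tLATIN CAPITAL LETTER OI\n"
    "\t% LATIN CAPITAL LETTER GHA\n"
    "\t* Pan-Turkic Latin alphabets\n"
)
PROPOSED_LAYOUT = (
    "01A2\t% LATIN CAPITAL LETTER GHA\n"
    "\tLATIN CAPITAL LETTER OI\n"
    "\t* Pan-Turkic Latin alphabets\n"
)

if __name__ == "__main__":
    print(parse_entries(CURRENT_LAYOUT) == parse_entries(PROPOSED_LAYOUT))
```

A reader written this way does not depend on line order, which is the property the proposal relies on; a reader that simply takes the first line after the code point as the name would behave differently.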
Clearly, parsing supplemental data files to gather a complete overview of character information may be considered inefficient. Accordingly, many developers would expect Unicode to add as many fields to UnicodeData as are needed to put within reach all data that are to be processed. Software that fails when additional fields appear in UnicodeData is likely to need a thorough overhaul in any case. Besides, it is simpler to implement additional fields than additional files. For convenience, the list below is meant to illustrate how the NamesList and the Code Charts may look after applying the above and some previous suggestions. For brevity, only FORMALALIAS_LINE and NAME_LINE are displayed. Some ad-hoc annotation is added. (Unfortunately, this list is unfinished.) Best regards, Marcel Schneider __________________________ 0028 % OPENING PARENTHESIS LEFT PARENTHESIS 0029 % CLOSING PARENTHESIS RIGHT PARENTHESIS [posted on Fri Apr 17, 2015] 002E % PERIOD FULL STOP [“period” seems to be more universal, “full stop” being an alternative name. For example, The Advanced Learner’s Dictionary of Current English from the Oxford University Press shows as the sixth and last meaning of “period”: “the pause at the end of a sentence; the mark, also called _a full stop_ (.), expressing this. *put a period to*, put an end to.”] 002F % SLASH SOLIDUS [“solidus” is suspected to be an intentional, misleading misnomer] 0040 % AT SIGN COMMERCIAL AT [The precision “commercial” is useless because there is no other at sign (as opposed to the commercial minus sign U+2052), “sign” is missing, and this ISO name is inconsistent (the percent sign is not called “commercial percent [sign]” either). As a general rule meant for ISO, UCS names must not follow personal preferences.] 
005B % OPENING SQUARE BRACKET LEFT SQUARE BRACKET 005C % BACKSLASH REVERSE SOLIDUS 005D % CLOSING SQUARE BRACKET RIGHT SQUARE BRACKET 005E % SPACING CIRCUMFLEX CIRCUMFLEX ACCENT 005F % SPACING UNDERSCORE LOW LINE 0060 % SPACING GRAVE GRAVE ACCENT [please see today’s post] 007B % OPENING CURLY BRACKET LEFT CURLY BRACKET 007D % CLOSING CURLY BRACKET RIGHT CURLY BRACKET 00A1 % TURNED EXCLAMATION MARK INVERTED EXCLAMATION MARK 00AB % BACKWARD-POINTING DOUBLE ANGLE QUOTATION MARK LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 00B4 % SPACING ACUTE ACUTE ACCENT 00B8 % SPACING CEDILLA CEDILLA 00BB % FORWARD-POINTING DOUBLE ANGLE QUOTATION MARK RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 00BF % TURNED QUESTION MARK INVERTED QUESTION MARK 00DF % LATIN SMALL LETTER SZ LATIN SMALL LETTER SHARP S [posted on Mon Apr 20, 2015] 010C % LATIN CAPITAL LETTER C WITH HACEK LATIN CAPITAL LETTER C WITH CARON 010D % LATIN SMALL LETTER C WITH HACEK LATIN SMALL LETTER C WITH CARON [For these instances and the following, please see at 02C7.] 
010E % LATIN CAPITAL LETTER D WITH HACEK LATIN CAPITAL LETTER D WITH CARON 010F % LATIN SMALL LETTER D WITH HACEK LATIN SMALL LETTER D WITH CARON 011A % LATIN CAPITAL LETTER E WITH HACEK LATIN CAPITAL LETTER E WITH CARON 011B % LATIN SMALL LETTER E WITH HACEK LATIN SMALL LETTER E WITH CARON 0132 % LATIN CAPITAL LETTER IJ LATIN CAPITAL LIGATURE IJ 0133 % LATIN SMALL LETTER IJ LATIN SMALL LIGATURE IJ 013D % LATIN CAPITAL LETTER L WITH HACEK LATIN CAPITAL LETTER L WITH CARON 013E % LATIN SMALL LETTER L WITH HACEK LATIN SMALL LETTER L WITH CARON 0147 % LATIN CAPITAL LETTER N WITH HACEK LATIN CAPITAL LETTER N WITH CARON 0148 % LATIN SMALL LETTER N WITH HACEK LATIN SMALL LETTER N WITH CARON 0152 % LATIN CAPITAL LETTER OE LATIN CAPITAL LIGATURE OE 0153 % LATIN SMALL LETTER OE LATIN SMALL LIGATURE OE 0158 % LATIN CAPITAL LETTER R WITH HACEK LATIN CAPITAL LETTER R WITH CARON 0159 % LATIN SMALL LETTER R WITH HACEK LATIN SMALL LETTER R WITH CARON 0160 % LATIN CAPITAL LETTER S WITH HACEK LATIN CAPITAL LETTER S WITH CARON 0161 % LATIN SMALL LETTER S WITH HACEK LATIN SMALL LETTER S WITH CARON 0164 % LATIN CAPITAL LETTER T WITH HACEK LATIN CAPITAL LETTER T WITH CARON 0165 % LATIN SMALL LETTER T WITH HACEK LATIN SMALL LETTER T WITH CARON 017D % LATIN CAPITAL LETTER Z WITH HACEK LATIN CAPITAL LETTER Z WITH CARON 017E % LATIN SMALL LETTER Z WITH HACEK LATIN SMALL LETTER Z WITH CARON 0190 % LATIN CAPITAL LETTER EPSILON LATIN CAPITAL LETTER OPEN E [deduced from UTN #27] 01A2 % LATIN CAPITAL LETTER GHA LATIN CAPITAL LETTER OI 01A3 % LATIN SMALL LETTER GHA LATIN SMALL LETTER OI [These are the first already existing formal aliases.] 01BE % LATIN STACKED LIGATURE TS [???] 
LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE [UTN #27] 01C4 % LATIN CAPITAL LETTER DZ WITH HACEK LATIN CAPITAL LETTER DZ WITH CARON 01C5 % LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH HACEK LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON 01C6 % LATIN SMALL LETTER DZ WITH HACEK LATIN SMALL LETTER DZ WITH CARON __________________________ 01CD-01D4: If in Sinology, “caron” is preferred, the Pinyin diacritic-vowel combinations must not be given formal aliases. Otherwise they should be. ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ 01E6 % LATIN CAPITAL LETTER G WITH HACEK LATIN CAPITAL LETTER G WITH CARON 01E7 % LATIN SMALL LETTER G WITH HACEK LATIN SMALL LETTER G WITH CARON 01E8 % LATIN CAPITAL LETTER K WITH HACEK LATIN CAPITAL LETTER K WITH CARON 01E9 % LATIN SMALL LETTER K WITH HACEK LATIN SMALL LETTER K WITH CARON 01EE % LATIN CAPITAL LETTER EZH WITH HACEK LATIN CAPITAL LETTER EZH WITH CARON 01EF % LATIN SMALL LETTER EZH WITH HACEK LATIN SMALL LETTER EZH WITH CARON 01F0 % LATIN SMALL LETTER J WITH HACEK LATIN SMALL LETTER J WITH CARON 021E % LATIN CAPITAL LETTER H WITH HACEK LATIN CAPITAL LETTER H WITH CARON 021F % LATIN SMALL LETTER H WITH HACEK LATIN SMALL LETTER H WITH CARON 0238 % LATIN SMALL LIGATURE DB LATIN SMALL LETTER DB DIGRAPH 0239 % LATIN SMALL LIGATURE QP LATIN SMALL LETTER QP DIGRAPH [UTN #27] 025B % LATIN SMALL LETTER EPSILON LATIN SMALL LETTER OPEN E 025E % LATIN SMALL LETTER CLOSED REVERSED EPSILON LATIN SMALL LETTER CLOSED REVERSED OPEN E [UTN #27] 0285 % LATIN SMALL LETTER REVERSED R WITH FISHHOOK AND RETROFLEX HOOK LATIN SMALL LETTER SQUAT REVERSED ESH [UTN #27] 02C7 % MODIFIER LETTER HACEK CARON [UTN #27, but the háček *has* been called so by Unicode. “Caron” is even inconsistent here since all these characters are modifier letters and have their name beginning with. 
“Caron” is furthermore disrespectful towards native speakers of háček-using languages; *not* in the ‘United States Government Printing Office Style Manual’, because this may be for internal and national use, but in *ISO* standards which are international and must meet the involved member nations’ usages.] 030C % COMBINING HACEK COMBINING CARON 032C % COMBINING HACEK BELOW COMBINING CARON BELOW 034F COMBINING GRAPHEME JOINER: This character is listed among the “Known Anomalies”. However, instead of any (hard to find) alias, it could be given references to TUS, as “see §7.9 and §23.2”, to complete the existing COMMENT_LINEs (as already suggested generally in my feedback on Sat Apr 11, 2015). 039B % GREEK CAPITAL LETTER LAMBDA GREEK CAPITAL LETTER LAMDA 03BB % GREEK SMALL LETTER LAMBDA GREEK SMALL LETTER LAMDA [UTN #27; please see today’s post] 047C % CYRILLIC CAPITAL LETTER BEAUTIFUL OMEGA CYRILLIC CAPITAL LETTER OMEGA WITH TITLO 047D % CYRILLIC SMALL LETTER BEAUTIFUL OMEGA CYRILLIC SMALL LETTER OMEGA WITH TITLO [please refer to my post of today] 0598 % HEBREW ACCENT TSINORIT HEBREW ACCENT ZARQA 05AE % HEBREW ACCENT TSINOR HEBREW ACCENT ZINOR [UTN #27, and please see post on Mon Apr 20, 2015] 0670 % ARABIC VOWEL SIGN SUPERSCRIPT ALEF ARABIC LETTER SUPERSCRIPT ALEF [UTN #27; and another post today] 06C0 % ARABIC LIGATURE HEH WITH YEH ABOVE ARABIC LETTER HEH WITH YEH ABOVE 06C2 % ARABIC LIGATURE HEH GOAL WITH HAMZA ABOVE ARABIC LETTER HEH GOAL WITH HAMZA ABOVE 06D3 % ARABIC LIGATURE YEH BARREE WITH HAMZA ABOVE ARABIC LETTER YEH BARREE WITH HAMZA ABOVE [UTN #27] 0709 % SYRIAC SUBLINEAR COLON SKEWED LEFT SYRIAC SUBLINEAR COLON SKEWED RIGHT [This is an existing formal alias. It should be added in UTN #27.] 
0A01 % GURMUKHI SIGN ADDAK BINDI GURMUKHI SIGN ADAK BINDI [UTN #27 shows character name is a misspelling] 0B83 % TAMIL SIGN AYTHAM TAMIL SIGN VISARGA [UTN #27: character name is a misnomer] 0CDE % KANNADA LETTER LLLA KANNADA LETTER FA 0E9D % LAO LETTER FO FON LAO LETTER FO TAM 0E9F % LAO LETTER FO FAY LAO LETTER FO SUNG 0EA3 % LAO LETTER RO LAO LETTER LO LING 0EA5 % LAO LETTER LO LAO LETTER LO LOOT [UTN #27] 0F0A % TIBETAN MARK ZOU YIK GUI GO TIBETAN MARK BKA- SHOG YIG MGO [UTN #27 and the French translation “ListeDesNoms-7.0(2014-06-22).txt” (courtesy http://hapax.qc.ca), which gives the alias “z'ou yik gui go”; the apostrophe has been deleted to conform to the English NamesList syntax rules] 0F0B % TIBETAN MARK TSHEG TIBETAN MARK INTERSYLLABIC TSHEG 0F0C TIBETAN MARK NO-BREAK TSHEG [???] TIBETAN MARK DELIMITER TSHEG BSTAR 0FD0 % TIBETAN MARK BKA- SHOG GI MGO RGYAN TIBETAN MARK BSKA- SHOG GI MGO RGYAN [UTN #27] 156F % CANADIAN SYLLABICS ASTERISK CANADIAN SYLLABICS TTH [UTN #27] 178E % KHMER LETTER NNA KHMER LETTER NNO 179E % KHMER LETTER SSA KHMER LETTER SSO [UTN #27] 1D27 % GREEK LETTER SMALL CAPITAL LAMBDA GREEK LETTER SMALL CAPITAL LAMDA 1E9E % LATIN CAPITAL LETTER SZ LATIN CAPITAL LETTER SHARP S [posted on Mon Apr 20, 2015] 2018 % SINGLE TURNED COMMA QUOTATION MARK LEFT SINGLE QUOTATION MARK 2019 % SINGLE COMMA QUOTATION MARK RIGHT SINGLE QUOTATION MARK 201A % LOW SINGLE COMMA QUOTATION MARK SINGLE LOW-9 QUOTATION MARK 201B % SINGLE REVERSED COMMA QUOTATION MARK SINGLE HIGH-REVERSED-9 QUOTATION MARK 201C % DOUBLE TURNED COMMA QUOTATION MARK LEFT DOUBLE QUOTATION MARK 201D % DOUBLE COMMA QUOTATION MARK RIGHT DOUBLE QUOTATION MARK 201E % LOW DOUBLE COMMA QUOTATION MARK DOUBLE LOW-9 QUOTATION MARK 201F % DOUBLE REVERSED COMMA QUOTATION MARK DOUBLE HIGH-REVERSED-9 QUOTATION MARK [The awkward and ethnocentric ISO names should be hidden. At least, the original Unicode names would better be raised at front.] 
2039 % SINGLE BACKWARD-POINTING ANGLE QUOTATION MARK SINGLE LEFT-POINTING ANGLE QUOTATION MARK 203A % SINGLE FORWARD-POINTING ANGLE QUOTATION MARK SINGLE RIGHT-POINTING ANGLE QUOTATION MARK 203E % SPACING OVERSCORE OVERLINE 2045 % OPENING SQUARE BRACKET WITH QUILL LEFT SQUARE BRACKET WITH QUILL 2046 % CLOSING SQUARE BRACKET WITH QUILL RIGHT SQUARE BRACKET WITH QUILL 207D % SUPERSCRIPT OPENING PARENTHESIS SUPERSCRIPT LEFT PARENTHESIS 207E % SUPERSCRIPT CLOSING PARENTHESIS SUPERSCRIPT RIGHT PARENTHESIS 208D % SUBSCRIPT OPENING PARENTHESIS SUBSCRIPT LEFT PARENTHESIS 208E SUBSCRIPT CLOSING PARENTHESIS SUBSCRIPT RIGHT PARENTHESIS [...] 20E5 % COMBINING BACKSLASH OVERLAY COMBINING REVERSE SOLIDUS OVERLAY 2118 % WEIERSTRASS ELLIPTIC FUNCTION SCRIPT CAPITAL P 2446 % MICR TRANSIT SYMBOL OCR BRANCH BANK IDENTIFICATION 2447 % MICR AMOUNT SYMBOL OCR AMOUNT OF CHECK [posted on Fri Apr 24, 2015] 2448 % MICR ON US SYMBOL OCR DASH 2449 % MICR DASH SYMBOL OCR CUSTOMER ACCOUNT NUMBER 3021 % SUZHOU NUMERAL ONE HANGZHOU NUMERAL ONE 3022 % SUZHOU NUMERAL TWO HANGZHOU NUMERAL TWO 3023 % SUZHOU NUMERAL THREE HANGZHOU NUMERAL THREE 3024 % SUZHOU NUMERAL FOUR HANGZHOU NUMERAL FOUR 3025 % SUZHOU NUMERAL FIVE HANGZHOU NUMERAL FIVE 3026 % SUZHOU NUMERAL SIX HANGZHOU NUMERAL SIX 3027 % SUZHOU NUMERAL SEVEN HANGZHOU NUMERAL SEVEN 3028 % SUZHOU NUMERAL EIGHT HANGZHOU NUMERAL EIGHT 3029 % SUZHOU NUMERAL NINE HANGZHOU NUMERAL NINE 309F % HIRAGANA LIGATURE YORI HIRAGANA DIGRAPH YORI 30FF % KATAKANA LIGATURE KOTO KATAKANA DIGRAPH KOTO [courtesy Andrew West; please see post of today above] A015 % YI SYLLABLE ITERATION MARK YI SYLLABLE WU FE18 % PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET FE6B % SMALL AT SIGN SMALL COMMERCIAL AT FEFF % BYTE ORDER MARK ZERO WIDTH NO-BREAK SPACE FF20 % FULLWIDTH AT SIGN FULLWIDTH COMMERCIAL AT 1038D % UGARITIC LETTER LAMBDA UGARITIC LETTER LAMDA 122D4 % CUNEIFORM SIGN NU11 TENU CUNEIFORM 
SIGN SHIR TENU 122D5 % CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR 1D0C5 % BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS 1D13A % MUSICAL SYMBOL DOUBLE WHOLE-REST MUSICAL SYMBOL MULTI REST [please refer to my related post] 1D6B2 % MATHEMATICAL BOLD CAPITAL LAMBDA MATHEMATICAL BOLD CAPITAL LAMDA 1D6CC % MATHEMATICAL BOLD SMALL LAMBDA MATHEMATICAL BOLD SMALL LAMDA 1D6EC % MATHEMATICAL ITALIC CAPITAL LAMBDA MATHEMATICAL ITALIC CAPITAL LAMDA 1D706 % MATHEMATICAL ITALIC SMALL LAMBDA MATHEMATICAL ITALIC SMALL LAMDA 1D726 % MATHEMATICAL BOLD ITALIC CAPITAL LAMBDA MATHEMATICAL BOLD ITALIC CAPITAL LAMDA 1D740 % MATHEMATICAL BOLD ITALIC SMALL LAMBDA MATHEMATICAL BOLD ITALIC SMALL LAMDA 1D760 % MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMBDA MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA 1D77A % MATHEMATICAL SANS-SERIF BOLD SMALL LAMBDA MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA 1D79A % MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMBDA MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA 1D7B4 % MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMBDA MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMDA ______________________________________________________________
Date/Time: Mon Apr 27 09:30:46 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: 1F6D0 PLACE OF WORSHIP
This character is not bidi-mirroring, like all symbols in the block, but this should IMO make an exception because in right-to-left script, the worshipping person is at back. Perhaps right-to-left scripts must use special fonts where all relevant symbols are mirrored. This is inconsistent with bidi-mirroring of mathematical symbols. Best regards, Marcel Schneider
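The Bidi_Mirrored property discussed in the PLACE OF WORSHIP post can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the wrapper name `is_bidi_mirrored` is hypothetical, and the answer for U+1F6D0 depends on which Unicode version the local Python build ships, so no result is asserted for it here.

```python
import unicodedata


def is_bidi_mirrored(ch: str) -> bool:
    """True if the character has the Bidi_Mirrored property in the local data."""
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    # The data version determines whether U+1F6D0 is known at all.
    print("Unicode data version:", unicodedata.unidata_version)
    for ch in ("(", "A", "\U0001F6D0"):
        print(f"U+{ord(ch):04X}", is_bidi_mirrored(ch))
```

Characters unknown to the local data are reported as not mirrored, so a False result for a recent character is only meaningful once the data version has been checked.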
Date/Time: Mon Apr 27 09:32:25 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+19B0 NEW TAI LUE VOWEL SIGN VOWEL SHORTENER
The first of the New Tai Lue vowel signs that are in beta review, U+19B0, is a vowel shortener. Unlike U+0EB1 LAO VOWEL SIGN MAI KAN, which a COMMENT_LINE points out to be a vowel shortener, U+19B0 NEW TAI LUE VOWEL SIGN VOWEL SHORTENER seems to have no proper name in New Tai Lue. Therefore, I would suggest shortening its name to NEW TAI LUE VOWEL SHORTENER. (Apart from these two, there are no other instances in Unicode where a ‘vowel shortener’ occurs.)

Currently:
19B0	NEW TAI LUE VOWEL SIGN VOWEL SHORTENER
Suggested:
19B0	NEW TAI LUE VOWEL SHORTENER

Best regards, Marcel Schneider
Date/Time: Mon Apr 27 09:34:24 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+11350 GRANTHA OM
Unlike all the other Grantha characters, and despite its dedicated “Sign” subhead, the name of U+11350 GRANTHA OM does not contain a class indication. Therefore I suggest adding one, after which this part would look as follows:

[...]
1134C	GRANTHA VOWEL SIGN AU
	: 11347 11357
@	Virama
1134D	GRANTHA SIGN VIRAMA
@	Sign
11350	GRANTHA SIGN OM
@	Dependent vowel sign
11357	GRANTHA AU LENGTH MARK
@	Sign
1135D	GRANTHA SIGN PLUTA
[...]

Best regards, Marcel Schneider
Date/Time: Mon Apr 27 09:39:02 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+218B TURNED DIGIT THREE
Cross-reference not fully “accurate” Dear Unicode Technical Committee, would it be possible for the users of the Standard to be given xrefs showing the true character names, not merely internal identifiers? For example, at the new turned digit three U+218B, readers find “→ 0190 Ɛ latin capital letter open e”. But as I already posted, this so-called open e should be given a formal alias showing that its true name is latin epsilon. However, all these formal aliases, whether there are eighteen or one hundred and eighty of them, are of little use if these characters continue to be called by their wrong name everywhere else. To make it clear: Unicode would not have a single wrong name in the Standard if, in the early nineties, there had been no people who made trouble and wrote acrimonious words when the Unicode Consortium had removed the wrong name for Æ and æ to reset it to its (original) true value. You know the issue. I have gained new hope that the ISO is no longer the tyrannical standards body it apparently was in the nineties, and that it would no longer insist upon that names stability which never created anything but trouble and acrimony among users, typographers, and the whole workforce that is in contact with the documents published by Unicode and aims at using the characters. Today, the Unicode character set is indispensable, and therefore Unicode can improve its usefulness in use (as opposed to its usefulness in standardization), whether by reengineering the names stability, or by hollowing it out with numerous prioritized formal aliases and smart cross-references which give the true name, possibly preceded by a percent sign, or not. Given that the NamesList syntax is out of reach of the Stability Policy, there are many possibilities. People hardly understand that the Unicode Standard is maintained without sweeping out all the wrong names once and for all. Best regards, Marcel Schneider
Date/Time: Mon Apr 27 09:45:20 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Language conformance
The Charts must be written in British English, a fact recalled by the five (beta-review) Sutton SignWriting characters with CENTRE (U+1D862, U+1D863, U+1D864, U+1DA5F, U+1DA60), which bring the total number to 30. Consequently, perhaps U+1F22D SQUARED CJK UNIFIED IDEOGRAPH-4E2D should have its alias “center field” converted to “centre field”, and the comments at U+205B “this is centered on the line, but extends beyond top and bottom of the line” and U+A8FA “zero-advance character centered on the point between two orthographic syllables” ought to be corrected too? (The aliases for U+2385, U+1F17B, U+1F17C are IMO purposely with “center”.) As for personal feedback, I don’t know whether to complain or not. Practically, the American spelling has some advantages as it is more widespread, for example in style sheets, but also in everyday language, while “centre” may be regarded as French or as a historic spelling. At least one fact is certain: “CENTRE” was not Unicode’s choice. Best regards, Marcel Schneider
Date/Time: Tue Apr 28 07:28:15 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Some comments on my feedback for PRI #297
Dear Unicode Technical Committee, I’m sorry to send you some more beta-related feedback *after* the deadline (the day after the deadline). For me it is just a matter of not using the deadline as a pretext for not doing my job, even if in the end the belated posts cannot be considered. About the chronological order: Unicode could have expected the beta review feedback to be prioritized, whereas I posted my general concerns first. This is because, as a user, I became convinced that the usefulness of the UCS must become centered on usage, while until now it has been, in some essential settings, centered on standardization. This is partly in the nature of a Standard. This part is now smaller because the UCS is well launched and there is no serious alternative any more. But partly it is a remnant of some personal external views from the beginnings. This part should not be taken into consideration because a UCS must meet a worldwide demand for true and reasonably widespread average identifiers (which in practice are used [not as descriptors but] as serious designations), and because it must be directly useful to end-users (who mostly read English), without relying only on the goodwill of overburdened implementers and developers when the challenge is to effectively correct misnomers and other useless and counterproductive complications and impolitenesses. Now that other Standards bodies have outsourced the UCS and all management is centralized at the Unicode Consortium, the UCD data files and the UCS Code Charts can be boosted to reach maximum reliability and ease of implementation. Best wishes, Marcel Schneider
Date/Time: Tue Apr 28 07:29:58 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Siddham section marks
Since the Siddham section marks U+115CA and U+115CB show four tridents, a plural S might be added to the word TRIDENT in their names (for example SIDDHAM SECTION MARK WITH TRIDENT*S* AND DOTTED CRESCENTS), because the other constituents such as RAYS and CIRCLES are plural too. Another question is whether to add DOTTED in the character names. The dot terminating a crescent stack seems not to be distinctive, and its presence in the names is inconsistent (compare U+115CB - U+115CE with DOTTED, vs U+115D1 - U+115D4 without DOTTED). If shorter names are preferred, DOTTED may be removed from U+115CB - U+115CE; if fully descriptive names are aimed at, it can be added to U+115D1 - U+115D4, for example SIDDHAM SECTION MARK WITH *DOTTED* SEPTUPLE CRESCENTS. In fact, as a dotted crescent looks like a Candrabindu, (DOTTED) CRESCENT(S) in these character names might become CANDRABINDU(S), if a huge majority of users would love this naming. The “U-shaped ornament” for its part, with the centerline inside, resembles a trident without a stick, a kind of echoed trident or just the trident’s prongs. This is striking because the tridents of U+115CA SIDDHAM SECTION MARK WITH TRIDENT AND U-SHAPED ORNAMENTS have U-shaped prongs, while those of U+115CB SIDDHAM SECTION MARK WITH TRIDENT AND DOTTED CRESCENTS have crescent-shaped ones. As a result, these considerations might end up in some name change proposals (because the discussed characters are still in draft) as shown in the list below (not showing five section marks whose names I won’t suggest modifying), where the name after the code point is the original one, followed by one or several alternate names designed to meet different possible preferences.
Best regards, Marcel Schneider
_____________________________________________________________________
115CA SIDDHAM SECTION MARK WITH TRIDENT AND U-SHAPED ORNAMENTS
      SIDDHAM SECTION MARK WITH TRIDENTS AND U-SHAPED ORNAMENTS
      SIDDHAM SECTION MARK WITH TRIDENTS AND TRIDENT PRONGS
      SIDDHAM SECTION MARK WITH PRONGED TRIDENTS
      SIDDHAM SECTION MARK WITH PRONGED AND ECHOED TRIDENTS
      SIDDHAM SECTION MARK WITH PRONGED RAYS AND PRONGS
115CB SIDDHAM SECTION MARK WITH TRIDENT AND DOTTED CRESCENTS
      SIDDHAM SECTION MARK WITH TRIDENTS AND DOTTED CRESCENTS
      SIDDHAM SECTION MARK WITH TRIDENTS AND CANDRABINDUS
      SIDDHAM SECTION MARK WITH CRESCENTED TRIDENTS
      SIDDHAM SECTION MARK WITH CRESCENTED AND ECHOED TRIDENTS
      SIDDHAM SECTION MARK WITH CRESCENTED RAYS AND DOTTED CRESCENTS
115CC SIDDHAM SECTION MARK WITH RAYS AND DOTTED CRESCENTS
      SIDDHAM SECTION MARK WITH RAYS AND CANDRABINDUS
      SIDDHAM SECTION MARK WITH RAYS AND CRESCENTS
115CD SIDDHAM SECTION MARK WITH RAYS AND DOTTED DOUBLE CRESCENTS
      SIDDHAM SECTION MARK WITH RAYS AND DOUBLE CANDRABINDUS
      SIDDHAM SECTION MARK WITH RAYS AND DOUBLE CRESCENTS
115CE SIDDHAM SECTION MARK WITH RAYS AND DOTTED TRIPLE CRESCENTS
      SIDDHAM SECTION MARK WITH RAYS AND TRIPLE CANDRABINDUS
      SIDDHAM SECTION MARK WITH RAYS AND TRIPLE CRESCENTS
[...]
115D1 SIDDHAM SECTION MARK WITH DOUBLE CRESCENTS
      SIDDHAM SECTION MARK WITH DOUBLE CANDRABINDUS
      SIDDHAM SECTION MARK WITH DOTTED DOUBLE CRESCENTS
115D2 SIDDHAM SECTION MARK WITH TRIPLE CRESCENTS
      SIDDHAM SECTION MARK WITH TRIPLE CANDRABINDUS
      SIDDHAM SECTION MARK WITH DOTTED TRIPLE CRESCENTS
115D3 SIDDHAM SECTION MARK WITH QUADRUPLE CRESCENTS
      SIDDHAM SECTION MARK WITH QUADRUPLE CANDRABINDUS
      SIDDHAM SECTION MARK WITH DOTTED QUADRUPLE CRESCENTS
115D4 SIDDHAM SECTION MARK WITH SEPTUPLE CRESCENTS
      SIDDHAM SECTION MARK WITH SEPTUPLE CANDRABINDUS
      SIDDHAM SECTION MARK WITH DOTTED SEPTUPLE CRESCENTS
[...]
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Date/Time: Tue Apr 28 07:31:00 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Code Charts: reserved code points
I’d remove the default glyph icon for <reserved> code points in the Code Chart lists because it may disturb the layout, overlapping some characters like U+115B5 SIDDHAM VOWEL SIGN VOCALIC RR. The hatched rounded square is nice but not indispensable for understanding. The syntax specifies “an icon for the reserved character” must be displayed, but if this ends up hiding some parts of glyphs, the place might be left void. As a detail, I would suggest choosing an upwards hatch rather than a downwards one in the Code Charts, even though downwards (upper left - lower right) is heraldic usage. Best regards, Marcel Schneider
Date/Time: Tue Apr 28 07:31:56 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Latin Extended-D and Latin Extended-E
In the COMMENT_LINE of U+A78F LATIN LETTER SINOLOGICAL DOT, the preposition ‘for’ occurs three times with two values. It may therefore be replaced with another preposition, such as ‘in’ or ‘of’. Taking inspiration from the COMMENT_LINE of the preceding character (“used to transcribe Toda”), it would also be possible to turn the nouns into verbs, but the adjective would then become an adverb. Among the resulting options:
1 * used _in_ transliteration for Phags-Pa and phonetic transcription for Tangut
2 * used for transliteration _of_ Phags-Pa and phonetic transcription _of_ Tangut
3 * used _in_ transliteration _of_ Phags-Pa and phonetic transcription _of_ Tangut
4 * used _to transliterate_ Phags-Pa and _in_ phonetical transcription for Tangut
5 * used _to transliterate_ Phags-Pa and _for_ phonetical transcription _of_ Tangut
6 * used _to transliterate_ Phags-Pa and _in_ phonetical transcription _of_ Tangut
7 * used _to transliterate_ Phags-Pa and _phonetically transcribe_ Tangut
However, in the NamesList, another instance with matching context shows that the preferred form is with ‘in’ - ‘of’ (option 3 above):
0255 LATIN SMALL LETTER C WITH CURL
[...]
* used in transcription of Mandarin Chinese
In the COMMENT_LINE of U+A7B3 LATIN CAPITAL LETTER CHI, the space between “lower” and “case” should be deleted to conform to the usage in the NamesList / Code Charts. In the NOTICE_LINE of the Historic letters for Sakha (Yakut) subhead (U+AB60), the final “that era” could IMO be replaced with “Sakha (Yakut)”, because “the [...] orthography of that era”, while literally correct, is redundant with the immediately preceding “from 1917 to 1927”; the slot could therefore be used to repeat the language name rather than the fact that these letters are out of date, on condition that the result would be correct:
@+ These letters were used from 1917 to 1927 in the official IPA-based Latin orthography of _Sakha (Yakut)_.
On this occasion I wish to congratulate Unicode for having kept naming letters such as U+AB62 LATIN SMALL LETTER OPEN OE and U+AB63 LATIN SMALL LETTER UO “letters” and not “ligatures”, as an early prescriptor would have done (U+0153 LATIN SMALL LIGATURE OE; Unicode 1.0 name: LATIN SMALL LETTER O E). Best regards, Marcel Schneider
Date/Time: Tue Apr 28 07:32:39 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: A cross-reference for new Combining half marks
As was done for 12 other combining half marks out of a former total of 14, that is, for all where possible, an xref should be added after U+FE2F COMBINING CYRILLIC TITLO RIGHT HALF. The complete subheader will then look like this:
@ Combining half marks
@+ These are used for supralineation in Church Slavonic texts.
FE2E COMBINING CYRILLIC TITLO LEFT HALF
FE2F COMBINING CYRILLIC TITLO RIGHT HALF
x (combining cyrillic titlo - 0483)
Best regards, Marcel Schneider
Date/Time: Tue Apr 28 12:31:42 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Bidi-mirroring of symbols and pictographs
There are many symbols and pictographs which look better when mirrored in right-to-left scripts, because they show living creatures or vehicles from the side. In these cases the front generally points against the writing direction, so that the reader meets it from the front. In left-to-right scripts, they are “looking” from right to left. Their looking direction is related to semantics. If they express a movement, for example U+1F32C WIND BLOWING FACE, they are “looking” in the writing direction (for example, in left-to-right scripts, from left to right). All these symbols need to be bidi-mirrored just like other symbols and characters, as I began to underscore in a post on Mon Apr 27, 2015 (yesterday). For a complete rendering engine, it will be enough for a character to be flagged for bidi-mirroring in order to automatically mirror the glyph provided by the font, and such a rendering engine can therefore display every symbol mirrored, even if the font does not provide mirrored glyphs. This makes sense for all symbol glyphs that do not contain (actually Latin) text and whose semantics do not cover directionality (such as pointing hands or arrows). As an example, I’ve extracted a list (below) of some symbols from UnicodeData, belonging to the blocks U+1F300-1F5FF Miscellaneous Symbols and Pictographs, and U+1F680-1F6FF Transport and Map Symbols, and needing IMHO to be bidi-mirrored (in a bidirectional context).
Best regards, Marcel Schneider
______________________________________________________
CHAR   BIDIM  NAME
1F320  Y      SHOOTING STAR
1F324  Y      WHITE SUN WITH SMALL CLOUD
1F325  Y      WHITE SUN BEHIND CLOUD
1F326  Y      WHITE SUN BEHIND CLOUD WITH RAIN
1F327  Y      CLOUD WITH RAIN
1F328  Y      CLOUD WITH SNOW
1F329  Y      CLOUD WITH LIGHTNING
1F32A  Y      CLOUD WITH TORNADO
1F32C  Y      WIND BLOWING FACE
1F339  Y      ROSE
1F33A  Y      HIBISCUS
1F33E  Y      EAR OF RICE
1F3A0  Y      CAROUSEL HORSE
1F3B3  Y      BOWLING
1F3C0  Y      BASKETBALL AND HOOP
1F3C1  Y      CHEQUERED FLAG
1F3C2  Y      SNOWBOARDER
1F3C3  Y      RUNNER
1F3C4  Y      SURFER
1F3C7  Y      HORSE RACING
1F3CA  Y      SWIMMER
1F3CD  Y      RACING MOTORCYCLE
1F3CE  Y      RACING CAR
1F3DC  Y      DESERT
1F3DD  Y      DESERT ISLAND
1F3DE  Y      NATIONAL PARK
1F3F1  Y      WHITE PENNANT
1F3F2  Y      BLACK PENNANT
1F3F3  Y      WAVING WHITE FLAG
1F3F4  Y      WAVING BLACK FLAG
1F400  Y      RAT
1F401  Y      MOUSE
1F402  Y      OX
1F403  Y      WATER BUFFALO
1F404  Y      COW
1F405  Y      TIGER
1F406  Y      LEOPARD
1F407  Y      RABBIT
1F408  Y      CAT
1F409  Y      DRAGON
1F40A  Y      CROCODILE
1F40B  Y      WHALE
1F40C  Y      SNAIL
1F40D  Y      SNAKE
1F40E  Y      HORSE
1F40F  Y      RAM
1F410  Y      GOAT
1F411  Y      SHEEP
1F412  Y      MONKEY
1F413  Y      ROOSTER
1F414  Y      CHICKEN
1F415  Y      DOG
1F416  Y      PIG
1F417  Y      BOAR
1F418  Y      ELEPHANT
1F41A  Y      SPIRAL SHELL
1F41B  Y      BUG
1F41C  Y      ANT
1F41D  Y      HONEYBEE
1F41E  Y      LADY BEETLE
1F41F  Y      FISH
1F420  Y      TROPICAL FISH
1F421  Y      BLOWFISH
1F422  Y      TURTLE
1F424  Y      BABY CHICK
1F426  Y      BIRD
1F427  Y      PENGUIN
1F428  Y      KOALA
1F429  Y      POODLE
1F42A  Y      DROMEDARY CAMEL
1F42B  Y      BACTRIAN CAMEL
1F42C  Y      DOLPHIN
1F432  Y      DRAGON FACE
1F433  Y      SPOUTING WHALE
1F434  Y      HORSE FACE
1F43F  Y      CHIPMUNK
1F481  Y      INFORMATION DESK PERSON
1F483  Y      DANCER
1F4BA  Y      SEAT
1F4EA  Y      CLOSED MAILBOX WITH LOWERED FLAG
1F4EB  Y      CLOSED MAILBOX WITH RAISED FLAG
1F4EC  Y      OPEN MAILBOX WITH RAISED FLAG
1F4ED  Y      OPEN MAILBOX WITH LOWERED FLAG
1F4EE  Y      POSTBOX
1F4EF  Y      POSTAL HORN
1F4F0  Y      NEWSPAPER
1F4F2  Y      MOBILE PHONE WITH RIGHTWARDS ARROW AT LEFT
1F52C  Y      MICROSCOPE
1F52D  Y      TELESCOPE
1F54A  Y      DOVE OF PEACE
1F54F  Y      BOWL OF HYGIEIA
1F680  Y      ROCKET
1F681  Y      HELICOPTER
1F682  Y      STEAM LOCOMOTIVE
1F683  Y      RAILWAY CAR
1F684  Y      HIGH-SPEED TRAIN
1F685  Y      HIGH-SPEED TRAIN WITH BULLET NOSE
1F68C  Y      BUS
1F68E  Y      TROLLEYBUS
1F690  Y      MINIBUS
1F691  Y      AMBULANCE
1F692  Y      FIRE ENGINE
1F693  Y      POLICE CAR
1F695  Y      TAXI
1F697  Y      AUTOMOBILE
1F699  Y      RECREATIONAL VEHICLE
1F69A  Y      DELIVERY TRUCK
1F69B  Y      ARTICULATED LORRY
1F69C  Y      TRACTOR
1F69E  Y      MOUNTAIN RAILWAY
1F6A0  Y      MOUNTAIN CABLEWAY
1F6A1  Y      AERIAL TRAMWAY
1F6A3  Y      ROWBOAT
1F6A4  Y      SPEEDBOAT
1F6A9  Y      TRIANGULAR FLAG ON POST
1F6AE  Y      PUT LITTER IN ITS PLACE SYMBOL
1F6AF  Y      DO NOT LITTER SYMBOL
1F6B2  Y      BICYCLE
1F6B3  Y      NO BICYCLES
1F6B4  Y      BICYCLIST
1F6B5  Y      MOUNTAIN BICYCLIST
1F6B6  Y      PEDESTRIAN
1F6B7  Y      NO PEDESTRIANS
1F6B8  Y      CHILDREN CROSSING
1F6C2  Y      PASSPORT CONTROL
1F6C3  Y      CUSTOMS
1F6D0  Y      PLACE OF WORSHIP
1F6E5  Y      MOTOR BOAT
1F6E9  Y      SMALL AIRPLANE
1F6EA  Y      NORTHEAST-POINTING AIRPLANE
1F6EB  Y      AIRPLANE DEPARTURE
1F6EC  Y      AIRPLANE ARRIVING
1F6F0  Y      SATELLITE
1F6F2  Y      DIESEL LOCOMOTIVE
1F6F3  Y      PASSENGER SHIP
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
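The Bidi_Mirrored value that a given implementation currently holds for such characters can be inspected with a few lines of code. This is a minimal sketch, assuming Python's standard `unicodedata` module, whose data reflects whatever Unicode version ships with the interpreter; the helper name `is_bidi_mirrored` and the choice of sample code points are illustrative only.

```python
import unicodedata


def is_bidi_mirrored(code_point):
    """Return True if the bundled Unicode data flags the character as Bidi_Mirrored."""
    return unicodedata.mirrored(chr(code_point)) == 1


# One bracket (mirrored) and two pictographs taken from the list above.
for cp in (0x0028, 0x1F32C, 0x1F40E):
    name = unicodedata.name(chr(cp), "<not named in this Unicode version>")
    flag = "Y" if is_bidi_mirrored(cp) else "N"
    print(f"U+{cp:04X} {flag} {name}")
```

A rendering engine of the kind described above would consult the same property before deciding whether to flip a glyph in a right-to-left run.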
Date/Time: Thu Apr 30 06:44:57 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297
Dear Unicode Technical Committee, in addition to my beta feedback, and for completeness, I have some general concerns again. About harmonizing the spelling of “lowercase”: while no instances of “upper case” (with a space) are found, there are, besides 77 instances of “lowercase”, six of “lower case”, among which three of the form “lower case is CHAR”. One having already been reported, U+2C7E LATIN CAPITAL LETTER S WITH SWASH TAIL and U+2C7F LATIN CAPITAL LETTER Z WITH SWASH TAIL remain to be corrected. Perhaps the others (U+2121 TELEPHONE SIGN, U+213B FACSIMILE SIGN, and U+1F670 SCRIPT LIGATURE ET ORNAMENT) should be too, for consistency. A formal alias is IMHO missing for U+0F0A TIBETAN MARK BKA- SHOG YIG MGO, given that there is only the English translation of the right name, not the name itself in transcribed Tibetan, as it is given in the French translation “ListeDesNoms-7.0(2014-06-22).txt” (courtesy http://hapax.qc.ca), which shows as an alias “z'ou yik gui go”; if the apostrophe is deleted for conformance to the English NamesList syntax rules, this would end up as
% TIBETAN MARK ZOU YIK GUI GO
A mathematical symbol, U+29A1 SPHERICAL ANGLE OPENING UP, has the ‘bidi-mirrored’ property but is symmetrical (about a vertical axis). When mathematical symbols are symmetrical (as U+29D3 BLACK BOWTIE), they ordinarily are *not* bidi-mirrored. Therefore I suppose that this property should be set to “No” for U+29A1 too. A next step could be to update UTN #27, where several misnomers are still missing. Even if this Technical Note does not aim at giving a complete overview of *all* “Known Anomalies” (as results from interpreting words such as “provides information on many known anomalies” and “compiled information on many misnamed characters”), updating it would make it even more helpful.
The missing anomalies to mention are AFAIK: U+047C CYRILLIC CAPITAL LETTER OMEGA WITH TITLO, U+047D CYRILLIC SMALL LETTER OMEGA WITH TITLO: The comment in the Code Chart “despite its name, this character does not have a titlo, nor is it composed of an omega plus a diacritic” qualifies these two characters for mention in UTN #27. U+027F LATIN SMALL LETTER REVERSED R WITH FISHHOOK: There is just an alias followed by the parenthesized mention “(a misnomer)”. It therefore should be added to UTN #27 (and given a formal alias, see below). U+0709 SYRIAC SUBLINEAR COLON SKEWED RIGHT: This misnomer has an existing formal alias, but there is no comment in the Code Chart stating “* (character) name is a misnomer”, as occurs in a few character entries (with or without formal alias), nor is it present in UTN #27. The MICR characters (U+2446 sqq) have a mention at the end of the subheader notice: “The Unicode character names include several misnomers.” and should therefore be mentioned in the Technical Note (they were encoded 12 years ago). U+309F HIRAGANA DIGRAPH YORI, U+30FF KATAKANA DIGRAPH KOTO: Andrew West states on “Unicode Character Names Part 1” that “These characters are ligatures of "db" and "qp" respectively, and not digraphs.” U+122D4 CUNEIFORM SIGN SHIR TENU, U+122D5 CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR: The formal aliases assigned to these two characters didn’t exist at the time of UTN #27, nor were the characters yet encoded. That isn’t true for U+1D13A MUSICAL SYMBOL MULTI REST, but the minor nature of the issue probably argued against admitting it to UTN #27. There are several things to note about the fraction characters, beginning with U+00BC VULGAR FRACTION ONE QUARTER, U+00BD VULGAR FRACTION ONE HALF, U+00BE VULGAR FRACTION THREE QUARTERS. First, the comment at U+00BC “other fraction characters: 2153-215E” must have its starting point changed to “2150” (it simply wasn’t updated when U+2150, U+2151 and U+2152 were encoded).
Second, the recurrent comment “bar may be horizontal or slanted”, present in the Latin-1 block, should be echoed in some way for the other fraction characters. I suggest adding it as a first NOTICE_LINE in the “Fractions” subheader at U+2150:
@ Fractions
@+ Bar may be horizontal or slanted
This leads to my third concern regarding fractions. AFAIK the epithet “vulgar” applies to fractions with a slanted bar, as is the fraction slash (U+2044), whereas fractions with a horizontal bar are *not* vulgar ones. At least there is no reason to call them so, since this is current usage in mathematics. That “vulgar” epithet is, well, another mischief coming from that famous merger with the ISO/IEC 10646 draft. Since these character names are forwarded to end-users without being corrected, they must therefore be given formal aliases eliminating the wrong, misleading, useless and value-lowering precision:
00BC VULGAR FRACTION ONE QUARTER % FRACTION ONE QUARTER
00BD VULGAR FRACTION ONE HALF % FRACTION ONE HALF
00BE VULGAR FRACTION THREE QUARTERS % FRACTION THREE QUARTERS
2150 VULGAR FRACTION ONE SEVENTH % FRACTION ONE SEVENTH
2151 VULGAR FRACTION ONE NINTH % FRACTION ONE NINTH
2152 VULGAR FRACTION ONE TENTH % FRACTION ONE TENTH
2153 VULGAR FRACTION ONE THIRD % FRACTION ONE THIRD
2154 VULGAR FRACTION TWO THIRDS % FRACTION TWO THIRDS
2155 VULGAR FRACTION ONE FIFTH % FRACTION ONE FIFTH
2156 VULGAR FRACTION TWO FIFTHS % FRACTION TWO FIFTHS
2157 VULGAR FRACTION THREE FIFTHS % FRACTION THREE FIFTHS
2158 VULGAR FRACTION FOUR FIFTHS % FRACTION FOUR FIFTHS
2159 VULGAR FRACTION ONE SIXTH % FRACTION ONE SIXTH
215A VULGAR FRACTION FIVE SIXTHS % FRACTION FIVE SIXTHS
215B VULGAR FRACTION ONE EIGHTH % FRACTION ONE EIGHTH
215C VULGAR FRACTION THREE EIGHTHS % FRACTION THREE EIGHTHS
215D VULGAR FRACTION FIVE EIGHTHS % FRACTION FIVE EIGHTHS
215E VULGAR FRACTION SEVEN EIGHTHS % FRACTION SEVEN EIGHTHS
Lastly, since my proposed list of supplemental formal aliases was incomplete and had other drawbacks, may I attach another, more complete one below, which above all conforms to the actual syntax (NAME first). This and other changes were often made in an automated (spreadsheet) way. A leading principle was that whenever characters are bidi-mirroring, LEFT and RIGHT qualifiers *must* be avoided in their names, because they become wrong when bidi-mirroring is effective, that is, in right-to-left scripts. Using LEFT or RIGHT in those names even though they are wrong in some contexts shows a lack of respect towards some of the users. A UCS’s identifiers must not be unfitting for right-to-left scripts. It is to be underscored that Unicode aimed at making the character names universal, and it was under the influence of ISO that names grew wrong and worse. I have good reasons to believe that today, ISO would never approve the way things were done in the nineties. There are however some bracketing characters that do *not* mirror, such as U+FD3E ORNATE LEFT PARENTHESIS and U+FD3F ORNATE RIGHT PARENTHESIS, for legacy reasons. These characters may with some reason be called “left/right parenthesis”, and it is even helpful to do so. About making extensive use of Formal Aliases, it should be noted that this is a condition for making Formal Aliases more attractive. The other condition is to make them a part of UnicodeDataExtended.txt, a new datafile with additional fields (given that UnicodeData.txt must remain stable with respect to software that cannot run when supplemental fields are present). One issue that inhibits name updates is the outdating of UIs which use local copies, for example word processors and more precisely the charmap and special characters dialog. Editors fear puzzling users with name changes. These changes will stop puzzling under the effect of good communication.
The move towards more truth in character names is even likely to deliver an excellent marketing argument, and the overall image of the brand will be strengthened. Software providers must accept a cultural responsibility and avoid messing with linguistic legacy. If this job is done, the formal aliases I suggest are already present in the UIs, and adding them to Unicode will simply consecrate this work. This will result in updating Unicode and re-establishing a reasonable synching between presumably current character names and UCD-based information sources. By contrast, if this work isn’t already done on the developers’ side, the opportunity to do it has perhaps come. It can be performed thanks to a huge bulk of “new-old” Formal Aliases. Experience seems to prove that for a UCS, the usefulness *at standardization* must be distinguished from the usefulness *at use*. I mean that a Standard useful at standardization is not necessarily useful at use. Reliability is a main criterion for usefulness at use, and this reliability is propped up by the accuracy of character names, *not* by a stability delusion. Publishing a complicated Standard that waits to be translated, even into English, does not realize the potential of the process. Reality shows that the people who do the job Unicode left undone are too few and are likely to disappear, spending their time and energy on other work. Fundamentally, this change, however sweeping it might be, and regardless of the duration of former practice, is in conformance with Microsoft’s new Corporate Policy: “[...] our industry does not respect tradition – it only respects innovation. [...] I consider the job before us to be bolder and more ambitious than anything we have ever done. [...] Our customers and society expect us to maximize the value of technology while also preserving the values that are timeless.” Microsoft’s CEO Mr Satya Nadella wrote to All Employees on July 10, 2014 (http://bit.ly/1wRIBqD).
The job that might be on stage at Unicode now, would be to maximize the value of the Unicode documentation by making it directly useful to users, shortening the way from standardization to use by eliminating the step of translating to English and by making the Standard conform to the timeless cultural settings Unicode was respectful of in its 1.0 version. Best regards, Marcel Schneider
_________________________________________________________________
Suggested Formal Aliases, including the existing ones (numbered)
0028 LEFT PARENTHESIS % OPENING PARENTHESIS
0029 RIGHT PARENTHESIS % CLOSING PARENTHESIS
002E FULL STOP % PERIOD
002F SOLIDUS % SLASH
0040 COMMERCIAL AT % AT SIGN
005B LEFT SQUARE BRACKET % OPENING SQUARE BRACKET
005C REVERSE SOLIDUS % BACKSLASH
005D RIGHT SQUARE BRACKET % CLOSING SQUARE BRACKET
005E CIRCUMFLEX ACCENT % SPACING CIRCUMFLEX
005F LOW LINE % SPACING UNDERSCORE
0060 GRAVE ACCENT % SPACING GRAVE
007B LEFT CURLY BRACKET % OPENING CURLY BRACKET
007D RIGHT CURLY BRACKET % CLOSING CURLY BRACKET
00A1 INVERTED EXCLAMATION MARK % TURNED EXCLAMATION MARK
00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK % BACKWARDS-POINTING DOUBLE ANGLE QUOTATION MARK
00B4 ACUTE ACCENT % SPACING ACUTE
00B8 CEDILLA % SPACING CEDILLA
00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK % FORWARDS-POINTING DOUBLE ANGLE QUOTATION MARK
00BC VULGAR FRACTION ONE QUARTER % FRACTION ONE QUARTER
00BD VULGAR FRACTION ONE HALF % FRACTION ONE HALF
00BE VULGAR FRACTION THREE QUARTERS % FRACTION THREE QUARTERS
00BF INVERTED QUESTION MARK % TURNED QUESTION MARK
00DF LATIN SMALL LETTER SHARP S % LATIN SMALL LETTER SZ
010C LATIN CAPITAL LETTER C WITH CARON % LATIN CAPITAL LETTER C WITH HACEK
010D LATIN SMALL LETTER C WITH CARON % LATIN SMALL LETTER C WITH HACEK
010E LATIN CAPITAL LETTER D WITH CARON % LATIN CAPITAL LETTER D WITH HACEK
010F LATIN SMALL LETTER D WITH CARON % LATIN SMALL LETTER D WITH HACEK
011A LATIN CAPITAL LETTER E WITH CARON % LATIN CAPITAL LETTER E WITH HACEK
011B LATIN SMALL LETTER E WITH CARON % LATIN SMALL LETTER E WITH HACEK
0132 LATIN CAPITAL LIGATURE IJ % LATIN CAPITAL LETTER IJ
0133 LATIN SMALL LIGATURE IJ % LATIN SMALL LETTER IJ
013D LATIN CAPITAL LETTER L WITH CARON % LATIN CAPITAL LETTER L WITH HACEK
013E LATIN SMALL LETTER L WITH CARON % LATIN SMALL LETTER L WITH HACEK
0147 LATIN CAPITAL LETTER N WITH CARON % LATIN CAPITAL LETTER N WITH HACEK
0148 LATIN SMALL LETTER N WITH CARON % LATIN SMALL LETTER N WITH HACEK
0152 LATIN CAPITAL LIGATURE OE % LATIN CAPITAL LETTER OE
0153 LATIN SMALL LIGATURE OE % LATIN SMALL LETTER OE
0158 LATIN CAPITAL LETTER R WITH CARON % LATIN CAPITAL LETTER R WITH HACEK
0159 LATIN SMALL LETTER R WITH CARON % LATIN SMALL LETTER R WITH HACEK
0160 LATIN CAPITAL LETTER S WITH CARON % LATIN CAPITAL LETTER S WITH HACEK
0161 LATIN SMALL LETTER S WITH CARON % LATIN SMALL LETTER S WITH HACEK
0164 LATIN CAPITAL LETTER T WITH CARON % LATIN CAPITAL LETTER T WITH HACEK
0165 LATIN SMALL LETTER T WITH CARON % LATIN SMALL LETTER T WITH HACEK
017D LATIN CAPITAL LETTER Z WITH CARON % LATIN CAPITAL LETTER Z WITH HACEK
017E LATIN SMALL LETTER Z WITH CARON % LATIN SMALL LETTER Z WITH HACEK
0190 LATIN CAPITAL LETTER OPEN E % LATIN CAPITAL LETTER EPSILON
01A2 LATIN CAPITAL LETTER OI 1 % LATIN CAPITAL LETTER GHA
01A3 LATIN SMALL LETTER OI 2 % LATIN SMALL LETTER GHA
01BE LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE % LATIN STACKED LIGATURE TS [???]
01C4 LATIN CAPITAL LETTER DZ WITH CARON % LATIN CAPITAL LETTER DZ WITH HACEK
01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON % LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH HACEK
01C6 LATIN SMALL LETTER DZ WITH CARON % LATIN SMALL LETTER DZ WITH HACEK
01CE LATIN SMALL LETTER A WITH CARON % LATIN SMALL LETTER A WITH HACEK
01CF LATIN CAPITAL LETTER I WITH CARON % LATIN CAPITAL LETTER I WITH HACEK
01D0 LATIN SMALL LETTER I WITH CARON % LATIN SMALL LETTER I WITH HACEK
01D1 LATIN CAPITAL LETTER O WITH CARON % LATIN CAPITAL LETTER O WITH HACEK
01D2 LATIN SMALL LETTER O WITH CARON % LATIN SMALL LETTER O WITH HACEK
01D3 LATIN CAPITAL LETTER U WITH CARON % LATIN CAPITAL LETTER U WITH HACEK
01D4 LATIN SMALL LETTER U WITH CARON % LATIN SMALL LETTER U WITH HACEK
01D9 LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON % LATIN CAPITAL LETTER U WITH DIAERESIS AND HACEK
01DA LATIN SMALL LETTER U WITH DIAERESIS AND CARON % LATIN SMALL LETTER U WITH DIAERESIS AND HACEK
01E6 LATIN CAPITAL LETTER G WITH CARON % LATIN CAPITAL LETTER G WITH HACEK
01E7 LATIN SMALL LETTER G WITH CARON % LATIN SMALL LETTER G WITH HACEK
01E8 LATIN CAPITAL LETTER K WITH CARON % LATIN CAPITAL LETTER K WITH HACEK
01E9 LATIN SMALL LETTER K WITH CARON % LATIN SMALL LETTER K WITH HACEK
01EE LATIN CAPITAL LETTER EZH WITH CARON % LATIN CAPITAL LETTER EZH WITH HACEK
01EF LATIN SMALL LETTER EZH WITH CARON % LATIN SMALL LETTER EZH WITH HACEK
01F0 LATIN SMALL LETTER J WITH CARON % LATIN SMALL LETTER J WITH HACEK
021E LATIN CAPITAL LETTER H WITH CARON % LATIN CAPITAL LETTER H WITH HACEK
021F LATIN SMALL LETTER H WITH CARON % LATIN SMALL LETTER H WITH HACEK
0238 LATIN SMALL LETTER DB DIGRAPH % LATIN SMALL LIGATURE DB
0239 LATIN SMALL LETTER QP DIGRAPH % LATIN SMALL LIGATURE QP
025B LATIN SMALL LETTER OPEN E % LATIN SMALL LETTER EPSILON
025E LATIN SMALL LETTER CLOSED REVERSED OPEN E % LATIN SMALL LETTER CLOSED REVERSED EPSILON
027F LATIN SMALL LETTER REVERSED R WITH FISHHOOK % LATIN SMALL LETTER LONG LEG TURNED IOTA
0285 LATIN SMALL LETTER SQUAT REVERSED ESH % LATIN SMALL LETTER REVERSED R WITH FISHHOOK AND RETROFLEX HOOK
02C7 CARON % MODIFIER LETTER HACEK
030C COMBINING CARON % COMBINING HACEK
032C COMBINING CARON BELOW % COMBINING HACEK BELOW
039B GREEK CAPITAL LETTER LAMDA % GREEK CAPITAL LETTER LAMBDA
03BB GREEK SMALL LETTER LAMDA % GREEK SMALL LETTER LAMBDA
047C CYRILLIC CAPITAL LETTER OMEGA WITH TITLO % CYRILLIC CAPITAL LETTER BEAUTIFUL OMEGA
047D CYRILLIC SMALL LETTER OMEGA WITH TITLO % CYRILLIC SMALL LETTER BEAUTIFUL OMEGA
0598 HEBREW ACCENT ZARQA % HEBREW ACCENT TSINORIT
05AE HEBREW ACCENT ZINOR % HEBREW ACCENT TSINOR
0670 ARABIC LETTER SUPERSCRIPT ALEF % ARABIC VOWEL SIGN SUPERSCRIPT ALEF
06C0 ARABIC LETTER HEH WITH YEH ABOVE % ARABIC LIGATURE HEH WITH YEH ABOVE
06C2 ARABIC LETTER HEH GOAL WITH HAMZA ABOVE % ARABIC LIGATURE HEH GOAL WITH HAMZA ABOVE
06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE % ARABIC LIGATURE YEH BARREE WITH HAMZA ABOVE
0709 SYRIAC SUBLINEAR COLON SKEWED RIGHT 3 % SYRIAC SUBLINEAR COLON SKEWED LEFT
0A01 GURMUKHI SIGN ADAK BINDI % GURMUKHI SIGN ADDAK BINDI
0B83 TAMIL SIGN VISARGA % TAMIL SIGN AYTHAM
0CDE KANNADA LETTER FA 4 % KANNADA LETTER LLLA
0E9D LAO LETTER FO TAM 5 % LAO LETTER FO FON
0E9F LAO LETTER FO SUNG 6 % LAO LETTER FO FAY
0EA3 LAO LETTER LO LING 7 % LAO LETTER RO
0EA5 LAO LETTER LO LOOT 8 % LAO LETTER LO
0F0A TIBETAN MARK BKA- SHOG YIG MGO % TIBETAN MARK ZOU YIK GUI GO
0F0B TIBETAN MARK INTERSYLLABIC TSHEG % TIBETAN MARK TSHEG
0F0C TIBETAN MARK DELIMITER TSHEG BSTAR % TIBETAN MARK NO-BREAK TSHEG [???]
0FD0 TIBETAN MARK BSKA- SHOG GI MGO RGYAN 9 % TIBETAN MARK BKA- SHOG GI MGO RGYAN
156F CANADIAN SYLLABICS TTH % CANADIAN SYLLABICS ASTERISK
178E KHMER LETTER NNO % KHMER LETTER NNA
179E KHMER LETTER SSO % KHMER LETTER SSA
1D27 GREEK LETTER SMALL CAPITAL LAMDA % GREEK LETTER SMALL CAPITAL LAMBDA
1E9E LATIN CAPITAL LETTER SHARP S % LATIN CAPITAL LETTER SZ
2018 LEFT SINGLE QUOTATION MARK % SINGLE TURNED COMMA QUOTATION MARK
2019 RIGHT SINGLE QUOTATION MARK % SINGLE COMMA QUOTATION MARK
201A SINGLE LOW-9 QUOTATION MARK % LOW SINGLE COMMA QUOTATION MARK
201B SINGLE HIGH-REVERSED-9 QUOTATION MARK % SINGLE REVERSED COMMA QUOTATION MARK
201C LEFT DOUBLE QUOTATION MARK % DOUBLE TURNED COMMA QUOTATION MARK
201D RIGHT DOUBLE QUOTATION MARK % DOUBLE COMMA QUOTATION MARK
201E DOUBLE LOW-9 QUOTATION MARK % LOW DOUBLE COMMA QUOTATION MARK
201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK % DOUBLE REVERSED COMMA QUOTATION MARK
2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK % SINGLE BACKWARDS-POINTING ANGLE QUOTATION MARK
203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK % SINGLE FORWARDS-POINTING ANGLE QUOTATION MARK
203E OVERLINE % SPACING OVERSCORE
2045 LEFT SQUARE BRACKET WITH QUILL % OPENING SQUARE BRACKET WITH QUILL
2046 RIGHT SQUARE BRACKET WITH QUILL % CLOSING SQUARE BRACKET WITH QUILL
207D SUPERSCRIPT LEFT PARENTHESIS % SUPERSCRIPT OPENING PARENTHESIS
207E SUPERSCRIPT RIGHT PARENTHESIS % SUPERSCRIPT CLOSING PARENTHESIS
208D SUBSCRIPT LEFT PARENTHESIS % SUBSCRIPT OPENING PARENTHESIS
208E SUBSCRIPT RIGHT PARENTHESIS % SUBSCRIPT CLOSING PARENTHESIS
20E5 COMBINING REVERSE SOLIDUS OVERLAY % COMBINING BACKSLASH OVERLAY
2113 SCRIPT SMALL L % MATHEMATICAL SYMBOL ELL
2118 SCRIPT CAPITAL P 10 % WEIERSTRASS ELLIPTIC FUNCTION
2150 VULGAR FRACTION ONE SEVENTH % FRACTION ONE SEVENTH
2151 VULGAR FRACTION ONE NINTH % FRACTION ONE NINTH
2152 VULGAR FRACTION ONE TENTH % FRACTION ONE TENTH
2153 VULGAR FRACTION ONE THIRD % FRACTION ONE THIRD
2154 VULGAR FRACTION TWO THIRDS % FRACTION TWO THIRDS
2155 VULGAR FRACTION ONE FIFTH % FRACTION ONE FIFTH
2156 VULGAR FRACTION TWO FIFTHS % FRACTION TWO FIFTHS
2157 VULGAR FRACTION THREE FIFTHS % FRACTION THREE FIFTHS
2158 VULGAR FRACTION FOUR FIFTHS % FRACTION FOUR FIFTHS
2159 VULGAR FRACTION ONE SIXTH % FRACTION ONE SIXTH
215A VULGAR FRACTION FIVE SIXTHS % FRACTION FIVE SIXTHS
215B VULGAR FRACTION ONE EIGHTH % FRACTION ONE EIGHTH
215C VULGAR FRACTION THREE EIGHTHS % FRACTION THREE EIGHTHS
215D VULGAR FRACTION FIVE EIGHTHS % FRACTION FIVE EIGHTHS
215E VULGAR FRACTION SEVEN EIGHTHS % FRACTION SEVEN EIGHTHS
22A2 RIGHT TACK % FORWARDS TACK
22A3 LEFT TACK % BACKWARDS TACK
22C9 LEFT NORMAL FACTOR SEMIDIRECT PRODUCT % BACKWARDS NORMAL FACTOR SEMIDIRECT PRODUCT
22CA RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT % FORWARDS NORMAL FACTOR SEMIDIRECT PRODUCT
22CB LEFT SEMIDIRECT PRODUCT % BACKWARDS SEMIDIRECT PRODUCT
22CC RIGHT SEMIDIRECT PRODUCT % FORWARDS SEMIDIRECT PRODUCT
2308 LEFT CEILING % BEGIN CEILING
2309 RIGHT CEILING % END CEILING
230A LEFT FLOOR % BEGIN FLOOR
230B RIGHT FLOOR % END FLOOR
2329 LEFT-POINTING ANGLE BRACKET % BACKWARDS-POINTING ANGLE BRACKET
232A RIGHT-POINTING ANGLE BRACKET % FORWARDS-POINTING ANGLE BRACKET
232B ERASE TO THE LEFT % ERASE BACKWARDS
2446 OCR BRANCH BANK IDENTIFICATION % MICR TRANSIT SYMBOL
2447 OCR AMOUNT OF CHECK % MICR AMOUNT SYMBOL
2448 OCR DASH 11 % MICR ON US SYMBOL
2449 OCR CUSTOMER ACCOUNT NUMBER 12 % MICR DASH SYMBOL
2768 MEDIUM LEFT PARENTHESIS ORNAMENT % MEDIUM OPENING PARENTHESIS ORNAMENT
2769 MEDIUM RIGHT PARENTHESIS ORNAMENT % MEDIUM CLOSING PARENTHESIS ORNAMENT
276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT % MEDIUM FLATTENED OPENING PARENTHESIS ORNAMENT
276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT % MEDIUM FLATTENED CLOSING PARENTHESIS ORNAMENT
276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT % MEDIUM OPENING-POINTING ANGLE BRACKET ORNAMENT
276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT % MEDIUM CLOSING-POINTING ANGLE BRACKET ORNAMENT
276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT % HEAVY BACKWARDS-POINTING ANGLE QUOTATION MARK ORNAMENT
276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT % HEAVY FORWARDS-POINTING ANGLE QUOTATION MARK ORNAMENT
2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT % HEAVY BACKWARDS-POINTING ANGLE BRACKET ORNAMENT
2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT % HEAVY FORWARDS-POINTING ANGLE BRACKET ORNAMENT
2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT % LIGHT OPENING TORTOISE SHELL BRACKET ORNAMENT
2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT % LIGHT CLOSING TORTOISE SHELL BRACKET ORNAMENT
2774 MEDIUM LEFT CURLY BRACKET ORNAMENT % MEDIUM OPENING CURLY BRACKET ORNAMENT
2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT % MEDIUM CLOSING CURLY BRACKET ORNAMENT
27C5 LEFT S-SHAPED BAG DELIMITER % OPENING S-SHAPED BAG DELIMITER
27C6 RIGHT S-SHAPED BAG DELIMITER % CLOSING S-SHAPED BAG DELIMITER
27C8 REVERSE SOLIDUS PRECEDING SUBSET % BACKSLASH PRECEDING SUBSET
27C9 SUPERSET PRECEDING SOLIDUS % SUPERSET PRECEDING SLASH
27D3 LOWER RIGHT CORNER WITH DOT % LOWER CORNER WITH DOT
27D4 UPPER LEFT CORNER WITH DOT % UPPER CORNER WITH DOT
27D5 LEFT OUTER JOIN % BACKWARDS OUTER JOIN
27D6 RIGHT OUTER JOIN % FORWARDS OUTER JOIN
27DC LEFT MULTIMAP % BACKWARDS MULTIMAP
27DD LONG RIGHT TACK % LONG FORWARDS TACK
27DE LONG LEFT TACK % LONG BACKWARDS TACK
27E2 WHITE CONCAVE-SIDED DIAMOND WITH LEFTWARDS TICK % WHITE CONCAVE-SIDED DIAMOND WITH BACKWARDS TICK
27E3 WHITE CONCAVE-SIDED DIAMOND WITH RIGHTWARDS TICK % WHITE CONCAVE-SIDED DIAMOND WITH FORWARDS TICK
27E4 WHITE SQUARE WITH LEFTWARDS TICK % WHITE SQUARE WITH BACKWARDS TICK
27E5 WHITE SQUARE WITH RIGHTWARDS TICK % WHITE SQUARE WITH FORWARDS TICK
27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET % MATHEMATICAL OPENING WHITE SQUARE BRACKET
27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET % MATHEMATICAL CLOSING WHITE SQUARE BRACKET
27E8 MATHEMATICAL LEFT ANGLE BRACKET % MATHEMATICAL OPENING ANGLE BRACKET
27E9 MATHEMATICAL RIGHT ANGLE BRACKET % MATHEMATICAL CLOSING ANGLE BRACKET
27EA 
MATHEMATICAL LEFT DOUBLE ANGLE BRACKET % MATHEMATICAL OPENING DOUBLE ANGLE BRACKET 27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET % MATHEMATICAL CLOSING DOUBLE ANGLE BRACKET 27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET % MATHEMATICAL OPENING WHITE TORTOISE SHELL BRACKET 27ED MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET % MATHEMATICAL CLOSING WHITE TORTOISE SHELL BRACKET 27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS % MATHEMATICAL OPENING FLATTENED PARENTHESIS 27EF MATHEMATICAL RIGHT FLATTENED PARENTHESIS % MATHEMATICAL CLOSING FLATTENED PARENTHESIS 2983 LEFT WHITE CURLY BRACKET % OPENING WHITE CURLY BRACKET 2984 RIGHT WHITE CURLY BRACKET % CLOSING WHITE CURLY BRACKET 2985 LEFT WHITE PARENTHESIS % OPENING WHITE PARENTHESIS 2986 RIGHT WHITE PARENTHESIS % CLOSING WHITE PARENTHESIS 2987 Z NOTATION LEFT IMAGE BRACKET % Z NOTATION OPENING IMAGE BRACKET 2988 Z NOTATION RIGHT IMAGE BRACKET % Z NOTATION CLOSING IMAGE BRACKET 2989 Z NOTATION LEFT BINDING BRACKET % Z NOTATION OPENING BINDING BRACKET 298A Z NOTATION RIGHT BINDING BRACKET % Z NOTATION CLOSING BINDING BRACKET 298B LEFT SQUARE BRACKET WITH UNDERBAR % OPENING SQUARE BRACKET WITH UNDERBAR 298C RIGHT SQUARE BRACKET WITH UNDERBAR % CLOSING SQUARE BRACKET WITH UNDERBAR 298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER % OPENING SQUARE BRACKET WITH TICK IN TOP CORNER 298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER % CLOSING SQUARE BRACKET WITH TICK IN BOTTOM CORNER 298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER % OPENING SQUARE BRACKET WITH TICK IN BOTTOM CORNER 2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER % CLOSING SQUARE BRACKET WITH TICK IN TOP CORNER 2991 LEFT ANGLE BRACKET WITH DOT % OPENING ANGLE BRACKET WITH DOT 2992 RIGHT ANGLE BRACKET WITH DOT % CLOSING ANGLE BRACKET WITH DOT 2993 LEFT ARC LESS-THAN BRACKET % OPENING ARC LESS-THAN BRACKET 2994 RIGHT ARC GREATER-THAN BRACKET % CLOSING ARC GREATER-THAN BRACKET 2995 DOUBLE LEFT ARC GREATER-THAN BRACKET % DOUBLE OPENING ARC GREATER-THAN 
BRACKET 2996 DOUBLE RIGHT ARC LESS-THAN BRACKET % DOUBLE CLOSING ARC LESS-THAN BRACKET 2997 LEFT BLACK TORTOISE SHELL BRACKET % OPENING BLACK TORTOISE SHELL BRACKET 2998 RIGHT BLACK TORTOISE SHELL BRACKET % CLOSING BLACK TORTOISE SHELL BRACKET 299B MEASURED ANGLE OPENING LEFT % MEASURED ANGLE OPENING BACKWARDS 29A0 SPHERICAL ANGLE OPENING LEFT % SPHERICAL ANGLE OPENING BACKWARDS 29A8 MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND RIGHT % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND FORWARDS 29A9 MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND LEFT % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND BACKWARDS 29AA MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND RIGHT % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND FORWARDS 29AB MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND LEFT % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND BACKWARDS 29AC MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND UP % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING FORWARDS AND UP 29AD MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND UP % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING BACKWARDS AND UP 29AE MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND DOWN % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING FORWARDS AND DOWN 29AF MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND DOWN % MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING BACKWARDS AND DOWN 29B8 CIRCLED REVERSE SOLIDUS % CIRCLED BACKSLASH 29C2 CIRCLE WITH SMALL CIRCLE TO THE RIGHT % CIRCLE WITH SMALL CIRCLE AFTER 29C3 CIRCLE WITH TWO HORIZONTAL STROKES TO THE RIGHT % CIRCLE WITH TWO HORIZONTAL STROKES AFTER 29CE RIGHT TRIANGLE ABOVE LEFT TRIANGLE % FORWARDS TRIANGLE ABOVE BACKWARDS TRIANGLE 29CF LEFT TRIANGLE BESIDE VERTICAL BAR BACKWARDS TRIANGLE BESIDE VERTICAL BAR 29D0 VERTICAL BAR BESIDE RIGHT TRIANGLE % VERTICAL BAR BESIDE 
FORWARDS TRIANGLE 29D1 BOWTIE WITH LEFT HALF BLACK % BOWTIE WITH BACKWARDS HALF BLACK 29D2 BOWTIE WITH RIGHT HALF BLACK % BOWTIE WITH FORWARDS HALF BLACK 29D4 TIMES WITH LEFT HALF BLACK % TIMES WITH BACKWARDS HALF BLACK 29D5 TIMES WITH RIGHT HALF BLACK % TIMES WITH FORWARDS HALF BLACK 29D8 LEFT WIGGLY FENCE % BACKWARDS WIGGLY FENCE 29D9 RIGHT WIGGLY FENCE % FORWARDS WIGGLY FENCE 29DA LEFT DOUBLE WIGGLY FENCE % BACKWARDS DOUBLE WIGGLY FENCE 29DB RIGHT DOUBLE WIGGLY FENCE % FORWARDS DOUBLE WIGGLY FENCE 29E8 DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK % DOWN-POINTING TRIANGLE WITH BACKWARDS HALF BLACK 29E9 DOWN-POINTING TRIANGLE WITH RIGHT HALF BLACK % DOWN-POINTING TRIANGLE WITH FORWARDS HALF BLACK 29F5 REVERSE SOLIDUS OPERATOR % BACKSLASH OPERATOR 29F6 SOLIDUS WITH OVERBAR % SLASH WITH OVERBAR 29F7 REVERSE SOLIDUS WITH HORIZONTAL STROKE % BACKSLASH WITH HORIZONTAL STROKE 29F8 BIG SOLIDUS % BIG SLASH 29F9 BIG REVERSE SOLIDUS % BIG BACKSLASH 29FC LEFT-POINTING CURVED ANGLE BRACKET % BACKWARDS-POINTING CURVED ANGLE BRACKET 29FD RIGHT-POINTING CURVED ANGLE BRACKET % FORWARDS-POINTING CURVED ANGLE BRACKET 2A1E LARGE LEFT TRIANGLE OPERATOR % LARGE BACKWARDS TRIANGLE OPERATOR 2A2D PLUS SIGN IN LEFT HALF CIRCLE % PLUS SIGN IN BACKWARDS HALF CIRCLE 2A2E PLUS SIGN IN RIGHT HALF CIRCLE % PLUS SIGN IN FORWARDS HALF CIRCLE 2A34 MULTIPLICATION SIGN IN LEFT HALF CIRCLE % MULTIPLICATION SIGN IN BACKWARDS HALF CIRCLE 2A35 MULTIPLICATION SIGN IN RIGHT HALF CIRCLE % MULTIPLICATION SIGN IN FORWARDS HALF CIRCLE 2A83 LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE RIGHT % LESS-THAN OR SLANTED EQUAL TO WITH DOT ON TOP 2A84 GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE LEFT % GREATER-THAN OR SLANTED EQUAL TO WITH DOT ON TOP 2ACD SQUARE LEFT OPEN BOX OPERATOR % SQUARE BACKWARDS OPEN BOX OPERATOR 2ACE SQUARE RIGHT OPEN BOX OPERATOR % SQUARE FORWARDS OPEN BOX OPERATOR 2ADE SHORT LEFT TACK % SHORT BACKWARDS TACK 2AE2 VERTICAL BAR TRIPLE RIGHT TURNSTILE % VERTICAL BAR TRIPLE FORWARDS TURNSTILE 
2AE3 DOUBLE VERTICAL BAR LEFT TURNSTILE % DOUBLE VERTICAL BAR BACKWARDS TURNSTILE 2AE4 VERTICAL BAR DOUBLE LEFT TURNSTILE % VERTICAL BAR DOUBLE BACKWARDS TURNSTILE 2AE5 DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE % DOUBLE VERTICAL BAR DOUBLE BACKWARDS TURNSTILE 2AE6 LONG DASH FROM LEFT MEMBER OF DOUBLE VERTICAL % LONG DASH FROM BACKWARDS MEMBER OF DOUBLE VERTICAL 2E02 LEFT SUBSTITUTION BRACKET % OPENING SUBSTITUTION BRACKET 2E03 RIGHT SUBSTITUTION BRACKET % CLOSING SUBSTITUTION BRACKET 2E04 LEFT DOTTED SUBSTITUTION BRACKET % OPENING DOTTED SUBSTITUTION BRACKET 2E05 RIGHT DOTTED SUBSTITUTION BRACKET % CLOSING DOTTED SUBSTITUTION BRACKET 2E09 LEFT TRANSPOSITION BRACKET % OPENING TRANSPOSITION BRACKET 2E0A RIGHT TRANSPOSITION BRACKET % CLOSING TRANSPOSITION BRACKET 2E0C LEFT RAISED OMISSION BRACKET % OPENING RAISED OMISSION BRACKET 2E0D RIGHT RAISED OMISSION BRACKET % CLOSING RAISED OMISSION BRACKET 2E1C LEFT LOW PARAPHRASE BRACKET % OPENING LOW PARAPHRASE BRACKET 2E1D RIGHT LOW PARAPHRASE BRACKET % CLOSING LOW PARAPHRASE BRACKET 2E20 LEFT VERTICAL BAR WITH QUILL % OPENING VERTICAL BAR WITH QUILL 2E21 RIGHT VERTICAL BAR WITH QUILL % CLOSING VERTICAL BAR WITH QUILL 2E22 TOP LEFT HALF BRACKET % TOP OPENING HALF BRACKET 2E23 TOP RIGHT HALF BRACKET % TOP CLOSING HALF BRACKET 2E24 BOTTOM LEFT HALF BRACKET % BOTTOM OPENING HALF BRACKET 2E25 BOTTOM RIGHT HALF BRACKET % BOTTOM CLOSING HALF BRACKET 2E26 LEFT SIDEWAYS U BRACKET % OPENING SIDEWAYS U BRACKET 2E27 RIGHT SIDEWAYS U BRACKET % CLOSING SIDEWAYS U BRACKET 2E28 LEFT DOUBLE PARENTHESIS % OPENING DOUBLE PARENTHESIS 2E29 RIGHT DOUBLE PARENTHESIS % CLOSING DOUBLE PARENTHESIS 3008 LEFT ANGLE BRACKET % OPENING ANGLE BRACKET 3009 RIGHT ANGLE BRACKET % CLOSING ANGLE BRACKET 300A LEFT DOUBLE ANGLE BRACKET % OPENING DOUBLE ANGLE BRACKET 300B RIGHT DOUBLE ANGLE BRACKET % CLOSING DOUBLE ANGLE BRACKET 300C LEFT CORNER BRACKET % OPENING CORNER BRACKET 300D RIGHT CORNER BRACKET % CLOSING CORNER BRACKET 300E LEFT WHITE CORNER BRACKET % 
OPENING WHITE CORNER BRACKET 300F RIGHT WHITE CORNER BRACKET % CLOSING WHITE CORNER BRACKET 3010 LEFT BLACK LENTICULAR BRACKET % OPENING BLACK LENTICULAR BRACKET 3011 RIGHT BLACK LENTICULAR BRACKET % CLOSING BLACK LENTICULAR BRACKET 3014 LEFT TORTOISE SHELL BRACKET % OPENING TORTOISE SHELL BRACKET 3015 RIGHT TORTOISE SHELL BRACKET % CLOSING TORTOISE SHELL BRACKET 3016 LEFT WHITE LENTICULAR BRACKET % OPENING WHITE LENTICULAR BRACKET 3017 RIGHT WHITE LENTICULAR BRACKET % CLOSING WHITE LENTICULAR BRACKET 3018 LEFT WHITE TORTOISE SHELL BRACKET % OPENING WHITE TORTOISE SHELL BRACKET 3019 RIGHT WHITE TORTOISE SHELL BRACKET % CLOSING WHITE TORTOISE SHELL BRACKET 301A LEFT WHITE SQUARE BRACKET % OPENING WHITE SQUARE BRACKET 301B RIGHT WHITE SQUARE BRACKET % CLOSING WHITE SQUARE BRACKET 3021 HANGZHOU NUMERAL ONE % SUZHOU NUMERAL ONE 3022 HANGZHOU NUMERAL TWO % SUZHOU NUMERAL TWO 3023 HANGZHOU NUMERAL THREE % SUZHOU NUMERAL THREE 3024 HANGZHOU NUMERAL FOUR % SUZHOU NUMERAL FOUR 3025 HANGZHOU NUMERAL FIVE % SUZHOU NUMERAL FIVE 3026 HANGZHOU NUMERAL SIX % SUZHOU NUMERAL SIX 3027 HANGZHOU NUMERAL SEVEN % SUZHOU NUMERAL SEVEN 3028 HANGZHOU NUMERAL EIGHT % SUZHOU NUMERAL EIGHT 3029 HANGZHOU NUMERAL NINE % SUZHOU NUMERAL NINE 309F HIRAGANA DIGRAPH YORI % HIRAGANA LIGATURE YORI 30FF KATAKANA DIGRAPH KOTO % KATAKANA LIGATURE KOTO A015 YI SYLLABLE WU 13 % YI SYLLABLE ITERATION MARK FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET 14 % PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET FE59 SMALL LEFT PARENTHESIS % SMALL OPENING PARENTHESIS FE5A SMALL RIGHT PARENTHESIS % SMALL CLOSING PARENTHESIS FE5B SMALL LEFT CURLY BRACKET % SMALL OPENING CURLY BRACKET FE5C SMALL RIGHT CURLY BRACKET % SMALL CLOSING CURLY BRACKET FE5D SMALL LEFT TORTOISE SHELL BRACKET % SMALL OPENING TORTOISE SHELL BRACKET FE5E SMALL RIGHT TORTOISE SHELL BRACKET % SMALL CLOSING TORTOISE SHELL BRACKET FE6B SMALL COMMERCIAL AT % SMALL AT SIGN FEFF ZERO WIDTH NO-BREAK SPACE 15 % BYTE ORDER 
MARK FF08 FULLWIDTH LEFT PARENTHESIS % FULLWIDTH OPENING PARENTHESIS FF09 FULLWIDTH RIGHT PARENTHESIS % FULLWIDTH CLOSING PARENTHESIS FF20 FULLWIDTH COMMERCIAL AT % FULLWIDTH AT SIGN FF3B FULLWIDTH LEFT SQUARE BRACKET % FULLWIDTH OPENING SQUARE BRACKET FF3C FULLWIDTH REVERSE SOLIDUS % FULLWIDTH BACKSLASH FF3D FULLWIDTH RIGHT SQUARE BRACKET % FULLWIDTH CLOSING SQUARE BRACKET FF5B FULLWIDTH LEFT CURLY BRACKET % FULLWIDTH OPENING CURLY BRACKET FF5D FULLWIDTH RIGHT CURLY BRACKET % FULLWIDTH CLOSING CURLY BRACKET FF5F FULLWIDTH LEFT WHITE PARENTHESIS % FULLWIDTH OPENING WHITE PARENTHESIS FF60 FULLWIDTH RIGHT WHITE PARENTHESIS % FULLWIDTH CLOSING WHITE PARENTHESIS FF62 HALFWIDTH LEFT CORNER BRACKET % HALFWIDTH OPENING CORNER BRACKET FF63 HALFWIDTH RIGHT CORNER BRACKET % HALFWIDTH CLOSING CORNER BRACKET 1038D UGARITIC LETTER LAMDA % UGARITIC LETTER LAMBDA 122D4 CUNEIFORM SIGN SHIR TENU 16 % CUNEIFORM SIGN NU11 TENU 122D5 CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR 17 % CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR 1D0C5 BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS 18 % BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS 1D13A MUSICAL SYMBOL MULTI REST % MUSICAL SYMBOL DOUBLE WHOLE-REST 1D6B2 MATHEMATICAL BOLD CAPITAL LAMDA % MATHEMATICAL BOLD CAPITAL LAMBDA 1D6CC MATHEMATICAL BOLD SMALL LAMDA % MATHEMATICAL BOLD SMALL LAMBDA 1D6EC MATHEMATICAL ITALIC CAPITAL LAMDA % MATHEMATICAL ITALIC CAPITAL LAMBDA 1D706 MATHEMATICAL ITALIC SMALL LAMDA % MATHEMATICAL ITALIC SMALL LAMBDA 1D726 MATHEMATICAL BOLD ITALIC CAPITAL LAMDA % MATHEMATICAL BOLD ITALIC CAPITAL LAMBDA 1D740 MATHEMATICAL BOLD ITALIC SMALL LAMDA % MATHEMATICAL BOLD ITALIC SMALL LAMBDA 1D760 MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA % MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMBDA 1D77A MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA % MATHEMATICAL SANS-SERIF BOLD SMALL LAMBDA 1D79A MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA % MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMBDA 1D7B4 MATHEMATICAL SANS-SERIF BOLD 
ITALIC SMALL LAMDA % MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMBDA [...?] ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Date/Time: Sat May 2 06:52:34 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Corrigendum
In my post of Wed Mar 25 12:54:45 CDT 2015, the tenth code point should read U+1E37, not U+01E7. Sorry. The following list shows the real instances in the NamesList where what I call “idle ‘@+’” occur (that is, the instances of NOTICE_LINEs in char entries, which are displayed there the same as COMMENT_LINEs): U+0140 U+0149 U+01A6 U+0268 U+0269 U+0277 U+027C U+029E U+0307 U+1E37 (corrected) U+1E5B U+2301 U+234A U+237B U+237D U+237E U+237F U+2425 U+2426 U+16F27 U+16F32 U+16F52 U+16F53 Mostly these mark up information about backwards compatibility with other standards. These notices should be converted to ordinary comments (annotations) because, without any distinctive formatting, nothing is gained by their being notices rather than annotations. Fundamentally, since those issues grow less important as the related standards become of no more than historical interest, they should not affect the Code Charts’ layout either. Best regards, Marcel Schneider
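A list like the one above could be rebuilt mechanically. The following is only a minimal sketch: it assumes a simplified reading of the NamesList.txt syntax (an entry starts with a hex code point followed by a tab, a NOTICE_LINE starts with "@+", and any other line starting with "@" ends the entry), and the function name and the abridged sample text are hypothetical.

```python
import re

# Assumed, simplified NamesList.txt conventions for this sketch:
#   - a character entry starts with "<hex code point><TAB><name>"
#   - a NOTICE_LINE starts with "@+"
#   - any other line starting with "@" (block header, subheader, ...) ends the entry
ENTRY_START = re.compile(r"^([0-9A-F]{4,6})\t")

def idle_notice_code_points(nameslist_text):
    """Return the code points whose character entry contains a NOTICE_LINE."""
    found = []
    current = None  # code point of the entry being read, if any
    for line in nameslist_text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            current = match.group(1)
        elif line.startswith("@+"):
            if current is not None and current not in found:
                found.append(current)
        elif line.startswith("@"):
            current = None  # structural line: no longer inside an entry
    return found

# Abridged sample for illustration only, not a verbatim excerpt of the file.
SAMPLE = (
    "@@\t0250\tIPA Extensions\t02AF\n"
    "@+\tA notice at block level, outside any character entry.\n"
    "0268\tLATIN SMALL LETTER I WITH STROKE\n"
    "\t* uppercase is 0197\n"
    "@+\t* ISO 6438 gives lowercase of 0197 as 026A, not 0268\n"
    "026A\tLATIN LETTER SMALL CAPITAL I\n"
    "\t* uppercase is 0197\n"
)

print(["U+" + cp for cp in idle_notice_code_points(SAMPLE)])
```

Notices at block level are skipped on purpose, since only those inside char entries are the "idle" ones discussed here.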
Date/Time: Sat May 2 06:53:47 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+026A
To match:
0197 LATIN CAPITAL LETTER I WITH STROKE
    [...]
    * lowercase is 0268
    * ISO 6438 gives lowercase as 026A, not 0268
with:
0268 LATIN SMALL LETTER I WITH STROKE
    [...]
    * uppercase is 0197
@+  * ISO 6438 gives lowercase of 0197 as 026A, not 0268
the COMMENT_LINE for:
026A LATIN LETTER SMALL CAPITAL I
    [...]
    * uppercase is 0197
should probably be completed to:
    * ISO 6438 gives 0197 as uppercase
Best regards, Marcel Schneider
Date/Time: Sat May 2 06:59:49 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: About casing information
I’ve already sent some feedback containing comments on casing information provided in the Code Charts (Mon Apr 20, 2015). In applying those comments, I discovered that each case is different and leads to a different solution. U+00D0/U+00F0, U+0110/U+0111: The legacy practice of encoding the capitals Ð/Đ only once was so confusing that these letters fully deserve both casing annotations *and* cross-references to the other case. The same applies to U+00FF/U+0178, because this capital had not been encoded at all, even in Latin-1 (and that was not the only defect, see the œŒ). In this case the annotations could instead explain why these two are so far away from each other (a fact that was bugging people). By contrast, U+0110 and U+0111 are so near each other that it doesn’t make much sense to provide xrefs. Comments would not be needed either, if there were no debt to repair. But that should be done explicitly (even if without excuses, in a Code Chart...). This is why I end up preferring it this way:
00D0 LATIN CAPITAL LETTER ETH
    * lowercase is 00F0
    x (latin small letter eth - 00F0)
00F0 LATIN SMALL LETTER ETH
    * uppercase is 00D0
    x (latin capital letter eth - 00D0)
0110 LATIN CAPITAL LETTER D WITH STROKE
    * lowercase is 0111
0111 LATIN SMALL LETTER D WITH STROKE
    * uppercase is 0110
00FF LATIN SMALL LETTER Y WITH DIAERESIS
    * uppercase is 0178 (not encoded at 00DF for compatibility with ISO/IEC 8859-1)
    x (latin capital letter y with diaeresis - 0178)
0178 LATIN CAPITAL LETTER Y WITH DIAERESIS
    * lowercase has been encoded at 00FF for compatibility with ISO/IEC 8859-1
    x (latin small letter y with diaeresis - 00FF)
Regarding U+00DF and U+1E9E (my post of Mon Apr 20, 2015), there is a need to resolve the puzzle of a dedicated capital letter existing while the uppercase is written with two S. Since ẞ (uppercase) is encoded and available on keyboards, things have become simpler and the annotations are no longer accurate.
I would also add some explanations for the capital letter (additions are bracketed with underscores):
00DF LATIN SMALL LETTER SHARP S
    = Eszett
    * German
    * uppercase is _'0053 0053' or 1E9E_
    [...]
1E9E LATIN CAPITAL LETTER SHARP S
    _= latin capital letter sz_
    _* used to disambiguate the orthography of uppercase names_
    * lowercase is 00DF
    x (latin small letter sharp s - 00DF)
Best regards, Marcel Schneider
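The plain "lowercase is" and "uppercase is" comment lines discussed above can be derived from case mappings. Here is a minimal sketch using Python's built-in string methods, which apply the full case mappings of the Unicode version bundled with the interpreter. The function name is hypothetical, and the quoting of multi-character results simply imitates the notation used in this post.

```python
def casing_annotations(code_point):
    """Build NamesList-style casing comment lines for one code point."""
    char = chr(code_point)
    notes = []
    for label, mapped in (("lowercase", char.lower()), ("uppercase", char.upper())):
        if mapped == char:
            continue  # no mapping in this direction
        hex_values = " ".join(f"{ord(c):04X}" for c in mapped)
        if len(mapped) > 1:
            # Multi-character mapping, quoted as in the post above.
            hex_values = f"'{hex_values}'"
        notes.append(f"* {label} is {hex_values}")
    return notes

for cp in (0x00D0, 0x00F0, 0x00FF, 0x00DF, 0x1E9E):
    print(f"{cp:04X}", casing_annotations(cp))
```

The default mappings give only '0053 0053' as the uppercase of 00DF, so the alternative "or 1E9E" proposed above, like the explanatory remarks and the cross-references, would still have to be added by hand.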
Date/Time: Sat May 2 07:03:05 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: LAMDA U+039B/U+03BB
As already posted (Mon Apr 27, 2015), the spelling LAMBDA isn’t right either, so giving the authentic spelling LABDA as an alias can help override the defect of the ISO-originated Standardese. I therefore suggest adding an alias line for the capital U+039B, and echoing it for the small letter by completing the existing one. To blur the issue even more, some other alternative spellings may be given, namely for MU (MY), NU (NY), and UPSILON (YPSILON). However, at the merger, Unicode put up some resistance to that messing with a Greek letter name (Unicode 1.0 respected the canonical spelling LAMBDA, and even today there is U+019B LATIN SMALL LETTER LAMBDA WITH STROKE, = barred lambda, lambda bar, for Americanist phonetic usage), and it was supposedly not without hard discussions that Unicode finally resigned itself to complying. The violence exerted on the ISO side, presumably under the threat of secession, gives an idea of how important naming issues are for scaling domination. In the end at least, we hope, truth will prevail. Best wishes, Marcel Schneider
_____________________________________
039B GREEK CAPITAL LETTER LAMDA
    = lambda, labda
03BB GREEK SMALL LETTER LAMDA
    = lambda, labda
03BC GREEK SMALL LETTER MU
    = my
03BD GREEK SMALL LETTER NU
    = ny
03C5 GREEK SMALL LETTER UPSILON
    = ypsilon
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Date/Time: Sat May 2 07:10:09 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: PRI #297: UnicodeXData.txt
As I’ve already posted on Mon Apr 27, 2015, the information about formal aliases seems to be out of reach for many software users who are confronted with them when searching for information about characters. It therefore seems consistent to make it more readily available. The same would apply to informative aliases. Unicode clearly states in NamesList.txt that “this file should not be parsed for machine-readable information”. As a result, all the informative aliases Unicode added for the information of users, implementers and developers are lost, because they seem to be nowhere else in the UCD. This is why I suggest launching a new comprehensive data file following the model of UnicodeData.txt, generated by adding many fields to UnicodeData and therefore called “UnicodeData Extended” or UnicodeXData.txt (X as in XHTML), as I already began to suggest on Thu Apr 30 06:44:57 CDT 2015. Among the useful fields to be added, there will surely be one for the formal alias and 8 others for the informative aliases, one for the Indic syllabic category, one for the bidi-mirroring glyph, one for the version it was encoded in and one for the related date, 16 for cross-references (code point only), several for standardized variants, and so on. Launching this new file opens the way to extensive communication aimed at IT, and will surely create a buzz, which will be able to convince developers of the usefulness of aliases, whether formal or informative. Together with low-threshold access to them, that can help true names come on stage. It is important that Unicode makes this effort, because the current high-threshold access to data (needing complex parsing algorithms and thus depending on the goodwill of the persons involved) is likely to hide the truth from the public. Best regards, Marcel Schneider
Date/Time: Mon May 4 07:53:08 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Corrigendum
I’m sorry to send you a late correction of a former (already belated) post of mine. The formal alias suggestion I sent you on Thu Apr 30 06:44:57 CDT 2015 for U+0285 LATIN SMALL LETTER SQUAT REVERSED ESH as a part of the list, is erroneous. The Code Chart states it must be LATIN SMALL LETTER LONG LEG TURNED IOTA WITH RETROFLEX HOOK. (UTN #27, which notes “This is actually a reversed fishhook r with retroflex hook.”, would thus be updated.) Best regards, Marcel Schneider
Date/Time: Mon May 4 07:53:59 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Equalizing locales treatment in the Code Charts
While making up some files, I became aware of a need for feedback that unfortunately was not accurately covered during the beta review period. Please excuse me for sending you this a whole week too late. Some formal aliases correcting misspellings in character names are completed by an annotation that supposedly serves to present an excuse to the public for having misspelled a character name. This feature is found in two character entries, namely:
FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
    % PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
    * misspelling of "BRACKET" in character name is a known defect
1D0C5 BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS
    % BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS
    * misspelling of "FTHORA" in character name is a known defect
However, another character name has had the misfortune of being misspelled and has been given a formal alias to correct it, but not the related annotation, while another misspelled character name (following UTN #27) has remained without any addition:
0FD0 TIBETAN MARK BSKA- SHOG GI MGO RGYAN
    % TIBETAN MARK BKA- SHOG GI MGO RGYAN
    * used in Bhutan
0A01 GURMUKHI SIGN ADAK BINDI
(Regarding the Greek letter Lambda, already mentioned, the ISO spelling “Lamda” was intentional and is therefore not to be considered here as a misspelling.) It seems consistent that these two characters should equally be given an annotation and, if not already done, a formal alias. The same would then apply to every misnomer which is not already fully covered in the Standard. As proof of good quality, only two remain AFAIK:
0709 SYRIAC SUBLINEAR COLON SKEWED RIGHT
    % SYRIAC SUBLINEAR COLON SKEWED LEFT
    * marks the end of a real or rhetorical question
0B83 TAMIL SIGN VISARGA
    = aytham
(Confusingly, the Tamil aytham already has an annotation, but it says nothing about the misnomer: “* just as for the Tamil pulli, the glyph for aytham may use either dots or rings”.)
Further, regarding the Tibetan mark quoted above, it should be said that its counterpart lacks a formal alias (reported on Thu Apr 30, 2015):
0F0A TIBETAN MARK BKA- SHOG YIG MGO
    * petition honorific, used in Bhutan
Consequently, the discussed character entries might IMHO end up as listed below (additions bracketed with underscores). The more a character entry is tainted with defects, the more fully it may be commented, as a kind of reparation. Best regards, Marcel Schneider
______________________________________________________
0F0A TIBETAN MARK BKA- SHOG YIG MGO
    _% TIBETAN MARK ZOU YIK GUI GO_
    _=_ petition honorific
    * used in Bhutan _by an inferior addressing a superior_
    _* name (a misnomer) refers to 0FD0 ("starting flourish for giving a command")_
0FD0 TIBETAN MARK BSKA- SHOG GI MGO RGYAN
    % TIBETAN MARK BKA- SHOG GI MGO RGYAN
    * used in Bhutan _by a superior addressing an inferior_
    _* misspelling of "BKA-" in character name is a known defect_
    _x (tibetan mark bka- shog yig mgo - 0F0A)_
0A01 GURMUKHI SIGN ADAK BINDI
    _% GURMUKHI SIGN ADDAK BINDI_
    _* misspelling of "ADDAK" in character name is a known defect_
    _x (gurmukhi addak - 0A71)_
0709 SYRIAC SUBLINEAR COLON SKEWED RIGHT
    % SYRIAC SUBLINEAR COLON SKEWED LEFT
    * marks the end of a real or rhetorical question
    _* name is a misnomer_
0B83 TAMIL SIGN VISARGA
    _% TAMIL SIGN_ AYTHAM
    * just as for the Tamil pulli, the glyph for aytham may use either dots or rings
    _* character name is a misnomer_
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
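The "known defect" annotation proposed above follows a fixed pattern, so it could be generated from a character name and its correcting alias. What follows is only a sketch under assumptions: the function name and the 0.7 similarity threshold are hypothetical, and the threshold is a rough heuristic for telling a misspelling (BRAKCET/BRACKET) from a misnomer (RIGHT/LEFT), not a rule taken from the Standard.

```python
from difflib import SequenceMatcher

def misspelling_annotation(name, correction, threshold=0.7):
    """Return a 'known defect' comment line, or None when the pair does not
    look like a simple one-word misspelling."""
    name_words = name.split()
    correction_words = correction.split()
    if len(name_words) != len(correction_words):
        return None
    differing = [(bad, fixed)
                 for bad, fixed in zip(name_words, correction_words)
                 if bad != fixed]
    if len(differing) != 1:
        return None
    bad, fixed = differing[0]
    # Heuristic: a misspelling stays close to the intended word.
    if SequenceMatcher(None, bad, fixed).ratio() < threshold:
        return None  # more likely a misnomer, to be annotated by hand
    return f'* misspelling of "{fixed}" in character name is a known defect'

print(misspelling_annotation(
    "TIBETAN MARK BSKA- SHOG GI MGO RGYAN",
    "TIBETAN MARK BKA- SHOG GI MGO RGYAN"))
print(misspelling_annotation(
    "SYRIAC SUBLINEAR COLON SKEWED RIGHT",
    "SYRIAC SUBLINEAR COLON SKEWED LEFT"))
```

Misnomers such as 0709 and 0B83 fall through and return None, which matches the different wording ("name is a misnomer") suggested for them above.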
Date/Time: Mon May 4 07:54:57 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: PRI #297: A new character property for symbols-and-pictographs support
To enhance the support of symbols and pictographs, and as an extension of my post of Tue Apr 28, 2015, a supplemental character property might be added. It would express what I call the directionality dynamics, and could therefore be named Directionality Dynamics Property (shortened to “DirDyn-Prop” or the like). This would allow implementers and developers to optimize programmatically the environment and the bidi-mirroring of symbols. For example, the engines shown sideways at U+1F680.. may be bidi-mirrored or not, even in left-to-right script, depending on whether they should face the reader in reading direction (as an expression of obligingness) or follow the reading direction (as an expression of dynamics or, as the ancient Greeks used to see it, of victory). As shown in the Code Charts, this challenge has already been dealt with where the font designer chose to represent U+1F32C WIND BLOWING FACE as “looking” and blowing from left to right, making the left-to-right reader feel at ease by not facing him in reading direction, while U+1F3A0 CAROUSEL HORSE is “looking” and turning from right to left, thus coming towards left-to-right readers, a gesture that is fully consistent with its meaning as an invitation to visit the ‘amusement park’ (its alias-verified symbolism). This relationship underscores the importance of fixing the directionality of symbols and pictographs, in order to facilitate the work of font designers and implementers by means of predictable and customizable settings. As a result, optimizing the pictographs’ expression would become easy at layout and publishing time. Moreover, it will be clear once again that these symbols and pictographs *must* become bidi-mirrorable. Best regards, Marcel Schneider
Date/Time: Tue May 5 05:10:30 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: A definition for Formal Alias?
I have a problem with defining the concept of a Formal Alias. But as I am hurrying to send this belated feedback too, I fear I may be even less clear. IMHO the definition of what a Formal Alias is, is inconsistent within the UCD. In the NamesList, it is another name given to a misnamed character, marked up with a percent sign and output as reference mark U+203B in the Code Charts. This is the way it is referred to in the NamesList syntax page and in the Stability Policy. In NameAliases.txt however, "The formal name aliases are divided into five types", and the ones defined above are just a subset, labelled "correction" or, for U+FEFF BYTE ORDER MARK, "alternate". So my idea is to unify the definitions, probably following the pattern already well known thanks to the Code Charts, where the control character aliases are not considered formal aliases, and the Byte Order Mark is considered as having a formal alias name because its historical name Zero width no-break space should fall out of use (however, since U+2060 is not present even in recent systems, U+FEFF must nevertheless stay in use as a ZWNBSP). Best regards, Marcel Schneider
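The difference between the two definitions can be made concrete by grouping NameAliases.txt entries by type. The sketch below assumes the semicolon-separated "code;alias;type" layout of that file. The sample lines are written in its style, and the choice of "correction" and "alternate" as the Code Charts sense of a formal alias is the interpretation given in this post, not a definition taken from the UCD.

```python
from collections import defaultdict

# Interpretation from the post above, not a UCD definition.
CHART_FORMAL_TYPES = {"correction", "alternate"}

SAMPLE = """\
# code;alias;type
000A;LINE FEED;control
000A;LF;abbreviation
0080;PADDING CHARACTER;figment
FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
"""

def aliases_by_type(text):
    """Group (code, alias) pairs by alias type, skipping comments and blanks."""
    grouped = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        code, alias, alias_type = line.split(";")
        grouped[alias_type].append((code, alias))
    return dict(grouped)

def chart_formal_aliases(grouped):
    """Keep only the types that count as formal aliases in the Code Charts sense."""
    result = []
    for alias_type in sorted(CHART_FORMAL_TYPES):
        result.extend(grouped.get(alias_type, []))
    return result

grouped = aliases_by_type(SAMPLE)
print(sorted(grouped))                # all five types occur in the sample
print(chart_formal_aliases(grouped))  # only the narrower subset remains
```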
Date/Time: Wed May 6 08:03:04 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: PRI #297: feedback on XML files
Dear Unicode Editorial Committee, I’m sorry to have avoided opening the XML files (because I prefer text files and the Code Charts in PDF), and I wish to thank Unicode and the persons involved for having made me aware of them. This allows me to update some previous feedback Mr Freytag has kindly reviewed, by offering a suggestion that would surely (I believe) enhance access to the data even more. The informative aliases, which I mentioned on Sat May 2 07:10:09 CDT 2015, are unfortunately not included in the XML files of the UCD, but fortunately the <name-alias/> tag allows adding as many aliases as wished, for which I suggest creating an “information” type label. This powerful markup language even allows adding the annotations (COMMENT_LINEs) as they are provided to the readers of the Code Charts, once a <comment/> tag is created. Moreover, each comment may be given a type, like “languages”, “casing”, “compatibility” and so on, allowing them to be browsed and displayed by interest and with colors. The block headers, which currently do not seem to be part of the XML files either, may be made available with appropriate tags as well. Further, there might be a way to underscore the importance of the formal aliases as denominations representing the greatest common denominator of the various community preferences by highlighting the most suitable name, as Mr Freytag planned to do for the Charts. This swing towards immediate readability even of formerly misnamed entries should be mirrored in XML by moving the formal alias from a value of the ‘alias’ attribute of the <name-alias> tag to a value of a future ‘fa’ attribute of the <char> tag, where we already find the NAME (‘na’) attribute and the 1.0 NAME (‘na1’).
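To make the suggested markup easier to picture, here is a minimal sketch that builds such an entry with Python's standard xml.etree.ElementTree module. Only the <char> element with its ‘cp’ and ‘na’ attributes and the <name-alias> element with ‘alias’ and ‘type’ are taken from the existing XML format. The ‘fa’ attribute, the “information” type and the <comment> element are the proposals of this post and do not exist in the UCD XML files, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_char(cp, name, formal_alias=None, informative=(), comments=()):
    """Build a <char> element carrying the proposed additions."""
    attrs = {"cp": cp, "na": name}
    if formal_alias:
        attrs["fa"] = formal_alias  # proposed attribute, not in the UCD XML
    char = ET.Element("char", attrs)
    for alias in informative:
        # Proposed "information" type for informative aliases.
        ET.SubElement(char, "name-alias", {"alias": alias, "type": "information"})
    for comment_type, text in comments:
        # Proposed <comment> element with a type label.
        comment = ET.SubElement(char, "comment", {"type": comment_type})
        comment.text = text
    return char

sharp_s = build_char(
    "00DF", "LATIN SMALL LETTER SHARP S",
    informative=["Eszett"],
    comments=[("languages", "German")],
)
tibetan = build_char(
    "0FD0", "TIBETAN MARK BSKA- SHOG GI MGO RGYAN",
    formal_alias="TIBETAN MARK BKA- SHOG GI MGO RGYAN",
)
print(ET.tostring(sharp_s, encoding="unicode"))
print(ET.tostring(tibetan, encoding="unicode"))
```

Whether a comment is better carried as element text or as an attribute is left open here. The sketch uses element text only to keep longer annotations readable.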
Adding many more formal aliases, at least up to 337 (please see below for another suggestion), may then be managed thanks to a new ‘correction-level’ attribute, which can take for example the value ‘mis’ when the aim is to correct a misspelling or another inadvertence (as with today’s formal aliases), ‘std’ when eliminating Standardese (CARON, LAMDA, VULGAR etc.), ‘bim’ for more respect towards bidi-mirroring (the ‘open interval’ notation using *reversed* square brackets does not invalidate the principle), and so on. I still believe the disagreement about *English* names may be resolved by opting for the most widely used locale, that is American English, because the nouns “slash” and “period” are used in all English-speaking countries (but *not*, I agree, the spellings “labor” and “honor”, while “color” is widely used even in England because of its importance in style sheets and so on). Another problem concerns the file format. Personally I would like a database in text format to paste into a spreadsheet, as the NamesList and UnicodeData can be (please refer to my e-mail to the UTC, which would better have been addressed to the Editorial Committee). I took notice of what has been said on the Mail List, notably about UIs, and wish to thank all the people who were so kind as to discuss my e-mails. However, as much information is still missing in UIs, my proposal of what I have already christened “UnicodeXData.txt” probably remains interesting because, even when corrected, the XML files will, I guess, not open in a spreadsheet like a plain text file does. So if it is permitted to post this wish, I would like to find *all* information that is code-point related in a UnicodeData-shaped file for ready access. More precisely, my suggestion is to add a field that will contain a complete names list representing the smallest common denominator of all user communities, to cater reasonably for the worldwide demand for a complete repertoire of *one* helpful *English* name per character. 
This field will therefore contain *all* useful formal aliases, whether they appear in the Code Charts, or not (that is, for the sake of graphics, layout and design issues). Further, it will contain the most commonly used alias for each control character. To finish, it will be completed with the identifiers as defined today, which are the default value of the field. That will allow getting readily an accurate name for *each* character. For more usefulness, the next field should contain a type label like the ones defined in NameAliases.txt, or more consistently (please refer to my post of Tue May 5 05:10:30 CDT 2015): N = NAME (default value): the normal character name, used also as a technical identifier; CT = CONTROL: the first listed designation (see NameAliases.txt) of a control character; FA = FORMAL_ALIAS: the designation of a (non-control) character that is not identical with the identifier; AFA = ADDITIONAL_FORMAL_ALIAS: eventually a kind of “additional” formal alias name, which do not appear in the Code Charts but are present in the above mentioned field. The next field will contain the abbreviation of the control character alias name, otherwise it will be empty. And as there are up to three names per control character (U+000A), the next four fields may contain these supplemental names and their abbreviations. (The three unnamed controls U+0080, U+0081, U+0099 may have their “figment” in the same field as the third name of U+000A.) The complete range would look as follows, admitting that this will be the first thing to be added to UnicodeData.txt: Field# Content 15 Designation: Name|Alias|FormalAlias 16 Type: N|CT|FA|AFA 17 Abbreviation 18 Second Alias 19 Second Abbreviation 20 Third Alias 21 Third Abbreviation Other fields may contain the informative aliases provided in the NamesList/CodeCharts and in the XML files, as now suggested above. 
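Editorial illustration: the field layout proposed above (fields 15 to 21 appended to a UnicodeData-shaped record) can be sketched in Python as follows. The file does not exist; the record below is built by hand for U+000A from its UnicodeData fields and its NameAliases entries, and the class and function names are assumptions made for this sketch only.

```python
from dataclasses import dataclass

# Type labels proposed in the message: NAME, CONTROL, FORMAL_ALIAS,
# ADDITIONAL_FORMAL_ALIAS.
TYPES = {"N", "CT", "FA", "AFA"}


@dataclass
class ExtendedName:
    code_point: int
    designation: str          # field 15
    kind: str                 # field 16: N | CT | FA | AFA
    abbreviation: str         # field 17
    second_alias: str         # field 18
    second_abbreviation: str  # field 19
    third_alias: str          # field 20
    third_abbreviation: str   # field 21


def parse_extended_line(line):
    """Parse one hypothetical 22-field record (fields 0-14 as in UnicodeData)."""
    fields = line.split(";")
    if len(fields) != 22:
        raise ValueError(f"expected 22 fields, got {len(fields)}")
    # Default stated in the message: the identifier itself, with type N.
    designation = fields[15] or fields[1]
    kind = fields[16] or "N"
    if kind not in TYPES:
        raise ValueError(f"unknown type label: {kind}")
    return ExtendedName(int(fields[0], 16), designation, kind, *fields[17:22])


# Hand-built example record for U+000A (hypothetical extension fields).
base = ["000A", "<control>", "Cc", "0", "B", "", "", "", "", "N",
        "LINE FEED (LF)", "", "", "", ""]
extension = ["LINE FEED", "CT", "LF", "NEW LINE", "NL", "END OF LINE", "EOL"]
record = parse_extended_line(";".join(base + extension))
print(record)
```

Because the record stays semicolon-delimited, it would still paste into a spreadsheet in the same way as UnicodeData.txt, which is the property the message asks for.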
To complete my formal alias suggestions list sent on Thu Apr 30 06:44:57 CDT 2015, I’m pleased to follow Mr Wordingham’s advice of defining Formal Aliases for the Devanagari Dandas too, and I opted for the addition of “Punctuation”, which is already present with “DANDA” in two languages out of twelve, to enhance the relative universality of these two punctuations and as a mark of respect to compensate the trouble made till now to “Bengali/Tamil etc.” users: 0964 DEVANAGARI DANDA % INDIAN PUNCTUATION DANDA 0965 DEVANAGARI DOUBLE DANDA % INDIAN PUNCTUATION DOUBLE DANDA Best regards, Marcel Schneider
Date/Time: Thu May 7 07:46:59 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+0964/5, U+00DF/1E9E; UCDXML; TUS
Dear Unicode Editorial Committee, Sorry, my suggestion about formal aliases for Danda / double Danda should of course read: 0964 DEVANAGARI DANDA % INDIC PUNCTUATION DANDA 0965 DEVANAGARI DOUBLE DANDA % INDIC PUNCTUATION DOUBLE DANDA There is a further point that I unfortunately did not become aware of sooner. It’s about the uppercasing of the German ß. Looking at the properties of U+00DF in ucdxml.nounihan.flat.xml, I found that uc="0053 0053" only. In the meantime, German usage is beginning to shift towards 1E9E, as I already reported when I suggested updating the NamesList and Code Charts annotation for this character. IMO there should be an application Settings checkbox: “☑ ẞ as uppercase for ß”. I don’t know if it’s already implemented. However, since U+1E9E is now a part of most current fonts and is on keyboards thanks to the new German standard layouts, defining uppercase as uc="1E9E" might seem appropriate to avoid losing the ß in text files. If the custom setting requires uppercasing U+00DF to double U+0053, the cf="0073 0073" value can be used to perform that. To understand the issue, it is necessary to remember that the uppercase Latin letter SZ has been created and encoded on behalf of the German Standards body DIN to ensure that personal data are correctly stored and rendered. As in German the ß is a distinctive part of orthography and is needed in names (if a person’s name is Straßer or STRAẞER, writing STRASSER or STRASZER is wrong because these are other names, equally borne), not having an uppercase ß caused much trouble and led to some confusion. Today, fortunately, this time is past, and the character properties may be updated. All that is needed is already in the UCD except the new uppercase as a value of the uc property for U+00DF. Therefore I suggest that Unicode take advice from the German Standards body (DIN) on whether to set this property to its new value. Now, as I unfortunately have yet more feedback to send, I would also suggest completing the XML files with the xrefs. 
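Editorial illustration: the behaviour described above, and the checkbox the message asks for, can be sketched in Python, whose str.upper() applies the default full uppercase mapping of ß to SS. The function upper_german and its capital_sharp_s parameter are hypothetical names introduced for this sketch; they are not an existing API.

```python
SHARP_S = "\u00DF"          # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1E9E"  # ẞ LATIN CAPITAL LETTER SHARP S


def upper_german(text, capital_sharp_s=True):
    """Uppercase text; optionally map ß to ẞ instead of the default SS."""
    if capital_sharp_s:
        # Tailoring assumed here: substitute U+1E9E before the default mapping,
        # so the sharp s is not lost in the uppercased text.
        text = text.replace(SHARP_S, CAPITAL_SHARP_S)
    return text.upper()


name = "Stra\u00DFer"
print(name.upper())                                # default mapping
print(upper_german(name))                          # checkbox ticked
print(upper_german(name, capital_sharp_s=False))   # checkbox cleared
```

With the option enabled, the sharp s survives a round trip through uppercase and back to lowercase; with the default mapping, Straßer and Strasser become indistinguishable once uppercased.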
While, contrary to what I suggested yesterday, the casing *annotations* may be omitted in XML (but *not* the one for U+00DF, nor the other ones such as those about languages, ancient Standards, font design issues, typographic preferences and much more), the XML format offers the opportunity of enhancing the cross-reference support. A new tag <xref/> might be given two properties: char="" whose value is the code point, and rel="" which is new and makes it possible to state explicitly the relationship between the character and the xref. This information is a real need, and its lack is actually very annoying. I can already suggest a few values: "case" for casing-related xrefs; "resemble" for characters resembling each other in appearance; "origin" when the cited character was really used to create the commented one; "ornamental" if the aim is to suggest some nice alternate chars; "usage" for characters with similar or opposite usage; "family" to refer to other related characters (for example, for consistency of font design). There will surely be many more. I believe that more, easy-to-search and readily available information can notably enhance user experience, whether directly or by means of better and richer implementations and user interfaces. In that sense, I also suggest creating, if feasible, an online version of the Standard (that is, of the chapters of TUS which are currently available in PDF). Among the advantages on the user side, one might quote the following: — better browsing and searchability of the whole text — better referencing when “Purple Numbers” are implemented — easy to show-and-hide fine-levelled numbering — interactive table of contents with quick access to items — easy-to-quote text by simpler copy’n’paste — formatting settings to enhance accessibility But that does not mean I would prefer an online version to the PDF. 
To improve quotability, I would suggest typesetting the character names (which currently are in small caps) in uppercase throughout, and rather applying a reduced font size as specified in the style sheet of UAX #9 (where, however, redundant formatting lowercases and small-caps the uppercase source text at the same time: “span.name { text-transform: lowercase; font-variant: small-caps; font-size: 75%; }”). The result was not convincing as it appeared in UAX #9, section 3.2. I still believe that there is a way to get a Standard for everyone’s use, the only condition being the ability to read English. This allows everybody to access, as I faultily spelled, the full bandwidth of the Unicode Standard in real time. “Real time” means in my sense that no translation from English is needed, because the original version is fully understandable even for people who have just learned some English. Best regards, Marcel Schneider
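Editorial illustration: the cross-reference tag suggested in this message, with its char and rel attributes, could be consumed as in the Python sketch below. The tag is written here as <xref/>, and the sample fragment is hypothetical: it is not taken from the UCD XML files and it omits the XML namespace those files declare. The function name xrefs_by_relation is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Relationship values listed in the message.
REL_VALUES = {"case", "resemble", "origin", "ornamental", "usage", "family"}

# Hypothetical fragment: a <char> element carrying the proposed child tags.
SAMPLE = """\
<char cp="00DF" na="LATIN SMALL LETTER SHARP S">
  <xref char="1E9E" rel="case"/>
  <xref char="03B2" rel="resemble"/>
</char>
"""


def xrefs_by_relation(char_xml):
    """Group the cross-referenced code points of one <char> by their rel value."""
    elem = ET.fromstring(char_xml)
    grouped = {}
    for xref in elem.findall("xref"):
        rel = xref.get("rel")
        if rel not in REL_VALUES:
            raise ValueError(f"unknown rel value: {rel!r}")
        grouped.setdefault(rel, []).append(int(xref.get("char"), 16))
    return grouped


for rel, code_points in xrefs_by_relation(SAMPLE).items():
    print(rel, [f"U+{cp:04X}" for cp in code_points])
```

Typing each cross-reference in this way would let a user interface filter, for example, only the casing-related references of a character.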
Date/Time: Mon May 11 07:33:52 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Various corrigenda to my feedback
Dear Unicode Editorial Committee, as I learned you will discuss some details after the UTC meeting, I hope this feedback, which I shall send you as my last one (after several others meant to be the last), will reach you in time, because I found a few points to correct. *** GRANTHA OM The GRANTHA OM, U+11350, is named consistently with the DEVANAGARI OM U+0950, the GUJARATI OM U+0AD0, the TAMIL OM U+0BD0 and several other OM signs. Thus, my related feedback no longer applies. *** Character names in accordance with Bidi-mirroring support The OPENING and CLOSING epithets for brackets are verified as accurate in all discussed contexts. A problem has been raised about the mathematical notation of an open interval: ]a, b[. I was wrong on Wed May 6 when I supposed these were “reversed” brackets. IMAO the problem is resolved when considering the first bracket as closing the preceding interval, which extends from −∞ to a, including a, and the second bracket as opening the following interval, which extends from b (included) to +∞. This interpretation is highlighted when considering the notation of the two possible half-open intervals [a, b[ and ]a, b]: Every time, the bracket is CLOSING when closing an interval in numbering/reading direction, and OPENING when closing it in the other direction — a fact that BTW should lead us to call [a, b[ “half-open upwards” instead of “half-open to the right”, when applying once again the *universalization strategy* that should be an _unremovable_ part of the framework when standardizing a Universal Character Set. As far as it applies to character names, this universalization strategy, after having been introduced by Unicode, has been removed by ISO. *** Full stop If PERIOD is not a suitable name for U+002E FULL STOP, one might consider giving it the frequently used DOT or POINT alias. 
However, in TUS, it is mostly referred to as a “period”, and this could IMHO be reason enough to prefer it, since unifying the way an object is called, makes the discourse better understandable. And that should be among the main goals of the Unicode Standard, rather than increasing the need of time-and-money-intensive translations, which probably brings a risk for many languages (except French) of remaining undone. *** NamesList syntax and Code Charts layout I’m sorry to have suggested (on Mon Apr 27 01:13:55 CDT 2015) to put the NamesList syntax upwards down by raising the “%” markup to the Names line when the char entry includes a Formal Alias, and I prefer recalling the other suggestion I sent on Mon Apr 20 03:09:02 CDT 2015: “Even simplier, the roles of CharacterName and FormalAlias may be inverted at these instances, giving the Formal Alias a Code Name status (and the Character Name a True Designation status).” I’m glad Mr Freytag agrees on the principle. Here like often I’ve not been clear: In the Unicode Charts, the name following the code point should always be a helpful one, an alias if necessary, and in that case, the old identifier can be given as is today the Formal Alias, with an annotation like “this alias is the (misspelled / mistaken / semantically obsolete / standardese, but) stable identifier”. That would replace part of the actual “character name is a misnomer”, “misspelling of [...] is a known defect” and “despite its name [...]” annotations. For whole series of misnamed or ugly named characters as there are: 3021 HANGZHOU NUMERAL ONE sqq, 00BC VULGAR FRACTION ONE QUARTER sqq, 2150 VULGAR FRACTION ONE SEVENTH sqq, 010C LATIN CAPITAL LETTER C WITH CARON sqq, 1D6B2 MATHEMATICAL BOLD CAPITAL LAMDA sqq, it could be sufficient to indicate the first former name (the identifier) of the series and explain in an annotation that the others must be extrapolated conformingly, in order to avoid overloading the listings in the Code Charts. 
*** Font-size of figures in the Code Charts (new subject) By contrast with the figures of the code points in the Code point column, the figures of the code points in the next column are inconsistent in font size. Decimal digits are slightly smaller than hex letters. In most zoom factors this results in a difference of one or two pixels, but this may be notable even at 100 %. This is likely to need a comment about the dislike of figures in today’s documents. Many things, including human language, are converted to figures (character encoding is one example). Meanwhile, a kind of what might be called a figure allergy seems to have set in, which leads to hide figures and digits wherever feasible (and is a part of the following concern). *** Code-points in the Bookmarks side-pane At the opposite of what I suggested on Wed Apr 1 09:43:56 CDT 2015 about adding code points to the PDF bookmarks for display in the side pane, I understand today that would overload this reduced space and make the bookmarks less attractive. When language names and figures are put together, the result could lead to associations with a ranking, and raise idle questions like “Why has this language been encoded before that other one”. There is however a need of browsing the Code Charts by code points, because code point searching tools are not always suitable. I suggest therefore adding a bookmark called “Code Point Ranges” at the end, which would not automatically expand. Once this bookmark expanded, there would be the list of blocks in another format with just range start and range end, matching exactly the blockhead list above as about the targets, but displayed apart, avoiding thus the problems related to figures. To complete, the blockheads should then probably be grouped together in a general bookmark like “Blocks”, which shall expand by default to produce the actual display. 
The advantage would be that rather than looking up the block by range in Blocks.txt and then searching the side pane for the blockhead, the Code Charts reader could search for the range in the side pane and get the Chart displayed by clicking the range’s bookmark. This enhancement would need that the Portable Document format allows multiple bookmark sets. In other words, a given target could have several bookmarks in different areas of the bookmarks list. Another requirement is that the bookmarks can be flagged to expand or not to expand by default. The expand-by-default behavior would be nice in TUS too, to prevent the side-pane from displaying only the chapter head bookmark even when the last settings were saved and the setting was to expand the current bookmark and/or to display the main bookmarks. This would be even more useful in an all-in-one PDF of the Standard (which I didn’t find yet) because the chapters’ list would display by default from opening on as a kind of “obligingness” towards the reader. IMHO it is very helpful to look up characters in the Code Charts and to browse the Charts in PDF, especially the very ergonomical all-in-one. To compare glyphs in different blocks, several copies may be opened in as many instances of Adobe Reader. Best regards, Marcel Schneider
Date/Time: Mon May 11 10:15:06 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestions for 8.0.0: U+0964/5, U+00DF/1E9E, XML, TUS
Dear Unicode Editorial Committee, Unfortunately the beta period is over, but not my feedback. I'm therefore trying to send this and another post with other subjects, in case notice can still be taken. Sorry, my suggestion about formal aliases for Danda / double Danda should of course read: 0964 DEVANAGARI DANDA % INDIC PUNCTUATION DANDA 0965 DEVANAGARI DOUBLE DANDA % INDIC PUNCTUATION DOUBLE DANDA There is a further point that I unfortunately did not become aware of sooner. It’s about the uppercasing of the German ß. Looking at the properties of U+00DF in ucdxml.nounihan.flat.xml, I found that uc="0053 0053" only. In the meantime, German usage is beginning to shift towards 1E9E, as I already reported when I suggested updating the NamesList and Code Charts annotation for this character. IMO there should be an application Settings checkbox: “☑ ẞ as uppercase of ß”. I don’t know if it’s already implemented. However, since U+1E9E is now a part of most current fonts and is on keyboards thanks to the new German standard layouts, defining uppercase as uc="1E9E" might seem appropriate to avoid losing the ß in text files. If the custom setting requires uppercasing U+00DF to double U+0053, the cf="0073 0073" value can be used to perform that. To understand the issue, it is necessary to remember that the uppercase Latin letter SZ has been created and encoded on behalf of the German Standards body DIN to ensure that personal data are correctly rendered. As in German the ß is a distinctive part of orthography and is needed in names (if a person’s name is Straßer or STRAẞER, writing STRASSER or STRASZER is wrong because these are other names, equally borne), not having an uppercase ß caused much trouble and led to some confusion. Today, fortunately, this time is past, and the character properties may be updated. All that is needed is already in the UCD except the new uppercase as a value of the uc property for U+00DF. 
Therefore I suggest that Unicode take advice from the German Standards body (DIN) on whether to set this property to its new value. Now, as I unfortunately have yet more feedback to send, I would also suggest completing the XML files with the xrefs. While, contrary to what I suggested yesterday, the casing *annotations* may be omitted in XML (but *not* the one for U+00DF, nor the other ones such as those about languages, old Standards, font design issues, typographic preferences and much more), the XML format offers the opportunity of enhancing the cross-reference support. A new tag <xref/> might be given two properties: char="" whose value is the code point, and rel="" which is new and makes it possible to state explicitly the relationship between the character and the xref. This information is a real need, and its lack is actually very annoying. I can already suggest a few values: "case" for casing-related xrefs; "resemble" for characters resembling each other in appearance; "origin" when the cited character was used to create the commented one; "ornamental" if the aim is to suggest some nice alternate chars; "usage" for characters with similar or opposite usage; "family" to refer to other related characters (for example, for consistency of font design). I believe that more, easy-to-search and readily available information can notably enhance user experience, whether directly or by means of better and richer implementations and user interfaces. In that sense, I also suggest creating, if feasible, an online version of the Standard (that is, of the chapters of TUS which are currently available in PDF). 
Among the advantages on the user side, one might quote the following: — better browsing and searchability of the whole text — better referencing when “Purple Numbers” are implemented — easy to show-and-hide fine-levelled numbering — interactive table of contents with quick access to items — easy-to-quote and paste text — formatting settings to enhance accessibility But that does not mean I would prefer an online version to the PDF. To improve quotability, I would suggest typesetting the character names (which currently are in small caps) in uppercase throughout, and rather applying a reduced font size as specified in the style sheet of UAX #9 (where, however, redundant formatting lowercases and small-caps the uppercase source text at the same time: “span.name { text-transform: lowercase; font-variant: small-caps; font-size: 75%; }”). The result was not convincing as it appeared in UAX #9, section 3.2. I still believe that there is a way to get a Standard for everyone’s use, the only condition being the ability to read English. This allows everybody to access (as I misspelled on the Mail List) the full bandwidth of the Unicode information in real time. Real time means here, in my sense, that the original documentation is directly usable for almost everybody. Best regards, Marcel Schneider
Date/Time: Mon May 11 10:17:07 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestions for 8.0.0: PDF, and some corrigenda
Dear Unicode Editorial Committee, as I learned you will discuss some details after the UTC meeting, I hope this feedback, which I shall send you as my last one (after several others meant to be the last), will reach you in time, because I found a few points to correct. *** GRANTHA OM The GRANTHA OM, U+11350, is named consistently with the DEVANAGARI OM U+0950, the GUJARATI OM U+0AD0, the TAMIL OM U+0BD0 and several other OM signs. Thus, my related feedback no longer applies. *** Character names in accordance with Bidi-mirroring support The OPENING and CLOSING epithets for brackets are verified as accurate in all discussed contexts. A problem has been raised about the mathematical notation of an open interval: ]a, b[. I was wrong on Wed May 6 when I supposed these were “reversed” brackets. IMAO the problem is resolved when considering the first bracket as closing the preceding interval, which extends from −∞ to a, including a, and the second bracket as opening the following interval, which extends from b (included) to +∞. This interpretation is highlighted when considering the notation of the two possible half-open intervals [a, b[ and ]a, b]: Every time, the bracket is CLOSING when closing an interval in numbering/reading direction, and OPENING when closing it in the other direction — a fact that BTW should lead us to call [a, b[ “half-open upwards” instead of “half-open to the right”, when applying once again the *universalization strategy* that should be an _unremovable_ part of the framework when standardizing a Universal Character Set. As far as it applies to character names, this universalization strategy, after having been introduced by Unicode, has been removed by ISO. *** Full stop If PERIOD is not a suitable name for U+002E FULL STOP, one might consider giving it the frequently used DOT or POINT alias. 
However, in TUS, it is mostly referred to as a “period”, and this could IMHO be reason enough to prefer it, since unifying the way an object is called, makes the discourse better understandable. And that should be among the main goals of the Unicode Standard, rather than increasing the need of time-and-money-intensive translations, which probably brings a risk for many languages (except French) of remaining undone. *** NamesList syntax and Code Charts layout I’m sorry to have suggested (on Mon Apr 27 01:13:55 CDT 2015) to put the NamesList syntax upwards down by raising the “%” markup to the Names line when the char entry includes a Formal Alias, and I prefer recalling the other suggestion I sent on Mon Apr 20 03:09:02 CDT 2015: “Even simplier, the roles of CharacterName and FormalAlias may be inverted at these instances, giving the Formal Alias a Code Name status (and the Character Name a True Designation status).” I’m glad Mr Freytag agrees on the principle. Here like often I’ve not been clear: In the Unicode Charts, the name following the code point should always be a helpful one, an alias if necessary, and in that case, the old identifier can be given as is today the Formal Alias, with an annotation like “this alias is the (misspelled / mistaken / semantically obsolete / standardese, but) stable identifier”. That would replace part of the actual “character name is a misnomer”, “misspelling of [...] is a known defect” and “despite its name [...]” annotations. For whole series of misnamed or ugly named characters as there are: 3021 HANGZHOU NUMERAL ONE sqq, 00BC VULGAR FRACTION ONE QUARTER sqq, 2150 VULGAR FRACTION ONE SEVENTH sqq, 010C LATIN CAPITAL LETTER C WITH CARON sqq, 1D6B2 MATHEMATICAL BOLD CAPITAL LAMDA sqq, it could be sufficient to indicate the first former name (the identifier) of the series and explain in an annotation that the others must be extrapolated conformingly, in order to avoid overloading the listings in the Code Charts. 
*** Font-size of figures in the Code Charts (new subject) By contrast with the figures of the code points in the Code point column, the figures of the code points in the next column are inconsistent in font size. Decimal digits are slightly smaller than hex letters. In most zoom factors this results in a difference of one or two pixels, but this may be notable even at 100 %. This is likely to need a comment about the dislike of figures in today’s documents. Many things, including human language, are converted to figures (character encoding is one example). Meanwhile, a kind of what might be called a figure allergy seems to have set in, which leads to hide figures and digits wherever feasible (and is a part of the following concern). *** Code-points in the Bookmarks side-pane At the opposite of what I suggested on Wed Apr 1 09:43:56 CDT 2015 about adding code points to the PDF bookmarks for display in the side pane, I understand today that would overload this reduced space and make the bookmarks less attractive. When language names and figures are put together, the result could lead to associations with a ranking, and raise idle questions like “Why has this language been encoded before that other one”. There is however a need of browsing the Code Charts by code points, because code point searching tools are not always suitable. I suggest therefore adding a bookmark called “Code Point Ranges” at the end, which would not automatically expand. Once this bookmark expanded, there would be the list of blocks in another format with just range start and range end, matching exactly the blockhead list above as about the targets, but displayed apart, avoiding thus the problems related to figures. To complete, the blockheads should then probably be grouped together in a general bookmark like “Blocks”, which shall expand by default to produce the actual display. 
The advantage would be that rather than looking up the block by range in Blocks.txt and then searching the side pane for the blockhead, the Code Charts reader could search for the range in the side pane and get the Chart displayed by clicking the range’s bookmark. This enhancement would need that the Portable Document format allows multiple bookmark sets. In other words, a given target could have several bookmarks in different areas of the bookmarks list. Another requirement is that the bookmarks can be flagged to expand or not to expand by default. The expand-by-default behavior would be nice in TUS too, to prevent the side-pane from displaying only the chapter head bookmark even when the last settings were saved and the setting was to expand the current bookmark and/or to display the main bookmarks. This would be even more useful in an all-in-one PDF of the Standard (which I didn’t find yet) because the chapters’ list would display by default from opening on as a kind of “obligingness” towards the reader. IMHO it is very helpful to look up characters in the Code Charts and to browse the Charts in PDF, especially the very ergonomical all-in-one. To compare glyphs in different blocks, several copies may be opened in as many instances of Adobe Reader. Best regards, Marcel Schneider
Date/Time: Mon May 4 12:36:00 CST 2015
Name: Asmus Freytag
Report Type: Error Report
Opt Subject: Review of Marcel Schneider's feedback on Unicode 8.0 beta
Feedback on "idle" @+ notices. The nameslist is not intended for machine processing, other than by the code chart layout tool. In that context the use of @+ is motivated and not idle. Proposed resolution of feedback: not accepted. --- Feedback on bookmarks There are other tools for searching characters by code point value. Proposed resolution of feedback: not accepted --- Feedback on numbering levels in the standard While fewer levels make for a more attractive book design, the difficulties in citing material in the standard are real. Proposed resolution of feedback: forward to ed committee --- Feedback on showing mirrored glyphs There are cases where the mirrored forms are not perfect mirror images (cube root 221B, for example, where the "3" does not mirror). In such cases, the mirrored shape might perhaps be documented as "alternate glyph". In cases of unusual mirroring behavior an annotation like that for FD3E and FD3F should be sufficient. One might consider adding a note on mirroring behavior at the block level for arrows and mathematical symbols, pointing out that arrows are not mirrored. Proposed resolution of feedback: forward to ed committee --- Feedback on extended datafile The purpose of that is served by the XML version of the UCD. Proposed resolution of feedback: not accepted --- Feedback on character naming policy and stability (various) There is little to be gained by abandoning the existing interpretation of character names as identifiers or to abandon the corresponding stability policy. The negatives on the other hand are huge. Proposed resolution of feedback: not acceptable --- Feedback on details of text in the standard and annotations to the nameslist (various) These are too detailed to review in the full UTC, but some appear useful. Proposed resolution of feedback: forward to ed committee --- Feedback on CODENAME (mentioned in two sections) In general, adding more syntax to the nameslist would make it unnecessarily complex. 
However, it might be useful to invert the display of name aliases, esp. where the original name is a misnomer or typo. In other words, the ed committee might look into whether it's feasible to show the most suitable "alias" in the formal "NAME" line of the code charts (and the original name as an annotation using the syntax for alias). For the nameslist, tracking which is which is perhaps less important than de-emphasizing mistakes. Rather than moving the % annotation (which would break some deeply embedded assumptions in the code for the layout processor), the idea would be to redefine what the contents of the nameslist are. The % annotation would then no longer indicate "the" formal alias for a character name; instead, both names and aliases would be treated as "aliases" throughout, with the first listed one becoming the "preferred" alias (instead of, as before, the UnicodeData "name"). This would be most useful in cases of actual "corrections". It's at least useful enough to have the ed committee take a look at it. For people needing to track which alias has what status (original vs. formal alias), the data files give that answer. Proposed resolution of feedback: forward to ed committee --- Feedback on using aliases to "improve" character names (various) This is ultimately a losing proposition. Many characters are used in multiple ways, or have diverse names in different user communities. Attempting to improve even those names where a consensus alternative could be found would only raise the expectations on character names and make the intractable cases stand out even more. Even OPENING and CLOSING cannot be assigned uniquely (cf. mathematical use like ]a,b[). Proposed resolution of feedback: not accepted