Accumulated Feedback on PRI #297

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Wed Feb 25 16:37:12 CST 2015
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Indic Syllabic Categories

I've reviewed the application of the revised categories as set forth in
L2/14-126 (http://www.unicode.org/L2/L2014/14126r-indic-properties.pdf)
as applied to the Thai, Lao and Tai Tham scripts, and noted a few other
characters, and come up with the following proposed changes of syllabic
category. I have also taken into account the proposals of Roozbeh
Pournader of 24 February 2015 related to work on the Universal Shaping
Engine.

I've come up with 3 new characters of category Bindu: 
0303 ; Bindu # Mn COMBINING TILDE
0310 ; Bindu # Mn COMBINING CANDRABINDU
1A74 ; Bindu # Mn TAI THAM SIGN MAI KANG (currently Vowel_Dependent)

Note that both U+0ECD LAO NIGGAHITA and U+1A74 function both as Bindu
and as Vowel_Dependent. U+0303 is used in Patani Malay in the Thai
script - see UTC document L2/10-451. U+0310 is used for Sanskrit in
Tamil script, according to Indic list email 'Re: Tamil Punctuation',
27/7/12 9:24 +0530 from Shriramana Sharma.
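
The proposed entries in this report follow the usual UCD data-file layout (code point, semicolon, property value, then a `#` comment). As a minimal sketch, assuming that layout and nothing else, the lines could be read as follows. The names `ProposedEntry` and `parse_entry` are hypothetical and not part of any Unicode tool.

```python
import re
from typing import NamedTuple, Optional


class ProposedEntry(NamedTuple):
    code_point: int
    category: str
    comment: str


# "code point ; value # comment", tolerating uneven spacing around the semicolon.
_LINE = re.compile(r"^\s*([0-9A-Fa-f]{4,6})\s*;\s*([A-Za-z_]+)\s*(?:#\s*(.*))?$")


def parse_entry(line: str) -> Optional[ProposedEntry]:
    """Parse one proposed data line; return None if it does not fit the layout."""
    m = _LINE.match(line)
    if m is None:
        return None
    return ProposedEntry(int(m.group(1), 16), m.group(2), (m.group(3) or "").strip())


sample = [
    "0303 ;Bindu # Mn COMBINING TILDE",
    "0310 ; Bindu # Mn COMBINING CANDRABINDU",
    "1A74 ; Bindu # Mn TAI THAM SIGN MAI KANG (currently Vowel_Dependent)",
]
entries = [parse_entry(s) for s in sample]
for e in entries:
    print(f"U+{e.code_point:04X} -> {e.category}")
```

Lines whose value carries a trailing question mark, as in the tentative repha entries later in this report, are deliberately rejected by this sketch.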

I've found 4 new characters of category Visarga: 
0E30 ; Visarga # Lo THAI CHARACTER SARA A 
0EB0 ; Visarga # Lo LAO VOWEL SIGN A 
1A61 ; Visarga # Mc TAI THAM VOWEL SIGN A 
19B0 ; Visarga # Mc (to be Lo) NEW TAI LUE VOWEL SIGN VOWEL SHORTENER

Note that the tone (or voice modulation) character U+1038 MYANMAR SIGN
VISARGA is currently classified as Visarga. U+0E30 is used as visarga
in Sanskrit, e.g. in the Royal Institute Dictionary. The typical sound
of the four visargas above is /ʔ/ rather than /h/, and, through a
feature of Tai (SW Tai?) phonology, they all have the additional
function of shortening a vowel. As a vowel shortener, U+1A61 and U+19B0
may follow a final consonant.

These 4 characters are currently classified as Vowel_Dependent. Except
for the Lao script, that usage can easily be interpreted as a
modification of the implicit vowel. Modern Lao does not acknowledge the
existence of an implicit vowel, so that interpretation may be harder to
accept. (Vowel_Dependent U+0EB1 LAO VOWEL SIGN MAI KAN is also a vowel
shortener; in the 19th century it was denied that Vowel_Dependent
U+0E31 THAI CHARACTER MAI HAN-AKAT was a vowel in Thai.)

U+1A61 occasionally has the sound /k/, especially when used in
conjunction with U+1A62 TAI THAM VOWEL SIGN MAI SAT. I think we should
regard this as just one of the uses of visarga.

I've found 3 new nuktas, at least, so long as the application of nukta
is not restricted to *foreign* consonants. 

0331 ; Nukta # Mn COMBINING MACRON BELOW 
0359 ; Nukta # Mn COMBINING ASTERISK BELOW 
1A7F ; Nukta # Mn TAI THAM COMBINING CRYPTOGRAMMIC DOT

U+0331 is used in Patani Malay in the Thai script - see L2/10-451 and
the consonant chart on p16 of
http://mlenetwork.org/sites/default/files/Patani%20Malay%20Presentation%20-%20Part%202.pdf.  
U+0331 and U+0359 have been used in English-Thai dictionaries to
represent English sounds, very much a nukta role. They were previously
classified as 'Other', though there is a proposal to make U+1A7F
'Syllable_Modifier'. U+0EC8 LAO TONE MAI EK functions as Nukta in Khmu
as well as performing its principal rôle of Tone_Mark in Lao. U+0E3A
THAI CHARACTER PHINTHU is used both as Nukta and as Pure_Killer; the
latter is its traditional rôle.

I've found 4 new pure killers, all
currently classified as 'Other', though there is a proposal to classify
U+0E4C (along with U+17CD) as 'Consonant_Killer'.  They are: 

0E4C ; Pure_Killer # Mn THAI CHARACTER THANTHAKHAT
0ECC ; Pure_Killer # Mn LAO CANCELLATION MARK 
1A7C ; Pure_Killer # Mn TAI THAM SIGN KHUEN-LUE KARAN
1A7A ; Pure_Killer # Mn TAI THAM SIGN RA HAAM 

U+0E4C THAI CHARACTER THANTHAKHAT and U+0E4E THAI CHARACTER YAMAKKAN
once divided the role of vowel killing - U+0E4E formed clusters and
U+0E4C removed final vowels. The use of U+0E4C came to be largely
restricted to vowels associated with clusters of consonants. Removing
the vowel made the final consonant of the cluster silent (spoken Thai
does not permit final consonant clusters), and from this effect it has
been reinterpreted as a consonant-killer. U+0ECC probably had the same
behaviour as U+0E4C. I don't know if it is still used in Laos - foreign
loanwords often don't follow the rules.

The Tai Tham marks are still at the transitional stage - they are
sometimes found on final unsubscripted consonants to indicate that they
have no vowel. There is an unfortunate overlap with the final consonant
mark for <r> (pronunciation necessarily /n/). The Khuen and Lue form of
the final consonant symbol has the same shape as the Thai and Lao form
of the pure killer. Consequently U+1A7A serves as Consonant_Final in
Tai Khuen and Tai Lue. In Tai Khuen, at least, the use as a final
consonant seems to have recently fallen into disfavour, so it seems
most appropriate to classify U+1A7A as 'Pure_Killer'. I noted above
that the 'Pure_Killer' U+0E3A THAI CHARACTER PHINTHU also serves as a
nukta. I have a vague recollection that U+0E4C THAI CHARACTER
THANTHAKHAT serves as a register mark in an orthography for the Chong
language, so that would count as an auxiliary rôle as Tone_Mark.

If 'Consonant_Killer' is to be separated from 'Pure_Killer', then we
need a separate category 'Dual_Mode_Killer' for U+1A7A and U+1A7C.

It should be noted that U+1A62 TAI THAM VOWEL SIGN MAI SAT serves not
only as Vowel_Dependent but also as Consonant_Final. This seems to be
chiefly relevant to anyone attempting to deduce the pronunciation from
the spelling.

There are 4 characters currently categorised as 'Consonant' which I
think are better categorised as 'Vowel': 

0E24 ; Vowel # Lo THAI CHARACTER RU 
0E26 ; Vowel # Lo THAI CHARACTER LU 
1A42 ; Vowel # Lo TAI THAM LETTER RUE 
1A44 ; Vowel # Lo TAI THAM LETTER LUE

They serve both as independent and dependent vowels. Note that U+0E24 and U+0E26 may be
followed by the length mark U+0E45 THAI CHARACTER LAKKHANGYAO, which is
categorised as 'Vowel_Dependent'. I am not aware of any usage of U+0E45
as a true vowel.

The sequence <U+1AAD TAI THAM SIGN CAANG, U+1A63 TAI THAM VOWEL SIGN
AA> occurs with the same meaning, 'elephant', as U+1AAD. I don't know
whether this justifies changing U+1AAD from 'Other' to
'Consonant_Placeholder'.

I've found two new Consonants:

0EBD ; Consonant # Lo LAO SEMIVOWEL SIGN NYO (was Consonant_Medial)
0EDE ; Consonant # Lo LAO LETTER KHMU GO (was Other)

U+0EBD is used as an initial consonant in Khmu, so U+0EBD has been used
in all rôles in the Lao script, like U+0EA7 LAO LETTER WO, which is of
category Consonant. For information on Khmu usage, see UTC document
L2/10-335 (http://www.unicode.org/L2/L2010/10335r-n3893r-lao-hosken.pdf). The
Khmu alphabet chart included backs up the text. (It also shows U+0EC8
LAO TONE MAI EK acting as a Nukta!)

If 'repha' can be used as a general category, including for example
Myanmar script kinzi, then there are two arguable new examples,
currently categorised as Consonant_Final:

1A58 ; Consonant_Preceding_Repha? # Mn TAI THAM SIGN MAI KANG LAI
1A5A ; Consonant_Succeeding_Repha? # Mn TAI THAM CONSONANT SIGN LOW PA

There are significant issues with U+1A58; while traditionally it
behaves as repha/kinzi, some modern styles are better served by
treating it as Consonant_Final. It takes some juggling for a single
OTL-style rendering engine to be able to render either style depending
on the lookups while oblivious to the difference, but it can be done.

I've found 5 new instances of Consonant_Subjoined: 
1A57 ; Consonant_Subjoined # Mc TAI THAM CONSONANT SIGN LA TANG LAI 
1A5B ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA
1A5C ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN MA
1A5D ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN BA
1A5E ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN SA

They were all previously categorised as Consonant_Final.

Note that U+1A57 is an abbreviation. It is derived by the addition of a
stroke to the subscript form <U+1A60 TAI THAM SIGN SAKOT, U+1A43 TAI
THAM LETTER LA>. Abbreviations of the word _tanglaai_ 'all' using U+1A57
normally include at least <U+1A57, U+1A63 TAI THAM VOWEL SIGN AA>, so
U+1A57 is not Consonant_Final.  An example, apparently spelt <U+1A26
TAI THAM LETTER NGA, U+1A57, U+1A76 TAI THAM SIGN TONE-2, U+1A63 TAI
THAM VOWEL SIGN AA>, is given in Table 16 at
http://www.seasite.niu.edu/tai/TaiLue/graphic%20blends.htm.

The word ᨶᩥᨻᩛᩤᨶ <U+1A36 TAI THAM LETTER NA, 1A65 TAI THAM VOWEL SIGN I, 1A3B
TAI THAM LETTER LOW PA, 1A5B, 1A64 TAI THAM VOWEL SIGN TALL AA, 1A36>
_nippa:na_ 'nirvana' immediately demonstrates that U+1A5B is not a
final consonant. U+1A5C occurs in Pali proper names ending -mmo <U+1A3E
TAI THAM LETTER MA, U+1A5C, U+1A6E TAI THAM VOWEL SIGN E, U+1A63 TAI
THAM VOWEL SIGN AA>, so is clearly not a final consonant.

U+1A5D occurs in Northern Thai principally in one word, whose
pronunciation is roughly /kɔbɔː/.  U+1A5D is not Consonant_Final in its
phonetic effect. The word is a compound word (or perhaps just a visual
compound), formed by chaining two syllables and striking out
the duplicated characters. I have a text in which the constituents are
to be encoded <U+1A20 TAI THAM LETTER HIGH KA, U+1A74 TAI THAM SIGN MAI
KANG> and <U+1A37 TAI THAM LETTER BA, U+1A74, U+1A75 TAI THAM SIGN
TONE-1>, so the chained word may reasonably be encoded <U+1A20,
U+1A74, U+1A5D, U+1A75> or <U+1A20, U+1A5D, U+1A74, U+1A75>.

While all my examples of U+1A5E are word final, it seems to differ from
<U+1A60, U+1A48 TAI THAM LETTER HIGH SA> on the basis of the room
available for it. Both forms are used as a word final consonant. The
only Pali consonant cluster ending in /s/ is /ss/, and that is written
using U+1A54 TAI THAM LETTER GREAT SA, so a non-final <s> will be rare.
(I'm finding /ks/ written with U+1A47 TAI THAM LETTER HIGH SSA due to
the application of RUKI.) However, I feel it would be rash to presume
that every example of U+1A5E will be a final consonant.

I have one new Consonant_Final:

0EDF ; Consonant_Final # Lo LAO LETTER KHMU NYO (was Consonant)

See UTC document L2/10-335 for evidence.

I have one possible new Consonant_Subjoined:

1A7B ; Consonant_Subjoined # Mn TAI THAM SIGN MAI SAM

The value of its Indic_Matra_Category, if relevant, should be recorded
as Top. U+1A7B is principally a repetition mark, indicating the
repetition of a word.  As extensions of this role, it can also do at
least the following:

(1) Indicate a repeated (not geminate) consonant
(2) Indicate an omitted implicit vowel (one omits an implicit vowel by
replacing it with U+1A60)
(3) Indicate an epenthetic vowel (extension of rôle 2).

In rôle (1), it serves as a subjoined consonant.  In rôles (2) and (3),
it serves as a dependent vowel.  For a shaper that does not constrain
appearance, such as the Universal Shaping Engine, the best categorisation
is probably 'Consonant_Subjoined'.

Although U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A56 TAI THAM
CONSONANT SIGN MEDIAL LA are named as medial consonants, too much
should not be read into such a description.  Both are, very
occasionally, immediately preceded by vowels, and both may be followed
by <U+1A60 TAI THAM SIGN SAKOT, U+1A40 TAI THAM LETTER HIGH YA> and
<U+1A60, U+1A45 TAI THAM LETTER WA>.  While the latter two sequences
most commonly represent vowels, the strictly consonantal cluster
<U+1A49 TAI THAM LETTER HIGH HA, U+1A56, U+1A60, U+1A45> starts a few
words beginning with the cluster /lw/.  This is a behaviour the
Universal Shaping Engine of Microsoft currently disallows for medial
consonants.

We should therefore have: 
1A55 ; Consonant_Subjoined # Mc TAI THAM CONSONANT SIGN MEDIAL RA
1A56 ; Consonant_Subjoined # Mn TAI THAM CONSONANT SIGN MEDIAL LA

I actually see no benefits for rendering engines in distinguishing Consonant_Medial and
Consonant_Subjoined, though the contrast may help in locating phonetic syllable boundaries.

Date/Time: Tue Mar 10 21:55:16 CDT 2015
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #297 typo

NOTE: This was addressed by the editor on March 11, 2015 and will be in the next available draft.

The names list note for U+1DA8B SIGNWRITING PARENTHESIS mentions U+1DAA5 SIGNWRITING 
ROTATION MODIFIER-5. However, U+1DAA5 is SIGNWRITING ROTATION MODIFIER-6.
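
A quick way to confirm a character name is to ask a Unicode character database directly. The sketch below uses Python's standard `unicodedata` module and assumes an interpreter whose database already includes the SignWriting block (Unicode 8.0 or later).

```python
import unicodedata

cp = 0x1DAA5
# Fall back to a placeholder if this Python's database predates the character.
name = unicodedata.name(chr(cp), "<not in this database>")
print(f"U+{cp:04X} {name}")
```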

Date/Time: Fri Mar 13 12:57:27 CDT 2015
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #297: No! No! No!

The comments in IndicSyllabicCategory-8.0.0d3.txt claim that the 
general category of {SUPER,SUB}SCRIPT {TWO,THREE,FOUR} is Mn, but it is No.
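
The General_Category of these six characters can be checked against any Unicode character database. A minimal sketch using Python's standard `unicodedata` module, looking the characters up by name rather than hard-coding code points:

```python
import unicodedata

names = [
    "SUPERSCRIPT TWO", "SUPERSCRIPT THREE", "SUPERSCRIPT FOUR",
    "SUBSCRIPT TWO", "SUBSCRIPT THREE", "SUBSCRIPT FOUR",
]

# Map each character name to its General_Category value.
categories = {n: unicodedata.category(unicodedata.lookup(n)) for n in names}

for n, gc in categories.items():
    print(f"U+{ord(unicodedata.lookup(n)):04X} {n}: {gc}")
```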

Date/Time: Sat Mar 28 14:53:23 CDT 2015
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: Missing entries in 8.0 PropertyValueAliases.txt

Note: This issue was fixed in the data on March 12:

Indic_Syllabic_Category is missing these two values:
Consonant_With_Stacker and Consonant_Prefixed

Date/Time: Tue Mar 31 16:14:37 CDT 2015
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Unicode 8.00 Beta - InSC

While the current candidates for category Consonant_Succeeding_Repha may
descend from repha, only one of them, U+17CC KHMER SIGN ROBAT is clearly still
a repha.

Reading the script descriptions in TUS makes it abundantly clear that U+1B03
BALINESE SIGN SURANG and U+A982 JAVANESE SIGN LAYAR are actually final
consonants.  The TUS also states that U+1B81 SUNDANESE SIGN PANGLAYAR is a
final consonant, but without going into any details.

Date/Time: Thu Apr 9 00:09:31 CDT 2015
Name: R.S. Wihananto
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297

Indic Syllabic and Positional Category of U+A9BD JAVANESE CONSONANT SIGN KERET

In Indic Syllabic Category data, U+A9BD JAVANESE CONSONANT SIGN KERET is
categorized as 'Consonant_Subjoined'. This is incorrect. U+A9BD is not a
subjoined form of any Javanese consonant. Historically, U+A9BD is a dependent
vowel of vocalic r. Its counterpart in Balinese script is U+1B3A BALINESE
VOWEL SIGN RA REPA; and U+1B3A is categorized as 'Vowel_Dependent'. In modern
Javanese, U+A9BD is used as replacement for U+A9BF JAVANESE CONSONANT SIGN
CAKRA (medial ra) if followed by U+A9BC JAVANESE VOWEL SIGN PEPET (vowel sign
ĕ). So in modern Javanese U+A9BD is treated like a medial consonant sign. In
books teaching about Javanese script, the three characters are always grouped
together: medial ya (U+A9BE JAVANESE CONSONANT SIGN PENGKAL), medial ra
(U+A9BF JAVANESE CONSONANT SIGN CAKRA), and medial rĕ (U+A9BD JAVANESE
CONSONANT SIGN KERET). So U+A9BD in Indic Syllabic Category should be
recategorized as 'Consonant_Medial' like U+A9BE and U+A9BF. However, unlike
U+A9BE and U+A9BF, U+A9BD can't be followed by vowel signs because it already
has an inherent ĕ vowel.

Also, the General Category of U+A9BD is incorrect. It should not be
categorized as 'Mc' (Mark, Spacing Combining), but as 'Mn' (Mark,
Nonspacing). This character is nonspacing, and its behavior in combining
with other characters and forming ligatures is similar to that of the
nonspacing vowel signs u (U+A9B8) and uu (U+A9B9). Its Balinese counterpart
U+1B3A also has the 'Mn' General Category. So the Indic Positional Category
of this character should also be corrected from 'Right' to 'Bottom'.

Date/Time: Thu Apr 9 00:22:32 CDT 2015
Name: R.S. Wihananto
Report Type: Error Report
Opt Subject: Public Review Issue #297

Positional Category of U+A9BE JAVANESE CONSONANT SIGN PENGKAL and U+A9BF
JAVANESE CONSONANT SIGN CAKRA

The positional category of U+A9BE JAVANESE CONSONANT SIGN PENGKAL should be
corrected from 'Right' to 'Bottom_And_Right'.

The positional category of U+A9BF JAVANESE CONSONANT SIGN CAKRA should be
corrected from 'Right' to 'Bottom_And_Left'; but I can't find this category in
the Indic Positional Category data. This character is similar to U+103C
MYANMAR CONSONANT SIGN MEDIAL RA. U+103C is not found/categorized in the Indic
Positional Category data.

Date/Time: Thu Apr 9 01:14:22 CDT 2015
Name: R.S. Wihananto
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297

Indic Syllabic Category of U+1B03, U+1B81, and U+A982

I agree with Mr. Richard Wordingham's feedback. U+1B03 BALINESE SIGN SURANG,
U+1B81 SUNDANESE SIGN PANGLAYAR, and U+A982 JAVANESE SIGN LAYAR were all
historically repha; but in modern writing, these characters are final -r
consonant signs.

Because Balinese, Sundanese, and Javanese characters are encoded in logical
order, I think categorizing these characters as visually ordered repha is
wrong. For consistency with other Indic scripts that use repha only in older
texts (such as Telugu), old repha in Balinese, Sundanese, and Javanese should
also be encoded as RA + VIRAMA + ZWJ.

Date/Time: Mon Apr 13 20:44:21 CDT 2015
Name: Shreevatsa R
Report Type: Error Report
Opt Subject: Ambiguity in Chapter 12.1 Devanagari, section Encoding Principles

The section says (http://www.unicode.org/versions/Unicode7.0.0/ch12.pdf):

"The orthographic syllable is built up of alphabetic pieces, the actual letters
of the Devanagari script. These pieces consist of three distinct character
types: consonant letters, independent vowels, and dependent vowel signs. In a
text sequence, these characters are stored in logical (phonetic) order.
Consonant letters by themselves constitute a CV unit, where the V is an
inherent vowel, whose exact phonetic value may vary by writing system.
Independent vowels also constitute a CV unit, where the C is considered to be
null.

A dependent vowel sign is used to represent a V in CV units where V is not the
inherent vowel."

To be clear, the last sentence should read: 

A dependent vowel sign is used to represent a V in CV units where V is not the
inherent vowel **and C is not null**.

Because otherwise, it's confusing to say in one sentence that an independent
vowel is a CV unit, and in the next sentence say that in CV units a dependent
vowel sign is used. Obviously, in independent vowels (which are CV units) no
dependent vowel sign is used. 

Date/Time: Mon Apr 13 20:49:36 CDT 2015
Name: Shreevatsa R
Report Type: Error Report
Opt Subject: Ambiguity in Chapter 12.1 Devanagari, section Principles of the Devanagari script

The Unicode standard says (http://www.unicode.org/versions/Unicode7.0.0/ch12.pdf):
"Consonant letters may also be rendered as half-forms, which are presentation 
forms used to depict the initial consonant in consonant clusters" -- here, 
"initial consonant" should be "non-final consonant" (or "consonants other than 
the last one").

Date/Time: Tue Apr 21 16:39:07 CDT 2015
Name: Markus Scherer
Report Type: Error Report
Opt Subject: U+9730 霰 pinyin is not "sǎn"

Note: This report has already been sent to the Unihan experts for evaluation.

We received a bug report about the pinyin sort order for 霰. In the CLDR 
Chinese pinyin order, which is based on Unihan data, it sorts with "S" but 
should sort with "X".

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E9%9C%B0 shows:
Readings

Data type	Value
kCantonese	sin3
kDefinition	hail, sleet
kHangul	산
kHanyuPinyin	64076.140:xiàn
kJapaneseKun	ARARE
kJapaneseOn	SAN SEN
kKorean	SEN SAN
kMandarin	sǎn
kTang	sèn
kXHC1983	0986.010:sǎn 1250.050:xiàn

The CLDR data is generated by a tool that prefers kMandarin=sǎn over 
kHanyuPinyin=xiàn, because kMandarin is "The most customary pinyin 
reading for this character." (http://www.unicode.org/reports/tr38/index.html#kMandarin)
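
The effect of that preference on sort order can be illustrated with a small sketch. This is not the actual CLDR tool; the function names are hypothetical, and only the preference rule described above (kMandarin first, kHanyuPinyin as fallback) is modelled.

```python
import unicodedata
from typing import Dict, List, Optional


def parse_hanyu_pinyin(value: str) -> List[str]:
    """Extract readings from a kHanyuPinyin value such as '64076.140:xiàn'."""
    readings: List[str] = []
    for entry in value.split():
        # Each entry is "locations:readings"; readings are comma-separated.
        _, _, syllables = entry.partition(":")
        readings.extend(s for s in syllables.split(",") if s)
    return readings


def preferred_reading(fields: Dict[str, str]) -> Optional[str]:
    """Prefer kMandarin; fall back to the first kHanyuPinyin reading."""
    if fields.get("kMandarin"):
        return fields["kMandarin"].split()[0]
    hanyu = parse_hanyu_pinyin(fields.get("kHanyuPinyin", ""))
    return hanyu[0] if hanyu else None


def sort_letter(reading: str) -> str:
    """Initial Latin letter of a pinyin syllable, ignoring tone marks."""
    return unicodedata.normalize("NFD", reading)[0].upper()


u9730 = {"kMandarin": "sǎn", "kHanyuPinyin": "64076.140:xiàn"}
current = sort_letter(preferred_reading(u9730))
fallback = sort_letter(preferred_reading({"kHanyuPinyin": u9730["kHanyuPinyin"]}))
print(current, fallback)
```

With the data shown above, the kMandarin value decides the bucket, so the character lands under "S" even though the kHanyuPinyin reading would place it under "X".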

Feedback from native Chinese colleagues indicates that if they recognize 
the character, they know it is unambiguously "xiàn".
If they do not recognize it, they guess "sǎn" based on the more common 散.

If I understand correctly, this means that kMandarin=sǎn is incorrect. 
Please fix, and let me and Mark know the resolution.

References:
Xinhua Dictionary
http://zh.wikipedia.org/wiki/%E9%9C%B0
http://www.zdic.net/z/27/js/9730.htm
http://zidian.odict.net/862078860/

Date/Time: Thu Apr 23 19:12:14 CDT 2015
Name: Nick Lawson
Report Type: Public Review Issue
Opt Subject: Cheese and Bacon Emojis

Dear Unicode Consortium,

First off, I'd like to congratulate you on the release of Unicode 7.0. I love
the diverse skin tones available for characters and hand signals, as well as
the variety of new characters that were added.

I noticed that 'Cheese Wedge' is a proposed Emoji for the Unicode 8.0 update,
and I could not be more excited. I have, on several occasions, wished that
there was a cheese-related Emoji available. I would like to strongly advocate
for its inclusion, and thank you for your consideration of the community's
suggestions for characters.

I was disappointed, however, to see that a 'Bacon' Emoji was not slated for
the 8.0 update. While I understand that Unicode is universal and bacon is not
necessarily ubiquitous across cultures, bacon has become a trend in the
culture of the western world. Bacon is rising in popularity in fast food items
as well as high-end dining, and its popularity extends beyond cuisine. There
are bacon-themed clothes, bacon-scented toiletries, and bacon-related
housewares that are quite popular (although I can't say they're quite my
taste). It seems fitting that Emojis, which are also becoming a hot cultural
trend, would include a bacon unicode character.

The inclusion of 'Cheese Wedge' and 'Bacon' unicode characters in the 8.0
update would make me absolutely ecstatic. If there is a specific party or
group I should contact with these suggestions, please let me know. And if you
would like suggestions or examples of art work, I would be happy to contribute
those with no expectations of credit.

Thank you very much for your kind consideration, and I look forward to hearing
from you soon.

Sincerely,
Nick Lawson

Date/Time: Sun Apr 26 22:58:22 CDT 2015
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Confusion about meaning of White_Space property

There is confusion in the standard and the FAQ about the meaning of the
White_Space property. As I read the standard, the only official definition is
given in UAX #44:

"White_Space: [...] Spaces, separator characters and other control characters 
which should be treated by programming languages as "white space" for the 
purpose of parsing elements."

Basically, no relation to display.

But the Core Spec, in page 250, says:

"Line separation characters, such as the carriage return, do not clearly
exhibit their advance width, because they always occur at the end of a line,
but most implementations give them a visible advance width when they are
selected. Hence, they are classed together with space characters; both are
given the White_Space property. Whitespace characters are not considered to
be ignored for display."

This contradicts the definition in UAX #44, which says the property is
about parsing, not display.

The FAQ also confuses the property's meaning. At
http://unicode.org/faq/unsup_char.html#2, it says:

Q: Which characters should be displayed as a visible but blank space?

A: This is the easy one: all the characters that have the White_Space
property, also generically known as “whitespace characters”. This set includes
SPACE, of course, but also such characters as the tab control character, NO-
BREAK SPACE, LINE SEPARATOR, and so on. For the full list, see the White_Space
values in PropList.txt.

Date/Time: Mon Apr 27 13:49:24 CDT 2015
Name: Sebastian Kempgen
Report Type: Public Review Issue
Opt Subject: LATIN SMALL LETTER SAKHA YAT

Note: This has been fixed in the master data file.

Hello,

in the beta code chart of UC 8.0, there is a note below character AB60, that
points to A657 "Cyrillic small letter iotified a" as the source for this
letter. This is not correct. The new letter AB60 is simply a glyph variant of
0463, Cyrillic Small letter yat. (And indeed, the code chart for 8.0 has the
correct back-reference to AB60 at 0463.)

Because AB60 and 0463 look completely different, a bit of background might
help: the glyph at AB60 is simply the *upright* variant of a glyph variation
more commonly found for the *cursive* form of 0463. Even some of today's fonts
have that glyph variation, which in its cursive form looks like a Latin cursive
"n" with a Cyrillic soft sign tacked onto its right side.

Best regards,
Sebastian Kempgen

Date/Time: Wed May 6 14:13:03 CDT 2015
Name: Tim Larson
Report Type: Public Review Issue
Opt Subject: 8.0 beta - menorah addition

The beta code chart for U+1F54E is missing the note "Hanukiah" that had 
been previously added some time prior to Nov 19 2014. Please re-add it.


Date/Time: Wed Mar 25 12:52:19 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297

Bidi-mirroring of mathematical symbols

Hi

I would like to resend my previous suggestion from my new email address, to
replace the former post, which was also poorly written; I'm sorry.

There may be a problem with bidi-mirroring of mathematical symbols. U+2260 and
U+2262 are bidi-mirrored, U+226D is not.  So probably U+226D should be.
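
The Bidi_Mirrored values in question can be inspected with any Unicode character database. A minimal sketch using Python's standard `unicodedata` module; the values printed depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def is_mirrored(cp: int) -> bool:
    """Return the Bidi_Mirrored property of a code point as a boolean."""
    return bool(unicodedata.mirrored(chr(cp)))


for cp in (0x003C, 0x2260, 0x2262, 0x226D):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: Bidi_Mirrored={is_mirrored(cp)}")
```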

But it seems implementers don’t care about bidi-mirroring of mathematical
symbols, except common ones, as U+003C and U+003E.  When these are negated,
U+226E and U+226F, there is no more bidi-mirroring in Windows NotePad.  This
may be why U+226D is not bidi-mirrored in Unicode.  But it seems inconsistent.

Consistently with U+2215 / U+29F5, bidi-mirroring of U+2260 and U+2262 means a
backslash for negation when script is right-to-left.  Making it a rule,
mathematical slashes, even U+2298, convert to backslashes when bidi-mirrored
(I hope I’m right).

More generally, there is a need to inform the readers of the Code Charts which
characters are bidi-mirrored and which ones are not.  For that purpose,
bidi-mirroring should be indicated systematically.  This is as important as
casing and glyph shape information.  Given the amount of information made
available in the Code Charts and NamesList, bidi-mirroring should not be
treated as a mere implementation issue; the bidi-mirroring information should
therefore be available inside the Code Charts, not only in UnicodeData.txt and
additional files.

Because NamesList translators are free to add comments, I’m adding
bidi-mirroring comment lines at each character or subhead that is concerned
with this issue.  This is strictly a French NamesList translation issue.

Not showing bidi-mirroring in the Code Charts might be interpreted as a lack of
respect for right-to-left scripts, and users of those scripts may be worried
about it.  Bidi-mirroring is so important that it should not have to be
searched for in UnicodeData.txt, BidiBrackets.txt and BidiMirroring.txt, but
shown directly in the Code Charts.

Best regards,
Marcel Schneider

Date/Time: Wed Mar 25 12:54:45 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: Public Review Issue #297

Notice line vs Comment line

Hi,

there are “idle” “@+” in the NamesList, which in the Code Charts draft have no
effect on markup or highlighting in any way, but disturbed the results of a
sorting (spreadsheet) formula with the NamesList. (In fact, I used the French
translation, where many idle “@+” are missing, and added them following the
NamesList-8.0.0d6.)

There seems to be no difference between such a NOTICE_LINE with bullet
applying to a character, and a COMMENT_LINE. Therefore, I suggest to convert
these NOTICE_LINEs to COMMENT_LINEs.

The list below shows the instances in the NamesList where “idle @+” occurs:
U+0140
U+0149
U+01A6
U+0268
U+0269
U+0277
U+027C
U+029E
U+0307
U+01E7
U+1E5B
U+2301
U+234A
U+237B
U+237D
U+237E
U+237F
U+2425
U+2426
U+16F27
U+16F32
U+16F52
U+16F53

Best regards,
Marcel Schneider

Date/Time: Wed Apr 1 09:43:56 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Code points with bookmarks in the All-in-one Code Charts PDF

Hello,

I would suggest adding the first code point before the Blockheads that display
in the side pane on Adobe Reader when viewing the All Code Charts PDF. The
Blockheads are very useful, but often a given code point must be searched for.
Then showing the Start point together with the Blockhead will help finding
glyphs quickly.

Alternatively, the start points could be shown in parentheses after the
Blockheads, if the goal is to avoid puzzling users with figures.  That is
presumably why, at present, readers searching for a code point have to click
at an estimated (or learned) position and then check the range in the page
header.

Using the single code charts instead, presents some disadvantages:

— Most archive managing software cannot sort on hex values, so the code 
charts in a folder are unordered.

— For convenience, there is neither endpoint nor blockhead in the single 
charts filenames (they are opened with code point searching software).

Best regards,
Marcel Schneider

Date/Time: Sat Apr 11 09:09:00 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Head numbering in the Standard, refs to the Standard in the Code Charts

Hi,

it is sometimes hard to quote and to retrieve an instance in the Standard
because there are only two numbering levels: chapters and sections. For
example, the long section 6.2 is structured with many unnumbered heads.

Therefore I suggest adding a third numbering level. With these additional
numbers, quoting would be facilitated, and when they appear in the navigation
pane, retrieving would be too.

Numbering levels have been restricted to two in order to avoid puzzling readers
with too many figures. IMO the more the Standard is quoted, the more it will become
popular and well-known. This is why I suggest that comments in the Code Charts
should point to the Standard where appropriate, showing “§9.9.9” refs, for
example at U+0029:

“* see discussion on semantics of paired bracketing characters”
might become:
“* see discussion on semantics of paired bracketing characters, chapter 6 of the Standard, section 2.10”
or just:
“* see discussion on semantics of paired bracketing characters, §6.2.10”.

As well, bidi-mirroring may be referred to as “bidi-mirrored (see §4.7)” at a
significant number of instances in the Code Charts (and the NamesList).

UAXes, other UTNs and UTRs might be quoted too, where this would be helpful to
get started with Unicode.

Best regards,

Marcel Schneider

Date/Time: Mon Apr 13 10:05:09 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Bidi-mirrored Code Charts

Hi,

given that bidi-mirroring of symbols is a very complex issue (see U+2260 and
U+2262 as well as U+2298 that are bidi-mirrored, vs U+226D and U+2205 that are
actually not), it might be useful to have bidi-mirrored Code Charts.  They
would facilitate the implementation of Unicode for right-to-left scripts.
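
As a rough illustration of what such a chart would have to cover, the sketch below lists the Bidi_Mirrored characters of one block using Python's standard `unicodedata` module. The helper name `mirrored_in_range` is hypothetical, the block boundaries U+2200..U+22FF (Mathematical Operators) are supplied by hand, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata
from typing import List


def mirrored_in_range(first: int, last: int) -> List[int]:
    """Code points in [first, last] whose Bidi_Mirrored property is Yes."""
    return [cp for cp in range(first, last + 1) if unicodedata.mirrored(chr(cp))]


# Mathematical Operators block.
math_ops = mirrored_in_range(0x2200, 0x22FF)

print(f"{len(math_ops)} bidi-mirrored characters in U+2200..U+22FF")
for cp in math_ops[:10]:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```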

Today, users who aim at getting started with Unicode, must guess what bidi-
mirrored symbols look like whenever there is no Bidi_Mirroring_Glyph for a
ready-to-use bidi-mirroring emulation.  Therefore I suggest adding bidi-
mirrored Code Charts where appropriate.  Every block’s charts that contain
bidi-mirrored characters would be followed by a set of Code Charts where these
characters are bidi-mirrored and highlighted in some way, perhaps with the
abbreviation “BM” in the upper left corner.

Best regards.

Date/Time: Mon Apr 13 10:05:31 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

A new extended datafile to speed up implementation

Hi,

since a lot of data has been added alongside UnicodeData to make up a complete
character property repertoire, it might be helpful to create a new overall
datafile on the pattern of UnicodeData, including more fields such as
FormalAlias, BidiMirrored, and some spare fields, in order to enhance
transparency and promote more complete implementations than are often seen
even today.

The question to ask is whether the guidance Unicode gives implementers is
well understood. For example, Unicode discourages parsing the NamesList for
machine-readable information, and yet it is the NamesList that is parsed, for
example by keyboard layout creation software, while the FormalAliases are
often ignored.

I am aware of the policy of adding new files rather than new fields to
UnicodeData. But perhaps it is now time to add a new comprehensive base file
that includes much of the information in the NamesList and all the other
files of the UCD.

Best regards.

Date/Time: Mon Apr 13 10:07:22 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Chapter 6 of the Standard

Hi,

I would like to suggest adding some information to the second paragraph of the
“Apostrophes” subsection in section 6.2 “General Punctuation” of the Standard,
on page 272.

The paragraph is pasted below, and the suggested information is added within,
highlighted with underscores.

The first statement is based on the 87 klc sources of Latin keyboard layouts
shipped with Windows 7 Starter. The docx is attached below. Please note that
the klc source of the Canadian multilingual standard keyboard layout is
incomplete because Kana is not recognized by the software. It counts as the
eleventh keyboard with U+2019 out of a total of 87 keyboards, which puts the
exact percentage of locale Latin Windows keyboards without U+2019 at 87%.
While the US Standard keyboard layout does not contain U+2019, the US
International keyboard layout does, along with U+2018.

The second statement summarizes an article found with a search engine at the
following URL:
http://www.newrepublic.com/article/113101/smart-quotes-are-killing-apostrophe

The third statement results from observation of a small sample of free
ornamental fonts, mostly handwriting fonts.

Best regards.
P.S.: I will send the docx by mail after this form
__________________________
When text is set, U+2019 right single quotation mark is preferred as
apostrophe, but _it is missing on about 90% of the most common locale Latin
keyboards, where_ only U+0027 is present__. Software commonly offers a
facility for automatically converting the U+0027 apostrophe to a contextually
selected curly quotation glyph. _This facility tends to fail when U+0027
represents a leading apostrophe, not an opening quotation mark._ In these
systems, a U+0027 in the data stream is always represented as a straight
vertical line and can never represent a curly apostrophe or a right quotation
mark. _However, many ornamental fonts assign the same curly glyph to U+0027
as to U+2019._
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
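
The failure on leading apostrophes mentioned in the second added statement
can be reproduced with a deliberately naive converter. The function below is
a hypothetical illustration of the usual context rule, not the algorithm of
any particular word processor.

```python
import re

LEFT_SINGLE = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT_SINGLE = "\u2019"  # RIGHT SINGLE QUOTATION MARK (preferred apostrophe)

def naive_smart_quotes(text):
    """Naive rule: U+0027 at the start of the text or after whitespace or an
    opening bracket becomes U+2018; every other U+0027 becomes U+2019."""
    opened = re.sub(r"(?:^|(?<=[\s(\[{]))'", LEFT_SINGLE, text)
    return opened.replace("'", RIGHT_SINGLE)

print(naive_smart_quotes("don't"))     # medial apostrophe: handled correctly
print(naive_smart_quotes("'quoted'"))  # paired quotes: handled correctly
print(naive_smart_quotes("'Twas"))     # leading apostrophe: gets U+2018
```

In the last call the rule cannot tell a leading apostrophe from an opening
quotation mark, so it emits U+2018 where U+2019 is wanted.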

Date/Time: Mon Apr 13 10:07:49 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Chapter 6 of the Standard

Hi,

there is some additional information about U+2044 FRACTION SLASH that I would
suggest adding to the “Fraction Slash” paragraphs in the “Other Punctuation”
subsection of §6.2, page 273 of the Standard, as well as to the Code Charts’
Fractions subheader before U+2150.

That U+2044 FRACTION SLASH works together with superscripts and subscripts is
so obvious that no discussion is needed. On the other hand, as fraction
formatting needs at least desktop publishing software, it is usually not
available in office applications. It therefore seems useful to show the plain
text input method for fractions with a slanted fraction slash like the default
glyph of U+2044.

The Number Forms block’s Fractions subhead may therefore be followed by a
NOTICE_LINE like this one: ‘@+’ [TAB] [TAB] ‘Fractions may be composed in
plain text on a [superscripts] 2044 [subscripts] pattern.’

Meanwhile, the Fraction Slash notice in the Standard might contain the
information below (including that already provided in the Standard).

Best regards.

___________________________
Fraction Slash. U+2044 FRACTION SLASH is used between digits to form numeric
fractions. It kerns so that it can be used with superscripts and subscripts to
compose plain text fractions such as ²⁄₃ and ³⁄₉. The pattern of a plain text
fraction
built using the fraction slash is defined as follows: any sequence of one or
more superscript digits (U+00B9, U+00B2, U+00B3, U+2074 - U+2079, U+2070),
followed by the fraction slash, followed by any sequence of one or more
subscript digits (U+2080 - U+2089).

U+2044 FRACTION SLASH may also act as a formatting command for use with
decimal digits, and it may be used instead of U+002F SOLIDUS prior to applying
fraction formatting. The standard form of a fraction designed for formatting
is defined as follows: any sequence of one or more decimal digits (General
Category = Nd), followed by the fraction slash, followed by any sequence of
one or more decimal digits. If the fraction is to be separated from a previous
number, then a space can be used, choosing the appropriate width (normal,
thin, zero width, and so on). For example, 1 + thin space + 3 + fraction slash
+ 4 can be displayed as 1¾.

Whether they are plain text or formatted, fractions should be displayed as a
unit, such as ¾ or {unavailable glyph}. The precise choice of display can
depend on additional formatting information. If the displaying software is
incapable of mapping the fraction to a unit, then it can also be displayed as
a simple linear sequence as a fallback (for example, 3/4). For fallback
display, U+002F SOLIDUS is preferred, because the fraction slash kerns.
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
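
The plain text fraction pattern proposed above (superscript digits, U+2044,
subscript digits) can be sketched as a regular expression plus a small
composer. The names are hypothetical, and the composer assumes non-negative
integers.

```python
import re

# Superscript digits 0-9: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079
SUPERSCRIPT_DIGITS = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079"
# Subscript digits 0-9: U+2080..U+2089
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))
FRACTION_SLASH = "\u2044"

PLAIN_TEXT_FRACTION = re.compile(
    f"[{SUPERSCRIPT_DIGITS}]+{FRACTION_SLASH}[{SUBSCRIPT_DIGITS}]+"
)

def compose_fraction(numerator, denominator):
    """Build a plain text fraction from two non-negative integers."""
    to_super = str.maketrans("0123456789", SUPERSCRIPT_DIGITS)
    to_sub = str.maketrans("0123456789", SUBSCRIPT_DIGITS)
    return (str(numerator).translate(to_super)
            + FRACTION_SLASH
            + str(denominator).translate(to_sub))

print(compose_fraction(2, 3))
print(bool(PLAIN_TEXT_FRACTION.fullmatch(compose_fraction(13, 27))))
```

A sequence such as 3/4 with U+002F SOLIDUS and ASCII digits does not match
the pattern; that form belongs to the formatting case in the second paragraph.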

Date/Time: Tue Apr 14 08:14:20 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Chapter 6 of the Standard

Hello,

in section 6.2, on page 268 of the Standard, Quotation Marks and Brackets, I
suggest moving the last sentence of the second paragraph to the end of the
first paragraph. The result would look as quoted below, where the move is
highlighted with underscores.

Best regards.

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Quotation Marks and Brackets. Like brackets, quotation marks occur in pairs,
with some overlap in usage and semantics between these two types of
punctuation marks. For example, some of the CJK quotation marks resemble
brackets in appearance, and they are often used when brackets would be used in
non-CJK text. Similarly, both single and double guillemets may be treated more
like brackets than quotation marks. __Unlike brackets, quotation marks are not
mirrored in a bidirectional context.__

Some of the editing marks used in annotated editions of scholarly texts
exhibit features of both quotation marks and brackets. The particular
convention employed by the editors determines whether editing marks are used
in pairs, which editing marks form a pair, and which is the opening
character.____

____________________________

Date/Time: Tue Apr 14 08:15:17 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297


Chapter 6 of the Standard

Hello,

in section 6.2, on page 268, Language-Based Usage of Quotation Marks, it may
be possible to add some useful content to the first two paragraphs. They are
quoted below (the second is split), and changes are highlighted with
underscores.

The first modification is needed to support the fact that U+0022 is optionally
converted to chevrons (guillemets).

The purpose of the “warning” quotation marks is to keep the reader from
taking the expression literally. Following French usage they may be
called “irony quotes”, but often there is no irony at all, just the meaning of
“so-called”.

Best regards.

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Language-Based Usage of Quotation Marks 

U+0022 QUOTATION MARK is the most commonly used character for quotation mark.
However, it has ambiguous semantics and direction. Most keyboard layouts
support only U+0022 QUOTATION MARK, but software commonly offers a facility
for automatically converting the U+0022 QUOTATION MARK to a contextually
selected _quotation mark_ glyph.

European Usage. The use of quotation marks differs systematically by language
and by medium. In European typography, it is common to use _angle quotation
marks (guillemets, chevrons) in publishing_ and, except for some languages,
curly quotation marks in office automation. Single guillemets may be used _to
clarify the presence of nested quotations_. _Many authors use angle and curly
quotation marks in the same text to distinguish between quoting and warning._

The following description does not attempt to be complete, but intends to
document a range of known usages of quotation mark characters. Some of these
usages are also illustrated in Figure 6-3. In this section, the words single
and double are omitted from character names where there is no conflict or both
are meant. 

___________________________

Date/Time: Tue Apr 14 08:17:32 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

NamesList, U+0022 and U+0027

Hello,

in the NamesList and the Code Charts, at U+0022, the COMMENT_LINE “* neutral
(vertical), used as opening or closing quotation mark” should be replaced with
its counterpart at U+0027: “* neutral (vertical) glyph with mixed usage”,
because U+0022 is also used for “seconds”, “double prime” and “ditto”.

Furthermore, it is doubtful whether language-specific support is
appropriate here. Therefore, I suggest replacing “* preferred characters in
English for paired quotation marks are 201C & 201D”, because on the one hand
this is true for other languages too (e.g. French), and on the other hand
there are ways to support far more languages here, with something like “* some
preferred characters for paired double quotation marks are found at
201C-201E”. The next step would possibly be to remove “* 05F4 is preferred for
gershayim when writing Hebrew”, and to rely on the CROSS_REF “x (hebrew
punctuation gershayim - 05F4)”. I've learned some Hebrew and I like it very
much, also the Jewish nation, of course, but I fear this COMMENT_LINE brings
some risk of conflict. IMO Hebrew language support is ensured thanks to the
already provided CROSS_REF “x (hebrew punctuation gershayim - 05F4)”.

If the above statements are right, they would identically apply to U+0027,
third and fourth COMMENT_LINE. In any case, as shown in UTN #24, an English
translation is needed (the English don’t call a slash a SOLIDUS, while they
call a FULL-STOP a period, too). In this English translation (ideally for use
in both the United States and the United Kingdom and all English speaking or
using countries), the COMMENT_LINE “* preferred characters in English for
paired quotation marks are 201C & 201D” will be really appropriate.

Best regards.

Date/Time: Wed Apr 15 10:55:39 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297

NamesList: U+05F3, U+05F4

Hello,

in conjunction with my proposal to remove the Hebrew support COMMENT_LINEs at
U+0022 and U+0027, leaving only the CROSS_REFs, I suggest adding appropriate
COMMENT_LINEs in the Hebrew block, at U+05F3 and U+05F4, modeled on
those at U+2018 and U+201C:

@		Additional punctuation
05F3	HEBREW PUNCTUATION GERESH
	* this is the preferred character (as opposed to 0027)
	x (apostrophe - 0027)
05F4	HEBREW PUNCTUATION GERSHAYIM
	* this is the preferred character (as opposed to 0022)
	x (quotation mark - 0022)

Best regards.

Date/Time: Fri Apr 17 11:45:39 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Chapter 6 of the Standard

Hi,

in section 6.2, in the subsection “Dashes and Hyphens” on page 265, I suggest
enhancing the information about U+2010 HYPHEN. In most fonts where it is
present, it does not differ in appearance from U+002D HYPHEN-MINUS. Hence, in
many fonts, whether common or ornamental, it is missing. There is at least
one good font where the statements in the Standard apply, because U+002D is
rendered with a tiny wide glyph that is not suitable as a hyphen.

Therefore, entering U+2010 as the default hyphen in raw text does not seem
appropriate. When it is preferred in a given layout, a routine can substitute
it for U+002D, with U+2212 MINUS SIGN being entered explicitly to
disambiguate the two semantic values of hyphen and minus.

In practice, U+2010 seems to be seldom used. Even in the Standard, typeset
in Minion-Regular, the hyphen character is U+002D, as found in the sample
“left-to-right” in this paragraph. I’ve tried to put U+2010 on the keyboard as
the default hyphen in typesetting mode, but the problems led me to reset the
character to U+002D. I make U+2010 available in the dead key registry (acute,
hyphen), mainly to facilitate the search-and-replace settings.

As a result, the sentence “It is rendered with a narrow width.” should be
qualified in some way, because in nearly all fonts, U+002D is rendered with a
narrow width too. (I would not mention MS Gothic where U+2010 displays with an
extra space around!)

Best regards.

Date/Time: Fri Apr 17 12:16:45 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

NamesList and the Code Charts

Hi,

there seems to be a mistake with character names. In fact they are
designations, and they are handled as such. The goal of a character’s name is
to give an accurate idea of what the character is, and to facilitate referring
to it in natural language. The immutable identifier is the code point.
Systems handle code points, not character names. Software does not need any
other identifier.

This is why freezing character names is an abuse, especially when they have
proved to be wrong. There is a very strong desire to design the most accurate
names, which led to passionate discussions at the merger of ISO/IEC 10646 with
Unicode. But the renaming of U+00C6/U+00E6 to its original letter status
surprisingly produced a name-update prohibition act, a Stability Policy that
extends over names instead of ensuring code point stability only. Suddenly,
character names were called by ISO “convenient identifiers”, not more. And not
less.

Fortunately Unicode found a workaround, giving characters that are completely
misnamed a Formal Alias, thanks to which Formal Alias aware software is able
to display a true designation in most cases. Unfortunately, the remedy is not
applied to characters such as U+002F SOLIDUS, a slash that bears the scholarly
name of the fraction slash (U+2044 FRACTION SLASH may with some reason be
called a solidus). And even more unfortunately, there would be far too many
Formal Aliases if all the abusive lateralization of bidi-mirrored paired
punctuation were corrected. Even outside a bidirectional context, the “LEFT”
qualifier is unfitting for U+2018 and U+201C in a Universal Character Set.

UnicodeData shows clearly where most of the awkward names are from. Or, more
accurately, where they are NOT from. By misnaming characters in an
ethnocentric way, ISO acted against its mission as an international standards
body. It is obvious an international organization for standardization must
respect its members’ wishes. And when one of the countries complains about
misnaming, it must correct and apologize, not rage and protest. Nor prohibit
further updates.

Therefore I suggest doing some general overhaul. Beginning with the Stability
Policy.

To avoid lateralization where it is undue, LEFT and RIGHT may be replaced
with the original OPENING and CLOSING where it is unambiguous, or with
BACKWARD-POINTING and FORWARD-POINTING.

Best regards.

Date/Time: Mon Apr 20 03:02:12 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

U+00BF INVERTED QUESTION MARK
U+00A1 INVERTED EXCLAMATION MARK

The turned question mark (INVERTED QUESTION MARK) seems not to be used in
Catalan. Therefore it is mentioned for Castilian in a translation. Would it be
accurate to replace “* Spanish” with “* Castilian”? The same would then
probably apply to U+00A1, together with Asturian and Galician as already shown
in the NamesList (perhaps also at U+00BF?).

Best regards,
Marcel Schneider

Date/Time: Mon Apr 20 03:08:15 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

U+00D0, U+00F0 LATIN LETTER ETH; U+0110, U+0111 LATIN LETTER D WITH STROKE

As in this block uppercase and lowercase are 32 code points apart, there
would normally be no need to mention the case pair. The Icelandic LATIN
LETTER ETH may be an exception to the rule because of the risk of confusing it
with U+0110/U+0111 LATIN LETTER D WITH STROKE (which led early standards
bodies to encode the capital once only!). But ordinarily the NamesList uses
COMMENT_LINEs, not CROSS_REFs, for casing information. Therefore I suggest
replacing

	x (latin small letter eth - 00F0)
and	x (latin capital letter eth - 00D0)
with  	* lowercase is 00F0
and 	* uppercase is 00D0

The same would then apply to the LATIN LETTER D WITH STROKE, because as in
this block, lowercase follows uppercase, it was only the risk of confusion
with U+00D0/U+00F0 that led Unicode to add the CROSS_REFs

	x (latin small letter d with stroke - 0111)
and	x (latin capital letter d with stroke - 0110)
However, as shown above,
	* lowercase is 0111
and	* uppercase is 0110

COMMENT_LINEs seem to fit this particular context better, even if in this
instance they would only ensure that all concerned languages are treated
equally.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 20 03:09:02 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

NamesList and the Code Charts

If there is no way of reengineering character name stability, that is, if the
convenience of the relatively small number of specialists involved takes
precedence over the billions of end-users who will long be dealing with code
names while nearly no translations are provided, then I suggest making
extensive use of the correcting facility Unicode created with FormalAliases.

To do this, at least all characters that are bidi-mirrored and whose
names are ethnocentric should be given a FormalAlias. For example: U+00AB
LEFT DOUBLE ANGLE QUOTATION MARK, which Unicode first named LEFT POINTING
GUILLEMET, may be given the FormalAlias % BACKWARD-POINTING GUILLEMET or
BACKWARD-POINTING DOUBLE ANGLE QUOTATION MARK. U+00BB RIGHT DOUBLE ANGLE
QUOTATION MARK, which Unicode first named RIGHT POINTING GUILLEMET, may be
given the FormalAlias % FORWARD-POINTING GUILLEMET or FORWARD-POINTING DOUBLE
ANGLE QUOTATION MARK.

A warning should be placed in the file header, to point out that any character
name that is followed by a FormalAlias should be disregarded in use, and that
using the FormalAlias instead is strongly recommended.


The next question is whether it would not be sufficient, if there is a Formal
Alias, to show the code name on a CODENAME_LINE. This would allow moving the
Formal Aliases onto the NAME_LINE. Subsequently, the (relatively) few
specialists who are dealing with standardization may be asked always to refer
to the CodeName wherever there is one.

Even simpler, the roles of CharacterName and FormalAlias may be inverted in
these instances, giving the Formal Alias a Code Name status (and the Character
Name a True Designation status). Then it would be enough to disable the
FormalAlias-awareness algorithms (which most likely do not even exist yet
in end-user software, at least not in some relatively widespread free keyboard
layout creation software for end-users).

Best regards,
Marcel Schneider


Date/Time: Mon Apr 20 03:09:33 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

U+002D HYPHEN-MINUS

As the ALIAS_LINE of this character nearly coincides with its COMMENT_LINE, it
would be nice to spread the aliases over several lines, as is done at
U+223C TILDE OPERATOR, in order to separate the two very different semantic
values. Instead of a “hyphen or minus sign” alias, there would be two, like:

	= hyphen
	= minus sign

Furthermore, after the COMMENT_LINE

	* used for either hyphen or minus sign

it would be useful to add another one, because U+002D matches neither the
figures nor the other operators such as U+002B PLUS SIGN, so using it as a
minus sign is very bad typography. This COMMENT_LINE might show the following
information:

	* 2212 is preferred for minus sign

It follows the model of the one that already exists for U+0027 APOSTROPHE
(“* 2019 is preferred for apostrophe”). Indeed, IMO the two cases are roughly
similar.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 20 03:10:12 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

Numeric Separators

In the Standard, §6.2, page 275, a brief mention of Numeric Separators is
found. It consists merely of an enumeration of five characters or code points
and refers to the existence of locale and user customizations. While one of
them is the ARABIC THOUSANDS SEPARATOR U+066C, no mention is made in the ASCII
block of such a semantic value for period or comma. Nevertheless, the French
translation adds “thousands separator” among the aliases of U+002E FULL STOP.

IMO the related semantic values are so important that these characters should
be moved into the subset of well-commented characters in the Code Charts, and
the related subject might be covered more extensively in the Standard.
Therefore adding some information might be useful, perhaps as suggested below.

Best regards,
Marcel Schneider


1) In the NamesList:

0027	APOSTROPHE
Add a third ALIAS_LINE:
	= prime, minutes, feet, thousands separator
Among the CROSS_REFs: add
	x (arabic thousands separator - 066C)

002C	COMMA
On the ALIAS_LINE: add “thousands separator” after “= decimal separator”.
Among the CROSS_REFs: add 
	x (arabic decimal separator - 066B)

002E	FULL STOP
Split the ALIAS_LINE and add “thousands separator” after “= decimal point”, like this:
	= period, dot
	= decimal point, thousands separator
Add a COMMENT_LINE as this:
	* used as a thousands separator when 002C is decimal separator, and conversely
and place it first.

2019	RIGHT SINGLE QUOTATION MARK
Add a second COMMENT_LINE:
	* may be used as a thousands separator
Among the CROSS_REFs: add
	x (arabic thousands separator - 066C)


2) In the Standard, on page 275, Numeric Separators, a supplemental text like
the following might be added at the end of the paragraph:

___________________________

In Latin usage for example, U+002C COMMA and U+002E FULL STOP may both take
on the semantics of a decimal separator. The one that is not given this value
is then commonly used as a thousands separator. Alternatively, space
characters or raised separators like U+0027 APOSTROPHE may act in this way.
The latter is current Arabic usage, where U+066C is a dedicated ARABIC
THOUSANDS SEPARATOR, while 066B is a special ARABIC DECIMAL SEPARATOR, even if
060C ARABIC COMMA is used for the same purpose.

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
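
The interchange of separator roles described in the suggested text can be
sketched as follows. The function name and the two-decimal format are
assumptions for illustration only; real locale-aware formatting should come
from locale data such as CLDR.

```python
def format_number(value, decimal_sep=".", thousands_sep=","):
    """Format value with two decimals using the given separator characters."""
    text = f"{value:,.2f}"  # always "," for grouping and "." for decimals
    # Go through a placeholder so the two replacements cannot collide.
    return (text.replace(",", "\0")
                .replace(".", decimal_sep)
                .replace("\0", thousands_sep))

amount = 1234567.891
print(format_number(amount))                      # full stop as decimal point
print(format_number(amount, ",", "."))            # roles swapped
print(format_number(amount, ".", "\u2019"))       # U+2019 as thousands separator
print(format_number(amount, "\u066B", "\u066C"))  # Arabic separators
```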

Date/Time: Mon Apr 20 11:06:54 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

0598 HEBREW ACCENT ZARQA
05AE HEBREW ACCENT ZINOR
   
I cannot measure the impact of this old and well-documented (UTN #27) problem,
and there are surely some good reasons to keep Formal Aliases minimal.
Personally I would prefer there were some here, so I will not refrain from
making the following suggestion:

0598	HEBREW ACCENT ZARQA
	% HEBREW ACCENT TSINORIT
	* the Tsinorit is also used alternatively to place Zarqa or Tsinor above, following a printing preference
	* character name is a misnomer

05AE	HEBREW ACCENT ZINOR
	% HEBREW ACCENT TSINOR
	* this character is used to place Zarqa or Tsinor regularly above left


These FormalAliases would have three advantages:
 
— They are unique (U+05AE cannot be given the FormalAlias ZARQA because this
is already taken).

— They are homogenous (both are called following the usage in the book of
Psalms and the other poetic books).

— There is a strong coherence between names (the one is a diminutive of the
other) and practice (the one may be used alternatively instead of the other,
following a printing preference).

Best regards,
Marcel Schneider

Date/Time: Mon Apr 20 11:13:36 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297

00DF LATIN SMALL LETTER SHARP S
1E9E LATIN CAPITAL LETTER SHARP S

While users are glad this letter has not been called a ligature (because it
isn’t), there is some reason to complain that this letter has been called a
sharp s, because it isn’t that either (even if it was already called so in
early standards). The German sharp s does really exist as a phoneme, but it is
represented by a double s, not an ß. Worse, in Germany the original version of
the Standard is used, with no translation, so native users do really complain.
With respect to font designers, Unicode has already developed the glyph
comment well. Now it would be nice to add the FormalAliases to these
characters.

Furthermore, the comment “* uppercase is "SS"” should be avoided in this form
because it is reminiscent of an abbreviation that was used in Germany before
and during WWII.

There is also an ongoing change toward preferring U+1E9E for uppercase.

The lines, as I suggest them, would therefore end up as follows:

00DF	LATIN SMALL LETTER SHARP S
	% LATIN SMALL LETTER SZ
	= Eszett
	* German
	* character name is an old misnomer
	* uppercase is two times U+0053, but tends to be 1E9E
[...]

1E9E	LATIN CAPITAL LETTER SHARP S
	% LATIN CAPITAL LETTER SZ
	* character name results from a misnomer
	* lowercase is 00DF

And I would remove in both cases the crossrefs “x (latin capital letter
sharp s - 1E9E)” and “x (latin small letter sharp s - 00DF)”, according to the
rule that in the Code Charts, special casing information is provided with
comments only.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 20 11:15:57 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297


00FF LATIN SMALL LETTER Y WITH DIAERESIS
0178 LATIN CAPITAL LETTER Y WITH DIAERESIS

As already reported for U+00D0/U+00F0, U+0110/U+0111, and U+00DF/U+1E9E, the
crossrefs for casing at U+00FF/U+0178 may consistently be replaced with
comments:

	* uppercase is 0178
	* lowercase is 00FF

There seem to be no more instances where this suggestion would apply.
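
For reference, the case relationships behind these reports on casing comments
can be checked against the UCD case mappings, which Python's str.upper() and
str.lower() follow. The dictionary and helper names below are hypothetical;
this is only a small consistency check, not a statement about how the
NamesList should annotate the pairs.

```python
# Case pairs discussed in these reports, as (lowercase, uppercase) code points.
PAIRS = {
    "eth": (0x00F0, 0x00D0),
    "d with stroke": (0x0111, 0x0110),
    "y with diaeresis": (0x00FF, 0x0178),
}

def case_offset(lower_cp, upper_cp):
    """Distance in code points from the uppercase to the lowercase letter."""
    return lower_cp - upper_cp

for label, (lower_cp, upper_cp) in PAIRS.items():
    # Each pair maps to the other under simple case conversion.
    assert chr(lower_cp).upper() == chr(upper_cp)
    assert chr(upper_cp).lower() == chr(lower_cp)
    offset = case_offset(lower_cp, upper_cp)
    print(f"{label}: U+{lower_cp:04X} <-> U+{upper_cp:04X}, offset {offset}")

# Sharp s is the irregular case: the full uppercase mapping is two letters,
# while the capital sharp s lowercases to U+00DF.
sharp_s_upper = "\u00DF".upper()
capital_sharp_s_lower = "\u1E9E".lower()
print(repr(sharp_s_upper), f"U+{ord(capital_sharp_s_lower):04X}")
```

Only the eth pair follows the 32 code point offset of the Latin-1 block; the
y with diaeresis pair spans two blocks.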

Best regards,
Marcel Schneider

Date/Time: Tue Apr 21 10:16:42 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 LATIN LETTER AE

The versioning of the alias names at U+00C6 and U+00E6 should correctly be
"(1.1)". This results from matching the NamesList-8.0.0d6.txt with
UnicodeData-8.0.0d8.txt. The NamesList shows:

00C6	LATIN CAPITAL LETTER AE
	= latin capital ligature ae (1.0)
00E6	LATIN SMALL LETTER AE
	= latin small ligature ae (1.0)

Meanwhile, UnicodeData shows:

00C6;LATIN CAPITAL LETTER AE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER A E;;;00E6;
00E6;LATIN SMALL LETTER AE;Ll;0;L;;;;;N;LATIN SMALL LETTER A E;;00C6;;00C6

Field 11 in UnicodeData is the Unicode 1.0 name. For these instances, Unicode
is reported to have been forced by ISO to abandon the 1.0 names and to put ISO
names in their place, prior to issuing the 1.1 version of the Standard, that
is, the merged Unicode + ISO 10646 Standard. Therefore, the ALIAS_LINEs should
be corrected to the following:

00C6	LATIN CAPITAL LETTER AE
	= latin capital ligature ae (1.1)
00E6	LATIN SMALL LETTER AE
	= latin small ligature ae (1.1)

This particularly puzzling versioning might, perhaps, be explained in a
COMMENT_LINE:

	* this character has been renamed again in version 2.0

Whether such a comment should be added, or not, is another question. IMO it
may, to show Unicode has been willing to update names but is prevented from
doing so. But as such a comment is most likely to reawaken old acrimony, in
the end it should rather not be added.

By contrast, surely it wouldn't make much sense to add a second alias, like:

00C6	LATIN CAPITAL LETTER AE
	= latin capital ligature ae (1.1)
	= latin capital letter a e (1.0)
00E6	LATIN SMALL LETTER AE
	= latin small ligature ae (1.1)
	= latin small letter a e (1.0)
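
The comparison made in this report can be reproduced with a minimal parser
for the two UnicodeData lines quoted above. This is only a sketch: the
function and constant names are hypothetical, and fields are counted from
zero, which places the Unicode 1.0 name at index 10.

```python
UNICODE_DATA_SAMPLE = """\
00C6;LATIN CAPITAL LETTER AE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER A E;;;00E6;
00E6;LATIN SMALL LETTER AE;Ll;0;L;;;;;N;LATIN SMALL LETTER A E;;00C6;;00C6
"""

NAME_FIELD = 1
UNICODE_1_NAME_FIELD = 10  # zero-based index of the Unicode 1.0 name

def parse_unicode_data(text):
    """Map each code point string to its list of semicolon-separated fields."""
    records = {}
    for line in text.splitlines():
        if line:
            fields = line.split(";")
            records[fields[0]] = fields
    return records

records = parse_unicode_data(UNICODE_DATA_SAMPLE)
for code, fields in records.items():
    print(code, fields[NAME_FIELD], "| 1.0 name:", fields[UNICODE_1_NAME_FIELD])
```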

Date/Time: Wed Apr 22 11:24:56 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 NamesList syntax

NamesList syntax

To complete my suggestion about adding a CODENAME_LINE, I suggest choosing
the ampersand as its marker and replacing it with a lock symbol in the output.
The complete syntax would then be as follows:

CODENAME_LINE:
		TAB "&" SP NAME LF   
			// Replace & by U+1F512, output line as code name

The lock symbol is consistent with the fact that an identifier name, once
published, must never be changed, and indicates clearly which name among the
two is immutable.
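
A minimal sketch of how a NamesList reader might handle the proposed line
type is given below. The "&" CODENAME_LINE exists only in this suggestion and
is not part of the actual NamesList format; the function names are
hypothetical, and any line without a leading TAB is treated as a NAME_LINE
for simplicity.

```python
LOCK = "\U0001F512"  # U+1F512 LOCK

MARKERS = {
    "=": "ALIAS_LINE",
    "%": "FORMALALIAS_LINE",
    "*": "COMMENT_LINE",
    "x": "CROSS_REF",
    "&": "CODENAME_LINE",  # proposed in this feedback only
}

def classify(line):
    """Return (line type, remaining text) for one NamesList line."""
    if not line.startswith("\t"):
        return ("NAME_LINE", line)  # simplification: headers are ignored
    marker, _, rest = line[1:].partition(" ")
    return (MARKERS.get(marker, "OTHER"), rest)

def render(line):
    """Replace the '&' marker with the lock symbol, leave other lines as is."""
    kind, rest = classify(line)
    if kind == "CODENAME_LINE":
        return f"\t{LOCK} {rest}"
    return line

print(render("\t& LATIN SMALL LETTER SHARP S"))
print(classify("\t* lowercase is 00DF"))
```

Because only the first token after the TAB is inspected, an ampersand inside
a comment such as “201C & 201D” does not trigger the CODENAME_LINE rule.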

As a code character for the lock, I’d prefer the pound sign, but it isn’t a
part of the US Standard layout, only of the US International keyboard layout
and half of the latin Windows locale keyboard layouts.
 
U+00A3 POUND SIGN would be a nod to the locale that determined code
names in the nineties. Similarly to the dollar sign as used in spreadsheet
software, it would signify stability due to locking of properties (no matter
how accurately they once were defined).

Nevertheless it might be inappropriate to associate a currency symbol.

By contrast, the ampersand is neutral and has a matching meaning. Another
advantage is that it currently occurs only three times in the NamesList, the
third of which is an HTML code:

0022	QUOTATION MARK
	* preferred characters in English for paired quotation marks are 201C & 201D
0027	APOSTROPHE
	* preferred characters in English for paired quotation marks are 2018 & 2019
29DC	INCOMPLETE INFINITY
	= ISOtech entity &iinfin;

There is no need for change, since the percent sign (marking up a
FORMALALIAS_LINE) equally occurs in other contexts in the NamesList.

As for the principle of giving the true identifiers a place on the NAME_LINE
and demoting the stable but false identifiers, it must be said that many
implementations are so incomplete that scarcely any trace even of
FormalAliases can be found. Among users, there is a preference for true
character names. Users who prefer stable identifiers may refer to the next
line in those cases.

This change means that instead of a wrong Name and a true FormalAlias, there
will be a true Name and a wrong CodeName. This will resolve all the problems
caused by incomplete UCD parsers that are currently misused as UIs, or
that are designed as UIs but do not implement the complete range of UCD
data files.

Best regards,
Marcel Schneider

Date/Time: Wed Apr 22 11:26:44 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 Fraction slash

2044 FRACTION SLASH

In addition to my previous feedback, I would suggest adding the hint about how
to compose arbitrary fractions in plain text in another place as well. This
could be the entry of the fraction slash U+2044 and, more precisely, the end
of the existing COMMENT_LINE, after a comma:

2044	FRACTION SLASH
	= solidus (in typography)
	* for composing arbitrary fractions, in plain text with superscripts and subscripts.

A demo file that opens in a word processor, typeset in the Arial Unicode MS
typeface, is available at bit.ly/1DNPtf0

To view it in PDF, there is another file at bit.ly/1JutBGK

Best regards,
Marcel Schneider

Date/Time: Fri Apr 24 11:46:48 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 Corrigendum

There is at least one error in my post on Tue Apr 21 10:16:42 CDT 2015.
The UnicodeData “1.0 Name” field number is 10, not 11. Sorry.

Best regards,
Marcel Schneider

Date/Time: Fri Apr 24 11:50:51 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 The Standard

To avoid globally wasting time and resources, the Unicode Standard must reach
the maximum of usefulness. As a Universal Character Set, it is designed for
use by end-users as well, not merely standardizers, implementers and
developers. Or at least, it _should_ be, as I’m pointing out.

This is why a small number of specialists who need name *stability* stand
opposite a huge number of users who prioritize name *accuracy*. But
the latter have no means to rewrite or even adapt the Standard. As
experience tends to show, this is true for most developers, too, who rely on
the Standard and don’t aim at correcting it.

No matter what Unicode names are defined as, they are taken as designations,
as if they were scientific names. Scientists correct a name as soon as it
proves to be inaccurate. And scientists manage very well several names per
item while underscoring the most true one. This very useful system may give a
paradigm of the way Unicode could deal with character names. ISO and Unicode
having made a joint decision not to do so, it may be permitted to ask whether
that decision could have been right or wrong.

Take The Unicode Standard. In the text it uses current names, which are mostly
identical with the Unicode 1.0 names, while the identifier of many characters
differs from their current name. That may be called a useless and
counterproductive complication, which may come across as a discourtesy to readers
who are not native speakers.

The issue, now that a large part of standardization has been done, is to transform
a body-centered standard into a user-centered standard. A user-centered UCS is
directly useful without needing a lot of precautions. By contrast, the current
concept is based upon externalizing the care for accessibility, which
wastes time and money outside.

Best regards,
Marcel Schneider

Date/Time: Fri Apr 24 11:52:43 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 MICR U+2446 sq

Typo and Formal Aliases in MICR

In the subheader notice line of the magnetic ink character recognition
symbols, the first word of the second sentence is “The”, but as it is followed
by a verb, it probably should be “They”.

Apart from this little typo, I’d expect the first two characters to be given a
Formal Alias too, as the other two have been. That is part of my concern
about making the Standard more useful by growing interest in the Formal
Aliases and convincing implementers and developers to write some additional
parsing algorithms that would make all software “FormalAlias-aware”. With
only eighteen Formal Aliases at present, developers don’t seem to be thinking
seriously about the issue.

Best regards,
Marcel Schneider

Date/Time: Fri Apr 24 12:51:50 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 U+1F9C0, Translations

U+1F9C0 CHEESE WEDGE and Translations of the Code Charts

Dear Unicode Consortium,

I'm pleased to read the Feedback from Mr Lawson and would like to add my
congratulations to his. The Cheese Wedge symbol he highlights reminds me that
the new sets have already been translated into French, and ongoing efforts aim
at providing a most accurate rendering of the Standard's whole current content
in French.

This highlights that my concern is not that translations should ever be
avoided. I'm just anxious about all the other languages, among which some
widely spoken ones admittedly don't have any translation of Unicode, everybody
referring to the English original files exclusively, as I've read.

More precisely about the Cheese Wedge, I'm glad to see unbloody, no-slaughter
food is now strongly promoted and is given a fabulous opportunity of becoming
a wide-spread cultural phenomenon.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:06:24 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Corrigenda

There are at least two mistakes in my post on Mon Apr 20 03:09:02 CDT 2015.
In the second paragraph, 
“U+00AB LEFT DOUBLE ANGLE QUOTATION MARK” and 
“U+00BB RIGHT DOUBLE ANGLE QUOTATION MARK” 
should read 
“U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK” and 
“U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK”.
I’m sorry about all these mistakes in my posts. 

These ISO names correctly support all *Latin* scripts, 
in contrast to the abusively lateralized ISO names for 
single and double quotation marks U+2018 & U+2019 and U+201C & U+201D.

However, this has no impact on my concern related to names supporting *all* 
scripts, which can be properly ensured by giving such bidi-mirrored characters 
a Formal Alias where LEFT is replaced with OPENING or BACKWARD, and RIGHT with 
CLOSING or FORWARD, or following other patterns. (It may be noted that the 
names’ complication induced by these modifications is incomparably less 
cumbersome than the one that was due to the replacement of GUILLEMET with 
DOUBLE ANGLE QUOTATION MARK.)
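
To make the naming pattern above concrete, here is a hedged Python sketch that derives a candidate alias for bidi-mirrored characters from their current names. The function name and the exact substitution rules (BACKWARD/FORWARD before "-POINTING", OPENING/CLOSING otherwise) are assumptions inferred from the examples in this feedback, not an existing Unicode rule; the character names and the mirrored property come from Python's standard unicodedata module.

```python
import re
import unicodedata


def suggest_direction_neutral_alias(ch: str):
    """Return a candidate alias for a bidi-mirrored character, or None.

    Hypothetical rule set, for illustration only:
      LEFT-POINTING  -> BACKWARD-POINTING
      RIGHT-POINTING -> FORWARD-POINTING
      LEFT           -> OPENING
      RIGHT          -> CLOSING
    """
    if not unicodedata.mirrored(ch):
        return None  # only characters with Bidi_Mirrored=Yes are considered
    name = unicodedata.name(ch, "")
    alias = re.sub(r"\bLEFT-POINTING\b", "BACKWARD-POINTING", name)
    alias = re.sub(r"\bRIGHT-POINTING\b", "FORWARD-POINTING", alias)
    alias = re.sub(r"\bLEFT\b", "OPENING", alias)
    alias = re.sub(r"\bRIGHT\b", "CLOSING", alias)
    return alias if alias != name else None


if __name__ == "__main__":
    for ch in "()\u00ab\u00bbA":
        print(f"{ord(ch):04X}", unicodedata.name(ch), "->",
              suggest_direction_neutral_alias(ch))
```

The output of such a script would still need human review, since a mechanical substitution cannot judge whether OPENING/CLOSING suits every mirrored symbol.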

For a *Universal* Character Set (as well as for an *International* Standards 
Organization), this care for a practicable universality is a real need.

Best regards,
Marcel Schneider
___________________________
P.S.: There is a typo in my post on  Fri Apr 17 12:16:45 CDT 2015. 
In paragraph 3, “fare” should read “far”.
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Date/Time: Mon Apr 27 01:07:36 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+0670 ARABIC LETTER SUPERSCRIPT ALEF

U+0670 ARABIC LETTER SUPERSCRIPT ALEF needs a formal alias and 
a consistent subhead

All sources I could look up confirm that U+0670 is a vowel sign (and 
a combining mark). Since it is a misnomer, it needs a Formal alias, 
which I guess would be approximately “ARABIC VOWEL SIGN SUPERSCRIPT ALEF”. 
This results from the other vowel sign instances inside the block and 
elsewhere. 

This character is listed after a subhead “Point”. In the Syriac block U+0700, 
a “Syriac points” subhead is found indeed (U+0730), which is completed with 
the parenthesized “(vowels)” explanation. By contrast, in the 
Arabic block U+0600, most of the vowel signs are listed under 
“Other combining marks”, while the Arabic Extended-A block U+08A0 contains 
several “Extended vowel signs” subheads. 

Would it therefore be possible to rewrite the entry
@		Point
0670	ARABIC LETTER SUPERSCRIPT ALEF
	* actually a vowel sign, despite the name

in a way like this:

@		Point (vowel)
0670	ARABIC LETTER SUPERSCRIPT ALEF
	% ARABIC VOWEL SIGN SUPERSCRIPT ALEF
	* this diacritical mark is a vowel sign, character name is a misnomer

Alternatively, the existing comment could be kept, but a formal alias seems unavoidable.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:08:22 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+047C/U+047D CYRILLIC LETTER BEAUTIFUL OMEGA, UTN #27

In UTN #27, the two misnomers U+047C and U+047D are missing. 
They could be added in UTN #27, as well as U+0709, and given formal 
aliases too. The related NamesList entries might then transform from:

047C	CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
	= Cyrillic "beautiful omega"
	* despite its name, this character does not have a titlo, nor is it composed of an omega plus a diacritic
	x (cyrillic capital letter broad omega - A64C)
047D	CYRILLIC SMALL LETTER OMEGA WITH TITLO

to:

047C	CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
	% CYRILLIC CAPITAL LETTER BEAUTIFUL OMEGA
047D	CYRILLIC SMALL LETTER OMEGA WITH TITLO
	% CYRILLIC SMALL LETTER BEAUTIFUL OMEGA
	* the Cyrillic beautiful omega does not have a titlo, nor is it composed of an omega plus a diacritic
	* character name is a misnomer
	x (cyrillic small letter broad omega - A64D)

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:09:11 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Hiragana and Katakana ligatures (U+309F, U+30FF)

As Andrew West states on “Unicode Character Names Part 1”, 
the U+309F HIRAGANA DIGRAPH YORI and the U+30FF KATAKANA DIGRAPH KOTO 
are ligatures, not digraphs. Would it therefore be appropriate to correct 
the two entries with formal aliases and a matching subhead? Probably they 
would look like:

@		Hiragana ligature
309F	HIRAGANA DIGRAPH YORI
	% HIRAGANA LIGATURE YORI
	* historically used in vertical contexts, but now found also in horizontal layout
	# <vertical> 3088 308A

@		Katakana ligature
30FF	KATAKANA DIGRAPH KOTO
	% KATAKANA LIGATURE KOTO
	* historically used in vertical contexts, but now found also in horizontal layout
	# <vertical> 30B3 30C8


Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:10:17 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Spacing diacritics

Spacing diacritics in Latin-1 block: U+005E, U+0060, U+00B8;
U+005F LOW LINE

The two spacing accents U+005E CIRCUMFLEX ACCENT and U+0060 GRAVE ACCENT 
as well as U+00B8 CEDILLA should be given an ALIAS_LINE or, even better, 
a FORMALALIAS_LINE, containing their Unicode 1.0 name SPACING CIRCUMFLEX, 
SPACING GRAVE, SPACING CEDILLA respectively.

This results from the usage as graphic characters (^, `) without any relation 
to their value as accents. This value is relevant only in their usage as 
deadkey characters.

Precision and unambiguousness, a main issue in a standard of a UCS, turn out 
to be missing here. This is true as well for U+005F LOW LINE. “Low line” is 
inconsistent with “overline”, it lacks the precision (spacing or combining) 
that is needed in a standard, and is globally less used than “underline” and 
far less than “underscore”.
Three sample instances could therefore look as follows. The first series is 
the current state, the second is with additional aliases:

005E	CIRCUMFLEX ACCENT
	* this is a spacing character
[...]
005F	LOW LINE
	= spacing underscore (1.0)
	* this is a spacing character
[...]
0060	GRAVE ACCENT
	* this is a spacing character
[...]
__________________________

005E	CIRCUMFLEX ACCENT
	= spacing circumflex (1.0)
	* this is a spacing character
[...]
005F	LOW LINE
	= spacing underscore (1.0)
	* this is a spacing character
[...]
0060	GRAVE ACCENT
	= spacing grave (1.0)
	* this is a spacing character
[...]

However, the goal may actually have been to avoid showing too clearly how 
accurate the Unicode 1.0 names were. This may lead to dropping the 
versioning, while underscoring the importance of the aliases by raising them 
to FormalAlias status.
This would allow the COMMENT_LINEs to be removed, because they become redundant, 
like this:

005E	CIRCUMFLEX ACCENT
	% SPACING CIRCUMFLEX
[...]
005F	LOW LINE
	% SPACING UNDERSCORE
[...]
0060	GRAVE ACCENT
	% SPACING GRAVE
[...]

For a more advanced solution, please refer to my next post.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:11:09 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+1D13A MUSICAL SYMBOL MULTI REST

Since the U+1D13A character name is a misleading misnomer, it could be given 
a formal alias taken from the aliases already provided, for example 
“MUSICAL SYMBOL DOUBLE WHOLE-REST” (with or without hyphen?). 
Perhaps a “* character name is a misnomer” comment may be added too.

That said, “two” may already count as “several”, so “multi” 
is not entirely wrong.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:11:42 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: The Greek letter LAMBDA

According to H. G. Liddell and R. Scott, A Greek-English Lexicon, 
the spelling LAMBDA is incorrect, and the real spelling of the related 
Greek letter is LABDA. By contrast, there is no mention of a spelling 
LAMDA. Nevertheless, the ISO/IEC 10646 chief redactor, who was 
a compatriot of H. G. Liddell and R. Scott, forced the Unicode Consortium 
to abandon the then-current spelling LAMBDA it had adopted, and to replace 
all instances with the non-existent spelling LAMDA.

I therefore suggest creating as many formal aliases as there are 
misspelled instances:

039B	GREEK CAPITAL LETTER LAMDA
	% GREEK CAPITAL LETTER LAMBDA
03BB	GREEK SMALL LETTER LAMDA
	% GREEK SMALL LETTER LAMBDA
1D27	GREEK LETTER SMALL CAPITAL LAMDA
	% GREEK LETTER SMALL CAPITAL LAMBDA
1038D	UGARITIC LETTER LAMDA
	% UGARITIC LETTER LAMBDA
1D6B2	MATHEMATICAL BOLD CAPITAL LAMDA
	% MATHEMATICAL BOLD CAPITAL LAMBDA
1D6CC	MATHEMATICAL BOLD SMALL LAMDA
	% MATHEMATICAL BOLD SMALL LAMBDA
1D6EC	MATHEMATICAL ITALIC CAPITAL LAMDA
	% MATHEMATICAL ITALIC CAPITAL LAMBDA
1D706	MATHEMATICAL ITALIC SMALL LAMDA
	% MATHEMATICAL ITALIC SMALL LAMBDA
1D726	MATHEMATICAL BOLD ITALIC CAPITAL LAMDA
	% MATHEMATICAL BOLD ITALIC CAPITAL LAMBDA
1D740	MATHEMATICAL BOLD ITALIC SMALL LAMDA
	% MATHEMATICAL BOLD ITALIC SMALL LAMBDA
1D760	MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMBDA
1D77A	MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD SMALL LAMBDA
1D79A	MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMBDA
1D7B4	MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMBDA

For personal use this is easily done in a spreadsheet. 
Nevertheless it is inefficient to do so if millions of developers 
and users must launch the process for themselves. Therefore, if 
Unicode could do the job once, that would be more economical.
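
As an alternative to a spreadsheet, the same list can be produced with a few lines of Python. This is only a sketch: the function name lamda_alias_lines is made up for this example, the character names are taken from Python's standard unicodedata module (whose Unicode version depends on the Python release), and the output imitates the NAME_LINE / FORMALALIAS_LINE layout used above.

```python
import re
import unicodedata

# Match LAMDA as a whole word so that names already spelled LAMBDA are skipped.
LAMDA = re.compile(r"\bLAMDA\b")


def lamda_alias_lines(code_points):
    """Return NamesList-style lines proposing a LAMBDA alias for each LAMDA name."""
    lines = []
    for cp in code_points:
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if LAMDA.search(name):
            lines.append(f"{cp:04X}\t{name}")
            lines.append("\t% " + LAMDA.sub("LAMBDA", name))
    return lines


if __name__ == "__main__":
    # Scan the whole code space; the result depends on the Unicode version
    # bundled with the running Python interpreter.
    print("\n".join(lamda_alias_lines(range(0x110000))))
```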

Regarding the ISO, it has no right to purposely lower the 
cultural content of the Standard.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 01:13:55 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297 NamesList syntax, FormalAliases, UnicodeData

A Solution Combining Usefulness And Respectfulness Towards Stability Policy

Dear Unicode Technical Committee,

as we are taught by the principle which the ancient Romans handed down to us: 

	Pacta tenenda sunt. 

This is why current Formal Aliases must never become Character Names 
[contrary to my opinion posted on Mon Apr 20 03:09:02 CDT 2015, and 
on Wed Apr 22 11:24:56 CDT 2015], regardless of how severely 
users are misled, and however crestfallen implementers 
and developers may be when faced with a Standard that makes them waste 
time and money by needing to be translated into English.

This is why the draft of a solution that Unicode has implemented 
since version 5.0 may be developed further. 
A Formal Alias is a kind of “second chance” for misnamed characters. 
Given that today’s computing resources allow managing two names per item, 
there is no longer any need to keep Formal Aliases minimal. 

Furthermore, the NamesList syntax is not subject to the Stability Policy, 
so Formal Aliases (including the "%" / U+203B "※" symbol) can be raised 
to the first line. The order “NAME_LINE, FORMALALIAS_LINE” is merely 
conventional and is not enforced programmatically. The challenge is 
to modify the syntax of the FORMALALIAS_LINE (adding CHAR), and 
to define an alternative syntax for the NAME_LINE (without CHAR).

The related NamesList syntax would then show as follows 
(I’m quoting the NamesList.html page, changes being highlighted 
with double asterisks):

CHAR_ENTRY**_N**:   NAME_LINE |  RESERVED_LINE
		| CHAR_ENTRY ALIAS_LINE
****
		| CHAR_ENTRY COMMENT_LINE
[...]

CHAR_ENTRY**_FA**:   **FORMALALIAS_LINE**
		**NAME_LINE**
		| CHAR_ENTRY ALIAS_LINE
****
		| CHAR_ENTRY COMMENT_LINE
[...]


The explanations which follow this syntax notation, would then probably 
be completed with a sentence like:

**A FORMALALIAS_LINE must be directly followed by a NAME_LINE.**

Then in the paragraph “Directly following either a NAME_LINE or 
a RESERVED_LINE”, the “FORMALALIAS_LINE” may be removed.

Fortunately, already today the NAME_LINE may occur elsewhere than 
in the first place, as it is stated: “The conventional order of elements 
in a char entry: NAME_LINE, FORMALALIAS_LINE, [...] is not enforced 
by the layout program”. 
The software requirement would be that Unibook also process the new forms 
of NAME_LINE and FORMALALIAS_LINE.


The related NamesList File Elements might then show this way 
(the first is a quotation):

NAME_LINE:	CHAR TAB NAME LF
			// The CHAR and the corresponding image are echoed, 
			// followed by the name as given in NAME

		**| TAB NAME LF
			// If the character has a formal alias**

FORMALALIAS_LINE:	**CHAR** TAB "%" SP NAME LF
			**// The CHAR and the corresponding image are echoed,**
			// followed by U+203B replacing %, 
			// then output NAME as formal alias


Once these syntax changes are defined, the first char entries that have 
a formal alias today would look like this:

01A2	% LATIN CAPITAL LETTER GHA
	LATIN CAPITAL LETTER OI
01A3	% LATIN SMALL LETTER GHA
	LATIN SMALL LETTER OI
	* Pan-Turkic Latin alphabets

This proves that the name and the formal alias remain unchanged 
across these enhancements. Thus, stability stays guaranteed.
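
The claim can be checked with a small parser sketch in Python. It reads char entries in either order (conventional NAME_LINE first, or the proposed FORMALALIAS_LINE first) and yields the same name and formal alias for both. The function parse_entries and its simplified line handling are assumptions made for this illustration; the sketch is not Unibook's actual parsing logic and ignores most of the NamesList syntax.

```python
import re

# A line that starts a char entry: code point, TAB, rest of line.
CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.*)$")
# Prefixes of alias, comment, cross-reference and decomposition lines.
OTHER_PREFIXES = ("= ", "* ", "x ", ": ", "# ")


def parse_entries(text):
    """Map code point -> {'name', 'formal_alias'} regardless of line order."""
    entries = {}
    current = None
    for line in text.splitlines():
        match = CHAR_LINE.match(line)
        if match:
            current = {"name": None, "formal_alias": None}
            entries[match.group(1)] = current
            body = match.group(2)
        elif line.startswith("\t") and current is not None:
            body = line[1:]
        else:
            continue  # headers, blank lines, anything outside an entry
        body = body.strip()
        if not body or body.startswith(OTHER_PREFIXES):
            continue
        if body.startswith("% "):
            current["formal_alias"] = body[2:]
        elif current["name"] is None:
            current["name"] = body
    return entries


CONVENTIONAL = (
    "01A2\tLATIN CAPITAL LETTER OI\n"
    "\t% LATIN CAPITAL LETTER GHA\n"
    "01A3\tLATIN SMALL LETTER OI\n"
    "\t% LATIN SMALL LETTER GHA\n"
    "\t* Pan-Turkic Latin alphabets\n"
)
PROPOSED = (
    "01A2\t% LATIN CAPITAL LETTER GHA\n"
    "\tLATIN CAPITAL LETTER OI\n"
    "01A3\t% LATIN SMALL LETTER GHA\n"
    "\tLATIN SMALL LETTER OI\n"
    "\t* Pan-Turkic Latin alphabets\n"
)

if __name__ == "__main__":
    print(parse_entries(CONVENTIONAL) == parse_entries(PROPOSED))
```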


The great advantage of this arrangement is that software that does not care 
about formal aliases and nevertheless seems to be designed to inform 
end-users will automatically show the true name if it parses the NamesList. 
The problem is with UnicodeData parsers, because they parse a file in which 
no formal aliases are shown (these were added when the file format was 
already defined).

Therefore IMO it would be useful to add some fields in UnicodeData, 
one of which will contain the Formal Aliases (see also my post 
on Mon Apr 13 10:05:31 CDT 2015). 

In fact there are scarcely any backwards-compatibility problems with 
new fields, because out-of-date software is expected to simply ignore them. 

The lack of visibility of Formal Aliases seems to be an effect of their 
not being in UnicodeData. Clearly, parsing supplemental data files to gather 
a complete overview of character information may be considered inefficient. 

Accordingly, many developers would expect Unicode to add as many fields 
to UnicodeData as needed to bring all the data to be processed within reach. 

Software that breaks because of additional fields 
in UnicodeData is likely to need a thorough overhaul in any case. 
In that respect, it is simpler to implement additional fields than 
additional files.
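
The following Python sketch shows what a tolerant UnicodeData reader might look like. It reads the fields it knows by index and ignores anything beyond the fifteen fields defined today, so that an additional field would not disturb it. The sixteenth "formal alias" field is purely hypothetical: it illustrates the suggestion above and does not exist in UnicodeData.txt. The function name parse_unicode_data_line is likewise an assumption for this example.

```python
# Field indexes in today's UnicodeData.txt (fields are numbered from 0).
NAME_FIELD = 1
UNICODE_1_NAME_FIELD = 10
STANDARD_FIELD_COUNT = 15


def parse_unicode_data_line(line):
    """Parse one semicolon-separated record, tolerating extra trailing fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) < STANDARD_FIELD_COUNT:
        raise ValueError("expected at least 15 fields, got %d" % len(fields))
    extra = fields[STANDARD_FIELD_COUNT:]
    return {
        "code": fields[0],
        "name": fields[NAME_FIELD],
        "unicode_1_name": fields[UNICODE_1_NAME_FIELD],
        # Hypothetical 16th field carrying the formal alias, if present.
        "formal_alias": extra[0] if extra and extra[0] else None,
    }


# Record for U+01A2 in the current 15-field layout.
TODAY = "01A2;LATIN CAPITAL LETTER OI;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O I;;;01A3;"
# The same record with the hypothetical extra field appended.
EXTENDED = TODAY + ";LATIN CAPITAL LETTER GHA"

if __name__ == "__main__":
    print(parse_unicode_data_line(TODAY))
    print(parse_unicode_data_line(EXTENDED))
```

A parser written this way returns the same name for both records and simply reports no formal alias when the extra field is absent.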


For convenience, the list below is meant to illustrate how the NamesList 
and the Code Charts might look after applying the above and some 
previous suggestions. For brevity, only FORMALALIAS_LINE and NAME_LINE are 
displayed. Some ad-hoc annotation is added.
— Unfortunately this list is unfinished —


Best regards,
Marcel Schneider
__________________________

0028	% OPENING PARENTHESIS
	LEFT PARENTHESIS
0029	% CLOSING PARENTHESIS
	RIGHT PARENTHESIS
[posted on Fri Apr 17, 2015]

002E	% PERIOD
	FULL STOP
[“period” seems to be more universal, “full stop” being an alternative name.
For example, The Advanced Learner’s Dictionary of Current English from 
the Oxford University Press shows as the sixth and last meaning of “period”: 
“the pause at the end of a sentence; the mark, also called _a full stop_ (.), 
expressing this. *put a period to*, put an end to.”]

002F	% SLASH
	SOLIDUS
[“solidus” is suspected to be an intentional, misleading misnomer]

0040	% AT SIGN
	COMMERCIAL AT
[The qualifier “commercial” is useless because there is no other at sign 
(as opposed to the commercial minus sign U+2052), “sign” is missing, and 
this ISO name is inconsistent (the percent sign is not called 
“commercial percent [sign]” either). As a general rule meant for ISO, 
UCS names must not follow personal preferences.]

005B	% OPENING SQUARE BRACKET
	LEFT SQUARE BRACKET
005C	% BACKSLASH
	REVERSE SOLIDUS
005D	% CLOSING SQUARE BRACKET
	RIGHT SQUARE BRACKET

005E	% SPACING CIRCUMFLEX
	CIRCUMFLEX ACCENT
005F	% SPACING UNDERSCORE
	LOW LINE
0060	% SPACING GRAVE
	GRAVE ACCENT
[please see today’s post]

007B	% OPENING CURLY BRACKET
	LEFT CURLY BRACKET
007D	% CLOSING CURLY BRACKET
	RIGHT CURLY BRACKET

00A1	% TURNED EXCLAMATION MARK
	INVERTED EXCLAMATION MARK

00AB	% BACKWARD-POINTING DOUBLE ANGLE QUOTATION MARK
	LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

00B4	% SPACING ACUTE
	ACUTE ACCENT

00B8	% SPACING CEDILLA
	CEDILLA

00BB	% FORWARD-POINTING DOUBLE ANGLE QUOTATION MARK
	RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

00BF	% TURNED QUESTION MARK
	INVERTED QUESTION MARK

00DF	% LATIN SMALL LETTER SZ
	LATIN SMALL LETTER SHARP S
[posted on Mon Apr 20, 2015]

010C	% LATIN CAPITAL LETTER C WITH HACEK
	LATIN CAPITAL LETTER C WITH CARON
010D	% LATIN SMALL LETTER C WITH HACEK
	LATIN SMALL LETTER C WITH CARON
[For these instances and the following, please see at 02C7.]

010E	% LATIN CAPITAL LETTER D WITH HACEK
	LATIN CAPITAL LETTER D WITH CARON
010F	% LATIN SMALL LETTER D WITH HACEK
	LATIN SMALL LETTER D WITH CARON

011A	% LATIN CAPITAL LETTER E WITH HACEK
	LATIN CAPITAL LETTER E WITH CARON
011B	% LATIN SMALL LETTER E WITH HACEK
	LATIN SMALL LETTER E WITH CARON

0132	% LATIN CAPITAL LETTER IJ
	LATIN CAPITAL LIGATURE IJ
0133	% LATIN SMALL LETTER IJ
	LATIN SMALL LIGATURE IJ

013D	% LATIN CAPITAL LETTER L WITH HACEK
	LATIN CAPITAL LETTER L WITH CARON
013E	% LATIN SMALL LETTER L WITH HACEK
	LATIN SMALL LETTER L WITH CARON

0147	% LATIN CAPITAL LETTER N WITH HACEK
	LATIN CAPITAL LETTER N WITH CARON
0148	% LATIN SMALL LETTER N WITH HACEK
	LATIN SMALL LETTER N WITH CARON

0152	% LATIN CAPITAL LETTER OE
	LATIN CAPITAL LIGATURE OE
0153	% LATIN SMALL LETTER OE
	LATIN SMALL LIGATURE OE

0158	% LATIN CAPITAL LETTER R WITH HACEK
	LATIN CAPITAL LETTER R WITH CARON
0159	% LATIN SMALL LETTER R WITH HACEK
	LATIN SMALL LETTER R WITH CARON
0160	% LATIN CAPITAL LETTER S WITH HACEK
	LATIN CAPITAL LETTER S WITH CARON
0161	% LATIN SMALL LETTER S WITH HACEK
	LATIN SMALL LETTER S WITH CARON

0164	% LATIN CAPITAL LETTER T WITH HACEK
	LATIN CAPITAL LETTER T WITH CARON
0165	% LATIN SMALL LETTER T WITH HACEK
	LATIN SMALL LETTER T WITH CARON

017D	% LATIN CAPITAL LETTER Z WITH HACEK
	LATIN CAPITAL LETTER Z WITH CARON
017E	% LATIN SMALL LETTER Z WITH HACEK
	LATIN SMALL LETTER Z WITH CARON

0190	% LATIN CAPITAL LETTER EPSILON
	LATIN CAPITAL LETTER OPEN E
[deduced from UTN #27]

01A2	% LATIN CAPITAL LETTER GHA
	LATIN CAPITAL LETTER OI
01A3	% LATIN SMALL LETTER GHA
	LATIN SMALL LETTER OI
[These are the first already existing formal aliases.]

01BE	% LATIN STACKED LIGATURE TS [???]
	LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE
[UTN #27]

01C4	% LATIN CAPITAL LETTER DZ WITH HACEK
	LATIN CAPITAL LETTER DZ WITH CARON
01C5	% LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH HACEK
	LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
01C6	% LATIN SMALL LETTER DZ WITH HACEK
	LATIN SMALL LETTER DZ WITH CARON
__________________________
01CD-01D4: If in Sinology, 
“caron” is preferred, the 
Pinyin diacritic-vowel 
combinations must not be 
given formal aliases. 
Otherwise they should be.
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
01E6	% LATIN CAPITAL LETTER G WITH HACEK
	LATIN CAPITAL LETTER G WITH CARON
01E7	% LATIN SMALL LETTER G WITH HACEK
	LATIN SMALL LETTER G WITH CARON
01E8	% LATIN CAPITAL LETTER K WITH HACEK
	LATIN CAPITAL LETTER K WITH CARON
01E9	% LATIN SMALL LETTER K WITH HACEK
	LATIN SMALL LETTER K WITH CARON

01EE	% LATIN CAPITAL LETTER EZH WITH HACEK
	LATIN CAPITAL LETTER EZH WITH CARON
01EF	% LATIN SMALL LETTER EZH WITH HACEK
	LATIN SMALL LETTER EZH WITH CARON
01F0	% LATIN SMALL LETTER J WITH HACEK
	LATIN SMALL LETTER J WITH CARON

021E	% LATIN CAPITAL LETTER H WITH HACEK
	LATIN CAPITAL LETTER H WITH CARON
021F	% LATIN SMALL LETTER H WITH HACEK
	LATIN SMALL LETTER H WITH CARON

0238	% LATIN SMALL LIGATURE DB
	LATIN SMALL LETTER DB DIGRAPH
0239	% LATIN SMALL LIGATURE QP
	LATIN SMALL LETTER QP DIGRAPH
[UTN #27]

025B	% LATIN SMALL LETTER EPSILON
	LATIN SMALL LETTER OPEN E
025E	% LATIN SMALL LETTER CLOSED REVERSED EPSILON
	LATIN SMALL LETTER CLOSED REVERSED OPEN E
[UTN #27]

0285	% LATIN SMALL LETTER REVERSED R WITH FISHHOOK AND RETROFLEX HOOK
	LATIN SMALL LETTER SQUAT REVERSED ESH
[UTN #27]

02C7	% MODIFIER LETTER HACEK
	CARON
[UTN #27, but the háček *has* been called so by Unicode. 
“Caron” is even inconsistent here since all these characters are 
modifier letters and have names beginning with “MODIFIER LETTER”. 
“Caron” is furthermore disrespectful towards native speakers of háček-using 
languages; *not* in the ‘United States Government Printing Office Style Manual’, 
because this may be for internal and national use, but in *ISO* standards 
which are international and must meet the involved member nations’ usages.]

030C	% COMBINING HACEK
	COMBINING CARON
032C	% COMBINING HACEK BELOW
	COMBINING CARON BELOW

034F	COMBINING GRAPHEME JOINER:
This character is listed among the “Known Anomalies”. 
However, instead of any (hard to find) alias, it could be given 
references to TUS, as “see §7.9 and §23.2”, to complete the existing 
COMMENT_LINEs (as already suggested generally in my feedback on Sat Apr 11, 2015).

039B	% GREEK CAPITAL LETTER LAMBDA
	GREEK CAPITAL LETTER LAMDA
03BB	% GREEK SMALL LETTER LAMBDA
	GREEK SMALL LETTER LAMDA
[UTN #27; please see today’s post]

047C	% CYRILLIC CAPITAL LETTER BEAUTIFUL OMEGA
	CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
047D	% CYRILLIC SMALL LETTER BEAUTIFUL OMEGA
	CYRILLIC SMALL LETTER OMEGA WITH TITLO
[please refer to my post of today]

0598	% HEBREW ACCENT TSINORIT
	HEBREW ACCENT ZARQA
05AE	% HEBREW ACCENT TSINOR
	HEBREW ACCENT ZINOR
[UTN #27, and please see post on Mon Apr 20, 2015]

0670	% ARABIC VOWEL SIGN SUPERSCRIPT ALEF
	ARABIC LETTER SUPERSCRIPT ALEF
[UTN #27; and another post today]

06C0	% ARABIC LIGATURE HEH WITH YEH ABOVE
	ARABIC LETTER HEH WITH YEH ABOVE
06C2	% ARABIC LIGATURE HEH GOAL WITH HAMZA ABOVE
	ARABIC LETTER HEH GOAL WITH HAMZA ABOVE
06D3	% ARABIC LIGATURE YEH BARREE WITH HAMZA ABOVE
	ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
[UTN #27]

0709	% SYRIAC SUBLINEAR COLON SKEWED LEFT
	SYRIAC SUBLINEAR COLON SKEWED RIGHT
[This is an existing formal alias. 
It should be added in UTN #27.]

0A01	% GURMUKHI SIGN ADDAK BINDI
	GURMUKHI SIGN ADAK BINDI
[UTN #27 shows character name is a misspelling]

0B83	% TAMIL SIGN AYTHAM
	TAMIL SIGN VISARGA
[UTN #27: character name is a misnomer]

0CDE	% KANNADA LETTER LLLA
	KANNADA LETTER FA

0E9D	% LAO LETTER FO FON
	LAO LETTER FO TAM
0E9F	% LAO LETTER FO FAY
	LAO LETTER FO SUNG
0EA3	% LAO LETTER RO
	LAO LETTER LO LING 
0EA5	% LAO LETTER LO
	LAO LETTER LO LOOT
[UTN #27]

0F0A	% TIBETAN MARK ZOU YIK GUI GO
	TIBETAN MARK BKA- SHOG YIG MGO
[UTN #27 and the French translation “ListeDesNoms-7.0(2014-06-22).txt” 
(courtesy http://hapax.qc.ca), which gives the alias “z'ou yik gui go”; 
the apostrophe has been deleted to conform to the English NamesList syntax rules]

0F0B	% TIBETAN MARK TSHEG
	TIBETAN MARK INTERSYLLABIC TSHEG
0F0C	% TIBETAN MARK NO-BREAK TSHEG [???]
	TIBETAN MARK DELIMITER TSHEG BSTAR
0FD0	% TIBETAN MARK BKA- SHOG GI MGO RGYAN
	TIBETAN MARK BSKA- SHOG GI MGO RGYAN
[UTN #27]

156F	% CANADIAN SYLLABICS ASTERISK
	CANADIAN SYLLABICS TTH
[UTN #27]

178E	% KHMER LETTER NNA
	KHMER LETTER NNO
179E	% KHMER LETTER SSA
	KHMER LETTER SSO
[UTN #27]

1D27	% GREEK LETTER SMALL CAPITAL LAMBDA
	GREEK LETTER SMALL CAPITAL LAMDA

1E9E	% LATIN CAPITAL LETTER SZ
	LATIN CAPITAL LETTER SHARP S
[posted on Mon Apr 20, 2015]

2018	% SINGLE TURNED COMMA QUOTATION MARK
	LEFT SINGLE QUOTATION MARK
2019	% SINGLE COMMA QUOTATION MARK
	RIGHT SINGLE QUOTATION MARK
201A	% LOW SINGLE COMMA QUOTATION MARK
	SINGLE LOW-9 QUOTATION MARK
201B	% SINGLE REVERSED COMMA QUOTATION MARK
	SINGLE HIGH-REVERSED-9 QUOTATION MARK
201C	% DOUBLE TURNED COMMA QUOTATION MARK
	LEFT DOUBLE QUOTATION MARK
201D	% DOUBLE COMMA QUOTATION MARK
	RIGHT DOUBLE QUOTATION MARK
201E	% LOW DOUBLE COMMA QUOTATION MARK
	DOUBLE LOW-9 QUOTATION MARK
201F	% DOUBLE REVERSED COMMA QUOTATION MARK
	DOUBLE HIGH-REVERSED-9 QUOTATION MARK
[The awkward and ethnocentric ISO names should be hidden. 
At the very least, the original Unicode names should be brought to the front.]

2039	% SINGLE BACKWARD-POINTING ANGLE QUOTATION MARK
	SINGLE LEFT-POINTING ANGLE QUOTATION MARK
203A	% SINGLE FORWARD-POINTING ANGLE QUOTATION MARK
	SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

203E	% SPACING OVERSCORE
	OVERLINE

2045	% OPENING SQUARE BRACKET WITH QUILL
	LEFT SQUARE BRACKET WITH QUILL
2046	% CLOSING SQUARE BRACKET WITH QUILL
	RIGHT SQUARE BRACKET WITH QUILL

207D	% SUPERSCRIPT OPENING PARENTHESIS
	SUPERSCRIPT LEFT PARENTHESIS
207E	% SUPERSCRIPT CLOSING PARENTHESIS
	SUPERSCRIPT RIGHT PARENTHESIS
208D	% SUBSCRIPT OPENING PARENTHESIS
	SUBSCRIPT LEFT PARENTHESIS
208E	% SUBSCRIPT CLOSING PARENTHESIS
	SUBSCRIPT RIGHT PARENTHESIS

[...]


20E5	% COMBINING BACKSLASH OVERLAY
	COMBINING REVERSE SOLIDUS OVERLAY

2118	% WEIERSTRASS ELLIPTIC FUNCTION 
	SCRIPT CAPITAL P

2446	% MICR TRANSIT SYMBOL
	OCR BRANCH BANK IDENTIFICATION
2447	% MICR AMOUNT SYMBOL
	OCR AMOUNT OF CHECK
[posted on Fri Apr 24, 2015]

2448	% MICR ON US SYMBOL
	OCR DASH
2449	% MICR DASH SYMBOL
	OCR CUSTOMER ACCOUNT NUMBER

3021	% SUZHOU NUMERAL ONE
	HANGZHOU NUMERAL ONE
3022	% SUZHOU NUMERAL TWO
	HANGZHOU NUMERAL TWO
3023	% SUZHOU NUMERAL THREE
	HANGZHOU NUMERAL THREE
3024	% SUZHOU NUMERAL FOUR
	HANGZHOU NUMERAL FOUR
3025	% SUZHOU NUMERAL FIVE
	HANGZHOU NUMERAL FIVE
3026	% SUZHOU NUMERAL SIX
	HANGZHOU NUMERAL SIX
3027	% SUZHOU NUMERAL SEVEN
	HANGZHOU NUMERAL SEVEN
3028	% SUZHOU NUMERAL EIGHT
	HANGZHOU NUMERAL EIGHT
3029	% SUZHOU NUMERAL NINE
	HANGZHOU NUMERAL NINE 

309F	% HIRAGANA LIGATURE YORI
	HIRAGANA DIGRAPH YORI
30FF	% KATAKANA LIGATURE KOTO
	KATAKANA DIGRAPH KOTO
[courtesy Andrew West; please see post of today above]

A015	% YI SYLLABLE ITERATION MARK 
	YI SYLLABLE WU

FE18	% PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
	PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET

FE6B	% SMALL AT SIGN
	SMALL COMMERCIAL AT

FEFF	% BYTE ORDER MARK
	ZERO WIDTH NO-BREAK SPACE

FF20	% FULLWIDTH AT SIGN
	FULLWIDTH COMMERCIAL AT

1038D	% UGARITIC LETTER LAMBDA
	UGARITIC LETTER LAMDA

122D4	% CUNEIFORM SIGN NU11 TENU
	CUNEIFORM SIGN SHIR TENU
122D5	% CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR
	CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR

1D0C5	% BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS
	BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS

1D13A	% MUSICAL SYMBOL DOUBLE WHOLE-REST
	MUSICAL SYMBOL MULTI REST
[please refer to my related post]

1D6B2	% MATHEMATICAL BOLD CAPITAL LAMBDA
	MATHEMATICAL BOLD CAPITAL LAMDA
1D6CC	% MATHEMATICAL BOLD SMALL LAMBDA
	MATHEMATICAL BOLD SMALL LAMDA
1D6EC	% MATHEMATICAL ITALIC CAPITAL LAMBDA
	MATHEMATICAL ITALIC CAPITAL LAMDA
1D706	% MATHEMATICAL ITALIC SMALL LAMBDA
	MATHEMATICAL ITALIC SMALL LAMDA
1D726	% MATHEMATICAL BOLD ITALIC CAPITAL LAMBDA
	MATHEMATICAL BOLD ITALIC CAPITAL LAMDA
1D740	% MATHEMATICAL BOLD ITALIC SMALL LAMBDA
	MATHEMATICAL BOLD ITALIC SMALL LAMDA
1D760	% MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMBDA
	MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA
1D77A	% MATHEMATICAL SANS-SERIF BOLD SMALL LAMBDA
	MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA
1D79A	% MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMBDA
	MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA
1D7B4	% MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMBDA
	MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMDA

______________________________________________________________

Date/Time: Mon Apr 27 09:30:46 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: 1F6D0 PLACE OF WORSHIP

This character is not bidi-mirrored, like all symbols in the block, 
but IMO an exception should be made here because in right-to-left script 
the worshipping person is at the back. Perhaps right-to-left scripts must use 
special fonts where all relevant symbols are mirrored. This is inconsistent 
with the bidi-mirroring of mathematical symbols.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 09:32:25 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+19B0 NEW TAI LUE VOWEL SIGN VOWEL SHORTENER

The first of the New Tai Lue vowel signs that are in beta review, U+19B0, 
is a vowel shortener. Unlike U+0EB1 LAO VOWEL SIGN MAI KAN, which a 
COMMENT_LINE identifies as a vowel shortener, 
U+19B0 NEW TAI LUE VOWEL SIGN VOWEL SHORTENER seems to have no proper name 
in New Tai Lue. I would therefore suggest shortening its name to 
NEW TAI LUE VOWEL SHORTENER. (These two are the only instances of 
‘vowel shortener’ in Unicode.)

Current:
19B0	NEW TAI LUE VOWEL SIGN VOWEL SHORTENER

Suggested:
19B0	NEW TAI LUE VOWEL SHORTENER

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 09:34:24 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+11350 GRANTHA OM

Unlike all the other Grantha characters, and despite its dedicated 
“Sign” subhead, the name of U+11350 GRANTHA OM does not contain 
a class designation. I therefore suggest adding one, after which 
this part of the list would look like this:

[...]
1134C	GRANTHA VOWEL SIGN AU
	: 11347 11357
@		Virama
1134D	GRANTHA SIGN VIRAMA
@		Sign
11350	GRANTHA SIGN OM
@		Dependent vowel sign
11357	GRANTHA AU LENGTH MARK
@		Sign
1135D	GRANTHA SIGN PLUTA
[...]

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 09:39:02 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+218B TURNED DIGIT THREE

Cross-reference not fully “accurate”

Dear Unicode Technical Committee,

would it be possible to give users of the Standard xrefs showing 
the true character names, not merely internal identifiers? 
For example, at the new turned digit three U+218B, readers find 
“→ 0190 Ɛ latin capital letter open e”. But as I already posted, 
this so-called open e should be given a formal alias showing that its true name 
is latin epsilon. However, all these formal aliases, whether there are 
eighteen or one hundred and eighty of them, are of little use if these 
characters continue to be called by their wrong names everywhere else.

To make it clear: Unicode would not have a single wrong name in the Standard 
if, in the early nineties, people had not made trouble and written 
acrimonious words when the Unicode Consortium removed the wrong name 
for Æ and æ and reset it to its (original) true value. You know the issue.

I have recently gained hope that ISO is no longer the tyrannical standards 
body it apparently was in the nineties, and that it would no longer insist 
upon a name stability which never created anything but trouble and acrimony 
among users, typographers, and all the workforce that is in contact with the 
documents published by Unicode with the aim of using the characters.

Today, the Unicode character set is indispensable, and therefore Unicode 
can improve its usefulness at use (as opposed to the usefulness at 
standardization), whether by reengineering the names stability, or by 
hollowing it out with numerous prioritized formal aliases and smart 
cross-references which give the true name, possibly preceded by 
a percent sign. Given that the NamesList syntax is beyond the reach 
of the Stability Policy, there are many possibilities.

People find it hard to understand that the Unicode Standard is maintained 
without sweeping out all the wrong names once and for all.

Best regards,
Marcel Schneider

Date/Time: Mon Apr 27 09:45:20 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Language conformance

The Charts must be written in British English, a fact recalled 
by the five (beta-review) Sutton SignWriting characters with CENTRE 
(U+1D862, U+1D863, U+1D864, U+1DA5F, U+1DA60), 
which bring the total number to 30. 

Consequently, perhaps U+1F22D SQUARED CJK UNIFIED IDEOGRAPH-4E2D should have 
its alias “center field” converted to “centre field”, and the comments at 
U+205B “this is centered on the line, but extends beyond top and bottom 
of the line” and U+A8FA “zero-advance character centered on the point 
between two orthographic syllables” ought to be corrected too? 
(The aliases for U+2385, U+1F17B, U+1F17C are IMO purposely with “center”.)

As for my personal feedback, I don’t know whether to complain or not. 
In practice, the American spelling has some advantages as it is more 
widespread, for example in style sheets, but also in everyday language, 
while “centre” may be regarded as French or as a historic spelling. 

At least one fact is certain: “CENTRE” was not Unicode’s choice.

Best regards,
Marcel Schneider

Date/Time: Tue Apr 28 07:28:15 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Some comments on my feedback for PRI #297

Dear Unicode Technical Committee,

I’m sorry to send you some more beta-related feedback *after* 
the deadline (the day after the deadline). It’s just a matter for me 
of not using the deadline as a pretext for not doing my job, even if 
finally the belated posts cannot be considered.

As for the chronological order, Unicode could have expected the beta review 
feedback to be prioritized, whereas I posted my general concerns first. 
This is because as a user I became convinced that the usefulness of the UCS 
must become centered on usage, while until now it has been, in some essential 
settings, centered on standardization.

This is partly in the nature of a Standard. That part matters less now 
because the UCS is well established and there is no longer any serious 
alternative. But partly it is a remnant of some personal external views 
from the beginnings. That part should not be taken into consideration, 
because a UCS must meet a worldwide demand for true and reasonably 
widespread average identifiers (which in practice are used [not as 
descriptors but] as serious designations), and because it must be 
directly useful to end-users (who mostly read English), without relying 
only on the goodwill of overburdened implementers and developers when 
the challenge is to effectively correct misnomers and other useless and 
counterproductive complications and discourtesies.

Now that other Standards bodies have outsourced the UCS and all management 
is centralized at the Unicode Consortium, the UCD data files and the 
UCS Code Charts can be improved to reach maximum reliability and 
ease of implementation.

Best wishes,

Marcel Schneider

Date/Time: Tue Apr 28 07:29:58 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Siddham section marks

Since the Siddham section marks U+115CA and U+115CB show four tridents, 
a plural S might be added to the word TRIDENT in their names 
(for example SIDDHAM SECTION MARK WITH TRIDENT*S* AND DOTTED CRESCENTS), 
because the other constituents such as RAYS and CIRCLES are plural too.

Another question is whether to add DOTTED in the character names. 
The dot terminating a crescent stack seems not to be distinctive, 
and its presence in the names is inconsistent 
(compare U+115CB - U+115CE with DOTTED, vs U+115D1 - U+115D4 without DOTTED). 
If shorter names are preferred, DOTTED may be removed from U+115CB - U+115CE; 
if fully descriptive names are aimed at, it can be added 
to U+115D1 - U+115D4 SIDDHAM SECTION MARK WITH *DOTTED* SEPTUPLE CRESCENTS.

In fact, as a dotted crescent looks like a Candrabindu, 
(DOTTED) CRESCENT(S) in these character names might become CANDRABINDU(S), 
if a huge majority of users would love this naming. 
The “U-shaped ornament” for its part, with the centerline inside, 
resembles a trident without a shaft, a kind of echoed trident 
or just the trident’s prongs. This is striking because the tridents of 
U+115CA SIDDHAM SECTION MARK WITH TRIDENT AND U-SHAPED ORNAMENTS 
have U-shaped prongs, while those of 
U+115CB SIDDHAM SECTION MARK WITH TRIDENT AND DOTTED CRESCENTS 
have crescent-shaped ones.

As a result, these considerations might lead to some name change proposals 
(because the discussed characters are still in draft) as shown in the list 
below (omitting five section marks whose names I do not suggest modifying), 
where the name after the code point is the original one, followed by one or 
several alternate names designed to meet different possible preferences.

Best regards,
Marcel Schneider
_____________________________________________________________________
115CA	SIDDHAM SECTION MARK WITH TRIDENT AND U-SHAPED ORNAMENTS
	SIDDHAM SECTION MARK WITH TRIDENTS AND U-SHAPED ORNAMENTS
	SIDDHAM SECTION MARK WITH TRIDENTS AND TRIDENT PRONGS
	SIDDHAM SECTION MARK WITH PRONGED TRIDENTS
	SIDDHAM SECTION MARK WITH PRONGED AND ECHOED TRIDENTS
	SIDDHAM SECTION MARK WITH PRONGED RAYS AND PRONGS

115CB	SIDDHAM SECTION MARK WITH TRIDENT AND DOTTED CRESCENTS
	SIDDHAM SECTION MARK WITH TRIDENTS AND DOTTED CRESCENTS
	SIDDHAM SECTION MARK WITH TRIDENTS AND CANDRABINDUS
	SIDDHAM SECTION MARK WITH CRESCENTED TRIDENTS
	SIDDHAM SECTION MARK WITH CRESCENTED AND ECHOED TRIDENTS
	SIDDHAM SECTION MARK WITH CRESCENTED RAYS AND DOTTED CRESCENTS

115CC	SIDDHAM SECTION MARK WITH RAYS AND DOTTED CRESCENTS
	SIDDHAM SECTION MARK WITH RAYS AND CANDRABINDUS
	SIDDHAM SECTION MARK WITH RAYS AND CRESCENTS

115CD	SIDDHAM SECTION MARK WITH RAYS AND DOTTED DOUBLE CRESCENTS
	SIDDHAM SECTION MARK WITH RAYS AND DOUBLE CANDRABINDUS
	SIDDHAM SECTION MARK WITH RAYS AND DOUBLE CRESCENTS

115CE	SIDDHAM SECTION MARK WITH RAYS AND DOTTED TRIPLE CRESCENTS
	SIDDHAM SECTION MARK WITH RAYS AND TRIPLE CANDRABINDUS
	SIDDHAM SECTION MARK WITH RAYS AND TRIPLE CRESCENTS

[...]

115D1	SIDDHAM SECTION MARK WITH DOUBLE CRESCENTS
	SIDDHAM SECTION MARK WITH DOUBLE CANDRABINDUS
	SIDDHAM SECTION MARK WITH DOTTED DOUBLE CRESCENTS

115D2	SIDDHAM SECTION MARK WITH TRIPLE CRESCENTS
	SIDDHAM SECTION MARK WITH TRIPLE CANDRABINDUS
	SIDDHAM SECTION MARK WITH DOTTED TRIPLE CRESCENTS

115D3	SIDDHAM SECTION MARK WITH QUADRUPLE CRESCENTS
	SIDDHAM SECTION MARK WITH QUADRUPLE CANDRABINDUS
	SIDDHAM SECTION MARK WITH DOTTED QUADRUPLE CRESCENTS

115D4	SIDDHAM SECTION MARK WITH SEPTUPLE CRESCENTS
	SIDDHAM SECTION MARK WITH SEPTUPLE CANDRABINDUS
	SIDDHAM SECTION MARK WITH DOTTED SEPTUPLE CRESCENTS

[...]
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Date/Time: Tue Apr 28 07:31:00 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Code Charts: reserved code points

I’d remove the default glyph icon for <reserved> code points in 
the Code Chart lists because it may disturb the layout, overlapping 
some characters like U+115B5 SIDDHAM VOWEL SIGN VOCALIC RR. 
The hatched rounded square is nice but not indispensable for understanding. 
The syntax specifies “an icon for the reserved character” must be displayed, 
but if this ends up hiding some parts of glyphs, 
the place might be left blank.

As a detail, I would suggest choosing an upwards hatch rather than a downwards 
one in the Code Charts, despite downwards (upper left - lower right) being 
heraldic usage.

Best regards,
Marcel Schneider

Date/Time: Tue Apr 28 07:31:56 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Latin Extended-D and Latin Extended-E

In the COMMENT_LINE of U+A78F LATIN LETTER SINOLOGICAL DOT, 
the preposition ‘for’ occurs three times with two meanings. Some occurrences 
may therefore be replaced with another preposition, such as ‘in’ or ‘of’. 
Drawing on the COMMENT_LINE of the preceding character 
(“used to transcribe Toda”), it would also be possible to turn 
the nouns into verbs, but the adjective would then become an adverb. 
Among the resulting options:

1	* used _in_ transliteration for Phags-Pa and phonetic transcription for Tangut
2	* used for transliteration _of_ Phags-Pa and phonetic transcription _of_ Tangut
3	* used _in_ transliteration _of_ Phags-Pa and phonetic transcription _of_ Tangut
4	* used _to transliterate_ Phags-Pa and _in_ phonetical transcription for Tangut
5	* used _to transliterate_ Phags-Pa and _for_ phonetical transcription _of_ Tangut
6	* used _to transliterate_ Phags-Pa and _in_ phonetical transcription _of_ Tangut
7	* used _to transliterate_ Phags-Pa and _phonetically transcribe_ Tangut

However, in the NamesList, another instance with matching context shows 
that the preferred form is with ‘in’ - ‘of’ (option 3 above):
0255	LATIN SMALL LETTER C WITH CURL
[...]	* used in transcription of Mandarin Chinese

In the COMMENT_LINE of U+A7B3 LATIN CAPITAL LETTER CHI, 
the space between “lower” and “case” should be deleted 
to conform to the usage in the NamesList / Code Charts.

In the NOTICE_LINE of the Historic letters for Sakha (Yakut) subhead (U+AB60), 
the final “that era” could IMO be replaced with “Sakha (Yakut)”, 
because “the [...] orthography of that era”, while literally correct, 
is redundant with the immediately preceding “from 1917 to 1927”. 
The phrase could therefore be used to repeat the language name rather than 
the fact that these letters are out of date, provided that the result 
is correct:

@+		These letters were used from 1917 to 1927 in the official IPA-based Latin orthography of _Sakha (Yakut)_.

I take this occasion to congratulate Unicode on continuing to name 
letters such as U+AB62 LATIN SMALL LETTER OPEN OE and 
U+AB63 LATIN SMALL LETTER UO “letters” and not “ligatures”, 
as an early prescriber would have done 
(U+0153 LATIN SMALL LIGATURE OE; 
Unicode 1.0 name: LATIN SMALL LETTER O E).

Best regards,
Marcel Schneider

Date/Time: Tue Apr 28 07:32:39 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: A cross-reference for new Combining half marks

As was done for 12 of the 14 previously existing combining half marks, 
that is for all where possible, an xref should be added after 
U+FE2F COMBINING CYRILLIC TITLO RIGHT HALF. 
The complete subheader will then look like this:

@		Combining half marks
@+		These are used for supralineation in Church Slavonic texts.
FE2E	COMBINING CYRILLIC TITLO LEFT HALF
FE2F	COMBINING CYRILLIC TITLO RIGHT HALF
	x (combining cyrillic titlo - 0483)

Best regards,
Marcel Schneider

Date/Time: Tue Apr 28 12:31:42 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Bidi-mirroring of symbols and pictographs

There are many symbols and pictographs which look better when mirrored 
in right-to-left scripts, because they show living creatures or vehicles 
from the side. In these cases the front generally faces against the 
writing direction, so that the reader encounters it 
head-on. In left-to-right scripts, they are “looking” from right to left. 
Their looking direction is related to semantics. If they express a movement, 
for example U+1F32C WIND BLOWING FACE, they are “looking” in the writing 
direction (for example, in left-to-right scripts, from left to right).

All these symbols need to be bidi-mirrored just like other symbols and 
characters, as I began to underscore in a post on Mon Apr 27, 2015 (yesterday). 
A complete rendering engine only needs a character to be flagged for 
bidi-mirroring in order to mirror the glyph provided by the font automatically; 
such a rendering engine can therefore display every symbol mirrored, 
even if the font does not provide mirrored glyphs. This makes sense for 
all symbol glyphs that do not contain (currently Latin) text and whose 
semantics does not involve directionality (as it does for pointing hands 
or arrows).

For an example, I’ve extracted a list (below) of some symbols from UnicodeData, 
belonging to the blocks U+1F300-1F5FF Miscellaneous Symbols and Pictographs, 
and U+1F680-1F6FF Transport and Map Symbols, and needing IMHO to be 
bidi-mirrored (in a bidirectional context).

Best regards,
Marcel Schneider
______________________________________________________
CHAR	BIDIM	NAME
1F320	Y	SHOOTING STAR
1F324	Y	WHITE SUN WITH SMALL CLOUD
1F325	Y	WHITE SUN BEHIND CLOUD
1F326	Y	WHITE SUN BEHIND CLOUD WITH RAIN
1F327	Y	CLOUD WITH RAIN
1F328	Y	CLOUD WITH SNOW
1F329	Y	CLOUD WITH LIGHTNING
1F32A	Y	CLOUD WITH TORNADO
1F32C	Y	WIND BLOWING FACE
1F339	Y	ROSE
1F33A	Y	HIBISCUS
1F33E	Y	EAR OF RICE
1F3A0	Y	CAROUSEL HORSE
1F3B3	Y	BOWLING
1F3C0	Y	BASKETBALL AND HOOP
1F3C1	Y	CHEQUERED FLAG
1F3C2	Y	SNOWBOARDER
1F3C3	Y	RUNNER
1F3C4	Y	SURFER
1F3C7	Y	HORSE RACING
1F3CA	Y	SWIMMER
1F3CD	Y	RACING MOTORCYCLE
1F3CE	Y	RACING CAR
1F3DC	Y	DESERT
1F3DD	Y	DESERT ISLAND
1F3DE	Y	NATIONAL PARK
1F3F1	Y	WHITE PENNANT
1F3F2	Y	BLACK PENNANT
1F3F3	Y	WAVING WHITE FLAG
1F3F4	Y	WAVING BLACK FLAG
1F400	Y	RAT
1F401	Y	MOUSE
1F402	Y	OX
1F403	Y	WATER BUFFALO
1F404	Y	COW
1F405	Y	TIGER
1F406	Y	LEOPARD
1F407	Y	RABBIT
1F408	Y	CAT
1F409	Y	DRAGON
1F40A	Y	CROCODILE
1F40B	Y	WHALE
1F40C	Y	SNAIL
1F40D	Y	SNAKE
1F40E	Y	HORSE
1F40F	Y	RAM
1F410	Y	GOAT
1F411	Y	SHEEP
1F412	Y	MONKEY
1F413	Y	ROOSTER
1F414	Y	CHICKEN
1F415	Y	DOG
1F416	Y	PIG
1F417	Y	BOAR
1F418	Y	ELEPHANT
1F41A	Y	SPIRAL SHELL
1F41B	Y	BUG
1F41C	Y	ANT
1F41D	Y	HONEYBEE
1F41E	Y	LADY BEETLE
1F41F	Y	FISH
1F420	Y	TROPICAL FISH
1F421	Y	BLOWFISH
1F422	Y	TURTLE
1F424	Y	BABY CHICK
1F426	Y	BIRD
1F427	Y	PENGUIN
1F428	Y	KOALA
1F429	Y	POODLE
1F42A	Y	DROMEDARY CAMEL
1F42B	Y	BACTRIAN CAMEL
1F42C	Y	DOLPHIN
1F432	Y	DRAGON FACE
1F433	Y	SPOUTING WHALE
1F434	Y	HORSE FACE
1F43F	Y	CHIPMUNK
1F481	Y	INFORMATION DESK PERSON
1F483	Y	DANCER
1F4BA	Y	SEAT
1F4EA	Y	CLOSED MAILBOX WITH LOWERED FLAG
1F4EB	Y	CLOSED MAILBOX WITH RAISED FLAG
1F4EC	Y	OPEN MAILBOX WITH RAISED FLAG
1F4ED	Y	OPEN MAILBOX WITH LOWERED FLAG
1F4EE	Y	POSTBOX
1F4EF	Y	POSTAL HORN
1F4F0	Y	NEWSPAPER
1F4F2	Y	MOBILE PHONE WITH RIGHTWARDS ARROW AT LEFT
1F52C	Y	MICROSCOPE
1F52D	Y	TELESCOPE
1F54A	Y	DOVE OF PEACE
1F54F	Y	BOWL OF HYGIEIA
1F680	Y	ROCKET
1F681	Y	HELICOPTER
1F682	Y	STEAM LOCOMOTIVE
1F683	Y	RAILWAY CAR
1F684	Y	HIGH-SPEED TRAIN
1F685	Y	HIGH-SPEED TRAIN WITH BULLET NOSE
1F68C	Y	BUS
1F68E	Y	TROLLEYBUS
1F690	Y	MINIBUS
1F691	Y	AMBULANCE
1F692	Y	FIRE ENGINE
1F693	Y	POLICE CAR
1F695	Y	TAXI
1F697	Y	AUTOMOBILE
1F699	Y	RECREATIONAL VEHICLE
1F69A	Y	DELIVERY TRUCK
1F69B	Y	ARTICULATED LORRY
1F69C	Y	TRACTOR
1F69E	Y	MOUNTAIN RAILWAY
1F6A0	Y	MOUNTAIN CABLEWAY
1F6A1	Y	AERIAL TRAMWAY
1F6A3	Y	ROWBOAT
1F6A4	Y	SPEEDBOAT
1F6A9	Y	TRIANGULAR FLAG ON POST
1F6AE	Y	PUT LITTER IN ITS PLACE SYMBOL
1F6AF	Y	DO NOT LITTER SYMBOL
1F6B2	Y	BICYCLE
1F6B3	Y	NO BICYCLES
1F6B4	Y	BICYCLIST
1F6B5	Y	MOUNTAIN BICYCLIST
1F6B6	Y	PEDESTRIAN
1F6B7	Y	NO PEDESTRIANS
1F6B8	Y	CHILDREN CROSSING
1F6C2	Y	PASSPORT CONTROL
1F6C3	Y	CUSTOMS
1F6D0	Y	PLACE OF WORSHIP
1F6E5	Y	MOTOR BOAT
1F6E9	Y	SMALL AIRPLANE
1F6EA	Y	NORTHEAST-POINTING AIRPLANE
1F6EB	Y	AIRPLANE DEPARTURE
1F6EC	Y	AIRPLANE ARRIVING
1F6F0	Y	SATELLITE
1F6F2	Y	DIESEL LOCOMOTIVE
1F6F3	Y	PASSENGER SHIP
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
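
As an illustration only, the current Bidi_Mirrored value of a few code 
points from the list above can be inspected with Python's standard 
unicodedata module. This is a sketch: the sample selection and the helper 
name bidi_mirrored_report are assumptions made for this example, and the 
result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# A few code points from the list above, plus U+0028 for comparison.
SAMPLE = [0x0028, 0x1F32C, 0x1F40E, 0x1F6B2]

def bidi_mirrored_report(code_points):
    """Return (code point, name, Bidi_Mirrored flag) for each code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        # Characters unknown to this interpreter's Unicode data get a placeholder.
        name = unicodedata.name(ch, "<unnamed in this Unicode version>")
        flag = "Y" if unicodedata.mirrored(ch) else "N"
        rows.append((f"{cp:04X}", name, flag))
    return rows

for row in bidi_mirrored_report(SAMPLE):
    print("\t".join(row))
```

The printed columns follow the CHAR / NAME / BIDIM idea of the list above, 
but show the property as currently published rather than as suggested.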

Date/Time: Thu Apr 30 06:44:57 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297


Dear Unicode Technical Committee,

in addition to my beta feedback, and for completeness, I have some general 
concerns again.

About harmonizing the spelling of “lowercase”: while 
no instances of “upper case” (with a space) are found, there are, besides 
77 instances of “lowercase”, six of “lower case”, three of which have 
the form “lower case is CHAR”. One having already been reported, 
U+2C7E LATIN CAPITAL LETTER S WITH SWASH TAIL and 
U+2C7F LATIN CAPITAL LETTER Z WITH SWASH TAIL remain to be corrected. 
Perhaps the others (U+2121 TELEPHONE SIGN, U+213B FACSIMILE SIGN, and 
U+1F670 SCRIPT LIGATURE ET ORNAMENT) should be too, for consistency.
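
A count of this kind can be reproduced with a few lines of Python. The 
following is a minimal sketch, not the method actually used for the figures 
above: the function name count_case_spellings and the sample text are 
hypothetical, and in practice the text would be read from a local copy of 
the names list.

```python
import re

def count_case_spellings(text):
    """Count 'lowercase' and 'lower case' (any run of whitespace) in text."""
    joined = len(re.findall(r"\blowercase\b", text, flags=re.IGNORECASE))
    spaced = len(re.findall(r"\blower\s+case\b", text, flags=re.IGNORECASE))
    return {"lowercase": joined, "lower case": spaced}

# Hypothetical sample lines in the style of names-list comments;
# XXXX stands in for a code point and is not real data.
sample = (
    "\t* lowercase is XXXX\n"
    "\t* lower case is XXXX\n"
    "\t* lower case is XXXX\n"
)
print(count_case_spellings(sample))
```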


A formal alias is IMHO missing for U+0F0A TIBETAN MARK BKA- SHOG YIG MGO, 
given that there is only the English translation of the right name, not 
the name itself in transcribed Tibetan, as it is given in the French 
translation “ListeDesNoms-7.0(2014-06-22).txt” (courtesy http://hapax.qc.ca), 
which shows as an alias “z'ou yik gui go”; if the apostrophe is deleted for 
conformance to the English NamesList syntax rules, this would end up as 
	% TIBETAN MARK ZOU YIK GUI GO


A mathematical symbol, U+29A1 SPHERICAL ANGLE OPENING UP, has 
the ‘bidi-mirrored’ property but is symmetrical (about a vertical axis). 
When mathematical symbols are symmetrical (as U+29D3 BLACK BOWTIE), 
they ordinarily are *not* bidi-mirrored. Therefore I suppose that 
this property should be set to “No” for U+29A1 too.


A next step could be to update UTN #27, where several misnomers are 
still missing. Even if this Technical Note does not aim at giving a 
complete overview of *all* “Known Anomalies” (as results from interpreting 
words such as “provides information on many known anomalies” and “compiled 
information on many misnamed characters”), updating it would make it 
even more helpful.

The missing anomalies to mention are AFAIK:

U+047C CYRILLIC CAPITAL LETTER OMEGA WITH TITLO, 
U+047D CYRILLIC SMALL LETTER OMEGA WITH TITLO: 
The comment in the Code Chart “despite its name, this character does not 
have a titlo, nor is it composed of an omega plus a diacritic” qualifies 
these two characters for mention in UTN #27.

U+027F LATIN SMALL LETTER REVERSED R WITH FISHHOOK: 
There is just an alias followed by the parenthesized mention “(a misnomer)”. 
It therefore should be added to UTN #27 (and given a formal alias, see below).

U+0709 SYRIAC SUBLINEAR COLON SKEWED RIGHT: 
This misnomer has an existing formal alias, but neither is there a comment 
in the Code Chart stating “* (character) name is a misnomer”, as this 
occurs in a few character entries—with or without formal alias—, 
nor is it present in UTN #27.

The MICR (U+2446 sqq) have a mention at the end of the subheader notice: 
“The Unicode character names include several misnomers.” and should therefore 
be mentioned in the Technical Note (they were encoded 12 years ago).

U+309F HIRAGANA DIGRAPH YORI, U+30FF KATAKANA DIGRAPH KOTO: 
Andrew West states on “Unicode Character Names Part 1” that 
“These characters are ligatures of "db" and "qp" respectively, 
and not digraphs.”

U+122D4 CUNEIFORM SIGN SHIR TENU, 
U+122D5 CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR: 
The formal aliases assigned to these two characters didn’t exist 
at the time of UTN #27, nor were the characters yet encoded.

That isn’t true for U+1D13A MUSICAL SYMBOL MULTI REST, but probably 
the minor nature of the issue kept it out of UTN #27.



There are several things to note about the fraction characters, 
beginning with U+00BC VULGAR FRACTION ONE QUARTER, 
U+00BD VULGAR FRACTION ONE HALF, U+00BE VULGAR FRACTION THREE QUARTERS. 
First, the comment at U+00BC “other fraction characters: 2153-215E” must 
have its starting point changed to “2150” (it just hasn’t been updated 
since U+2150, U+2151 and U+2152 were encoded). Second, the recurrent comment 
“bar may be horizontal or slanted”, present in the Latin-1 block, should 
be echoed in some way for the other fraction characters. I suggest adding 
it as a first NOTICE_LINE in the “Fractions” subheader at U+2150:
@		Fractions
@+		Bar may be horizontal or slanted

This leads to my third concern regarding fractions. 
AFAIK the epithet “vulgar” applies to fractions with a slanted bar, 
as is the fraction slash (U+2044), whereas fractions with a horizontal bar 
are *not* vulgar ones. At least there is no reason to call them so, since 
this is current usage in mathematics. That “vulgar” epithet is, well, 
another mischief coming from that famous merger with the ISO/IEC 10646 draft. 
Since these character names are forwarded to end-users without being 
corrected, they must be given formal aliases eliminating the 
wrong, misleading, useless and value-lowering qualifier:

00BC	VULGAR FRACTION ONE QUARTER
	% FRACTION ONE QUARTER
00BD	VULGAR FRACTION ONE HALF
	% FRACTION ONE HALF
00BE	VULGAR FRACTION THREE QUARTERS
	% FRACTION THREE QUARTERS
2150	VULGAR FRACTION ONE SEVENTH
	% FRACTION ONE SEVENTH
2151	VULGAR FRACTION ONE NINTH
	% FRACTION ONE NINTH
2152	VULGAR FRACTION ONE TENTH
	% FRACTION ONE TENTH
2153	VULGAR FRACTION ONE THIRD
	% FRACTION ONE THIRD
2154	VULGAR FRACTION TWO THIRDS
	% FRACTION TWO THIRDS
2155	VULGAR FRACTION ONE FIFTH
	% FRACTION ONE FIFTH
2156	VULGAR FRACTION TWO FIFTHS
	% FRACTION TWO FIFTHS
2157	VULGAR FRACTION THREE FIFTHS
	% FRACTION THREE FIFTHS
2158	VULGAR FRACTION FOUR FIFTHS
	% FRACTION FOUR FIFTHS
2159	VULGAR FRACTION ONE SIXTH
	% FRACTION ONE SIXTH
215A	VULGAR FRACTION FIVE SIXTHS
	% FRACTION FIVE SIXTHS
215B	VULGAR FRACTION ONE EIGHTH
	% FRACTION ONE EIGHTH
215C	VULGAR FRACTION THREE EIGHTHS
	% FRACTION THREE EIGHTHS
215D	VULGAR FRACTION FIVE EIGHTHS
	% FRACTION FIVE EIGHTHS
215E	VULGAR FRACTION SEVEN EIGHTHS
	% FRACTION SEVEN EIGHTHS
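
A list of this shape can be generated mechanically. The sketch below is 
only an illustration under stated assumptions: the helper names 
suggested_alias and nameslist_entries are invented for this example, and 
the "% alias" lines merely mimic the notation used above; they are not 
aliases that exist in the UCD.

```python
import unicodedata

# Code points of the fraction characters listed above:
# U+00BC..U+00BE and U+2150..U+215E.
FRACTIONS = list(range(0x00BC, 0x00BF)) + list(range(0x2150, 0x215F))

def suggested_alias(name):
    """Drop a leading 'VULGAR ' from a character name, if present."""
    prefix = "VULGAR "
    return name[len(prefix):] if name.startswith(prefix) else name

def nameslist_entries(code_points):
    """Yield two lines per character: 'CODE<TAB>NAME' and '<TAB>% ALIAS'."""
    for cp in code_points:
        name = unicodedata.name(chr(cp))
        yield f"{cp:04X}\t{name}"
        yield f"\t% {suggested_alias(name)}"

print("\n".join(nameslist_entries(FRACTIONS)))
```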



Finally, since my proposal list for supplemental formal aliases was 
incomplete and had other drawbacks, allow me 
to attach another, more complete one below, which above all conforms 
to the actual syntax (NAME first). This and other changes were often 
done in an automated (spreadsheet) way.

A leading principle was that whenever characters are bidi-mirrored, 
LEFT and RIGHT qualifiers *must* be avoided in their names, because 
they become wrong when bidi-mirroring is in effect, that is, 
in right-to-left scripts. Using LEFT or RIGHT in those names although 
they do not match in some contexts shows a lack of 
respect towards a part of the users. A UCS’s identifiers must not 
be unfitting for right-to-left scripts. It is to be underscored that 
Unicode aimed at making the character names universal, and it was 
under the influence of ISO that names grew wrong and worse. 
I’ve good reasons to believe that today, ISO would never approve 
the way things were done in the nineties.

There are however some bracketing characters that do *not* mirror, 
as U+FD3E ORNATE LEFT PARENTHESIS and U+FD3F ORNATE RIGHT PARENTHESIS, 
for legacy reasons. These characters may with some reason be called 
“left/right parenthesis”, and it is even helpful to do so.


About making extensive use of Formal Aliases, it should be noted that 
this is a condition for making Formal Aliases more attractive. 
The other condition is to make them a part of UnicodeDataExtended.txt, 
a new datafile with additional fields (given that UnicodeData.txt must 
remain stable with respect to software that cannot run when supplemental 
fields are present).
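
Formal aliases that already exist in the UCD can be resolved by some tools 
today. As a hedged illustration, Python's unicodedata.lookup (which has 
accepted name aliases since Python 3.3) resolves the existing alias for 
U+01A2; the helper name resolve is an assumption made for this sketch, and 
the behaviour depends on the Unicode data shipped with the interpreter.

```python
import unicodedata

def resolve(name):
    """Return the character for a name or formal alias, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# LATIN CAPITAL LETTER GHA is an existing formal alias for U+01A2.
ch = resolve("LATIN CAPITAL LETTER GHA")
if ch is not None:
    # unicodedata.name() still reports the immutable formal name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The lookup accepts the alias, while the reverse query still returns the 
original name, which is the asymmetry discussed in this feedback.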

One issue that inhibits name updates is the outdating of UIs which use 
local copies, for example word processors and more precisely their charmap 
and special-characters dialogs. Software publishers fear puzzling users 
with name changes. These changes will stop being puzzling given good 
communication. The move towards more truth in character names is even 
likely to deliver an excellent marketing argument, and the overall image of 
the brand will be strengthened. Software providers must assume a cultural 
responsibility and avoid tampering with linguistic legacy. If this job has 
been done, the formal aliases I suggest are already present in the UIs, and 
adding them to Unicode will simply consecrate this work. This will result in 
updating Unicode and re-establishing a reasonable synchronization between 
presumably current character names and UCD-based information sources. By 
contrast, if this work has not already been done on the developers’ side, 
the opportunity to do it has perhaps come. It can be performed thanks to 
a large number of “new-old” Formal Aliases.

Experience seems to show that for a UCS, the usefulness *at standardization* 
must be distinguished from the usefulness *at use*. I mean that a Standard 
useful at standardization is not necessarily useful at use. Reliability is 
a main criterion for usefulness at use, and this reliability is supported by 
the accuracy of character names, *not* by a delusion of stability. Publishing 
a complicated Standard that still awaits translation, even into English, 
does not realize the potential of the process. Reality shows that the people 
who do the job Unicode left undone are too few and are likely to disappear, 
spending their time and energy on other work.


Fundamentally, this change, however sweeping it might be, and regardless of 
the duration of former practice, is in conformance with Microsoft’s new 
Corporate Policy: 
	“[...] our industry does not respect tradition – it only respects 
	innovation. [...] I consider the job before us to be bolder and 
	more ambitious than anything we have ever done. [...] 	
	Our customers and society expect us to maximize the value of technology 
	while also preserving the values that are timeless.” 
Microsoft’s CEO Mr Satya Nadella wrote to All Employees on July 10, 2014 
(http://bit.ly/1wRIBqD). 

The job that might be on stage at Unicode now, would be to maximize 
the value of the Unicode documentation by making it directly useful 
to users, shortening the way from standardization to use by eliminating 
the step of translating to English and by making the Standard conform 
to the timeless cultural settings Unicode was respectful of in its 1.0 version.


Best regards,
Marcel Schneider
_________________________________________________________________
Suggested Formal Aliases, including the existing ones (numbered)

0028	LEFT PARENTHESIS
	% OPENING PARENTHESIS
0029	RIGHT PARENTHESIS
	% CLOSING PARENTHESIS
	
002E	FULL STOP
	% PERIOD
	
002F	SOLIDUS
	% SLASH
	
0040	COMMERCIAL AT
	% AT SIGN
	
005B	LEFT SQUARE BRACKET
	% OPENING SQUARE BRACKET
005C	REVERSE SOLIDUS
	% BACKSLASH
005D	RIGHT SQUARE BRACKET
	% CLOSING SQUARE BRACKET
	
005E	CIRCUMFLEX ACCENT
	% SPACING CIRCUMFLEX
005F	LOW LINE
	% SPACING UNDERSCORE
0060	GRAVE ACCENT
	% SPACING GRAVE
	
007B	LEFT CURLY BRACKET
	% OPENING CURLY BRACKET
007D	RIGHT CURLY BRACKET
	% CLOSING CURLY BRACKET
	
00A1	INVERTED EXCLAMATION MARK
	% TURNED EXCLAMATION MARK
	
00AB	LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
	% BACKWARDS-POINTING DOUBLE ANGLE QUOTATION MARK
	
00B4	ACUTE ACCENT
	% SPACING ACUTE
	
00B8	CEDILLA
	% SPACING CEDILLA
	
00BB	RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
	% FORWARDS-POINTING DOUBLE ANGLE QUOTATION MARK
00BC	VULGAR FRACTION ONE QUARTER
	% FRACTION ONE QUARTER
00BD	VULGAR FRACTION ONE HALF
	% FRACTION ONE HALF
00BE	VULGAR FRACTION THREE QUARTERS
	% FRACTION THREE QUARTERS
00BF	INVERTED QUESTION MARK
	% TURNED QUESTION MARK
	
00DF	LATIN SMALL LETTER SHARP S
	% LATIN SMALL LETTER SZ
	
010C	LATIN CAPITAL LETTER C WITH CARON
	% LATIN CAPITAL LETTER C WITH HACEK
010D	LATIN SMALL LETTER C WITH CARON
	% LATIN SMALL LETTER C WITH HACEK
	
010E	LATIN CAPITAL LETTER D WITH CARON
	% LATIN CAPITAL LETTER D WITH HACEK
010F	LATIN SMALL LETTER D WITH CARON
	% LATIN SMALL LETTER D WITH HACEK
	
011A	LATIN CAPITAL LETTER E WITH CARON
	% LATIN CAPITAL LETTER E WITH HACEK
011B	LATIN SMALL LETTER E WITH CARON
	% LATIN SMALL LETTER E WITH HACEK
	
0132	LATIN CAPITAL LIGATURE IJ
	% LATIN CAPITAL LETTER IJ
0133	LATIN SMALL LIGATURE IJ
	% LATIN SMALL LETTER IJ
	
013D	LATIN CAPITAL LETTER L WITH CARON
	% LATIN CAPITAL LETTER L WITH HACEK
013E	LATIN SMALL LETTER L WITH CARON
	% LATIN SMALL LETTER L WITH HACEK
	
0147	LATIN CAPITAL LETTER N WITH CARON
	% LATIN CAPITAL LETTER N WITH HACEK
0148	LATIN SMALL LETTER N WITH CARON
	% LATIN SMALL LETTER N WITH HACEK
	
0152	LATIN CAPITAL LIGATURE OE
	% LATIN CAPITAL LETTER OE
0153	LATIN SMALL LIGATURE OE
	% LATIN SMALL LETTER OE
	
0158	LATIN CAPITAL LETTER R WITH CARON
	% LATIN CAPITAL LETTER R WITH HACEK
0159	LATIN SMALL LETTER R WITH CARON
	% LATIN SMALL LETTER R WITH HACEK
0160	LATIN CAPITAL LETTER S WITH CARON
	% LATIN CAPITAL LETTER S WITH HACEK
0161	LATIN SMALL LETTER S WITH CARON
	% LATIN SMALL LETTER S WITH HACEK
	
0164	LATIN CAPITAL LETTER T WITH CARON
	% LATIN CAPITAL LETTER T WITH HACEK
0165	LATIN SMALL LETTER T WITH CARON
	% LATIN SMALL LETTER T WITH HACEK
	
017D	LATIN CAPITAL LETTER Z WITH CARON
	% LATIN CAPITAL LETTER Z WITH HACEK
017E	LATIN SMALL LETTER Z WITH CARON
	% LATIN SMALL LETTER Z WITH HACEK
	
0190	LATIN CAPITAL LETTER OPEN E
	% LATIN CAPITAL LETTER EPSILON
	
01A2	LATIN CAPITAL LETTER OI
1	% LATIN CAPITAL LETTER GHA
01A3	LATIN SMALL LETTER OI
2	% LATIN SMALL LETTER GHA
	
01BE	LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE
	% LATIN STACKED LIGATURE TS [???]
	
01C4	LATIN CAPITAL LETTER DZ WITH CARON
	% LATIN CAPITAL LETTER DZ WITH HACEK
01C5	LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
	% LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH HACEK
01C6	LATIN SMALL LETTER DZ WITH CARON
	% LATIN SMALL LETTER DZ WITH HACEK

01CE	LATIN SMALL LETTER A WITH CARON
	% LATIN SMALL LETTER A WITH HACEK
01CF	LATIN CAPITAL LETTER I WITH CARON
	% LATIN CAPITAL LETTER I WITH HACEK
01D0	LATIN SMALL LETTER I WITH CARON
	% LATIN SMALL LETTER I WITH HACEK
01D1	LATIN CAPITAL LETTER O WITH CARON
	% LATIN CAPITAL LETTER O WITH HACEK
01D2	LATIN SMALL LETTER O WITH CARON
	% LATIN SMALL LETTER O WITH HACEK
01D3	LATIN CAPITAL LETTER U WITH CARON
	% LATIN CAPITAL LETTER U WITH HACEK
01D4	LATIN SMALL LETTER U WITH CARON
	% LATIN SMALL LETTER U WITH HACEK
01D9	LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON
	% LATIN CAPITAL LETTER U WITH DIAERESIS AND HACEK
01DA	LATIN SMALL LETTER U WITH DIAERESIS AND CARON
	% LATIN SMALL LETTER U WITH DIAERESIS AND HACEK

01E6	LATIN CAPITAL LETTER G WITH CARON
	% LATIN CAPITAL LETTER G WITH HACEK
01E7	LATIN SMALL LETTER G WITH CARON
	% LATIN SMALL LETTER G WITH HACEK
01E8	LATIN CAPITAL LETTER K WITH CARON
	% LATIN CAPITAL LETTER K WITH HACEK
01E9	LATIN SMALL LETTER K WITH CARON
	% LATIN SMALL LETTER K WITH HACEK
	
01EE	LATIN CAPITAL LETTER EZH WITH CARON
	% LATIN CAPITAL LETTER EZH WITH HACEK
01EF	LATIN SMALL LETTER EZH WITH CARON
	% LATIN SMALL LETTER EZH WITH HACEK
01F0	LATIN SMALL LETTER J WITH CARON
	% LATIN SMALL LETTER J WITH HACEK
	
021E	LATIN CAPITAL LETTER H WITH CARON
	% LATIN CAPITAL LETTER H WITH HACEK
021F	LATIN SMALL LETTER H WITH CARON
	% LATIN SMALL LETTER H WITH HACEK
	
0238	LATIN SMALL LETTER DB DIGRAPH
	% LATIN SMALL LIGATURE DB
0239	LATIN SMALL LETTER QP DIGRAPH
	% LATIN SMALL LIGATURE QP
	
025B	LATIN SMALL LETTER OPEN E
	% LATIN SMALL LETTER EPSILON
025E	LATIN SMALL LETTER CLOSED REVERSED OPEN E
	% LATIN SMALL LETTER CLOSED REVERSED EPSILON
	
027F	LATIN SMALL LETTER REVERSED R WITH FISHHOOK
	% LATIN SMALL LETTER LONG LEG TURNED IOTA

0285	LATIN SMALL LETTER SQUAT REVERSED ESH
	% LATIN SMALL LETTER REVERSED R WITH FISHHOOK AND RETROFLEX HOOK
	
02C7	CARON
	% MODIFIER LETTER HACEK
	
030C	COMBINING CARON
	% COMBINING HACEK
032C	COMBINING CARON BELOW
	% COMBINING HACEK BELOW
	
039B	GREEK CAPITAL LETTER LAMDA
	% GREEK CAPITAL LETTER LAMBDA
03BB	GREEK SMALL LETTER LAMDA
	% GREEK SMALL LETTER LAMBDA
	
047C	CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
	% CYRILLIC CAPITAL LETTER BEAUTIFUL OMEGA
047D	CYRILLIC SMALL LETTER OMEGA WITH TITLO
	% CYRILLIC SMALL LETTER BEAUTIFUL OMEGA
	
0598	HEBREW ACCENT ZARQA
	% HEBREW ACCENT TSINORIT
05AE	HEBREW ACCENT ZINOR
	% HEBREW ACCENT TSINOR
	
0670	ARABIC LETTER SUPERSCRIPT ALEF
	% ARABIC VOWEL SIGN SUPERSCRIPT ALEF
	
06C0	ARABIC LETTER HEH WITH YEH ABOVE
	% ARABIC LIGATURE HEH WITH YEH ABOVE
06C2	ARABIC LETTER HEH GOAL WITH HAMZA ABOVE
	% ARABIC LIGATURE HEH GOAL WITH HAMZA ABOVE
06D3	ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
	% ARABIC LIGATURE YEH BARREE WITH HAMZA ABOVE
	
0709	SYRIAC SUBLINEAR COLON SKEWED RIGHT
3	% SYRIAC SUBLINEAR COLON SKEWED LEFT
	
0A01	GURMUKHI SIGN ADAK BINDI
	% GURMUKHI SIGN ADDAK BINDI
	
0B83	TAMIL SIGN VISARGA
	% TAMIL SIGN AYTHAM
	
0CDE	KANNADA LETTER FA
4	% KANNADA LETTER LLLA
	
0E9D	LAO LETTER FO TAM
5	% LAO LETTER FO FON
0E9F	LAO LETTER FO SUNG
6	% LAO LETTER FO FAY
0EA3	LAO LETTER LO LING 
7	% LAO LETTER RO
0EA5	LAO LETTER LO LOOT
8	% LAO LETTER LO
	
0F0A	TIBETAN MARK BKA- SHOG YIG MGO
	% TIBETAN MARK ZOU YIK GUI GO
	
0F0B	TIBETAN MARK INTERSYLLABIC TSHEG
	% TIBETAN MARK TSHEG
0F0C	TIBETAN MARK DELIMITER TSHEG BSTAR
	% TIBETAN MARK NO-BREAK TSHEG [???]
0FD0	TIBETAN MARK BSKA- SHOG GI MGO RGYAN
9	% TIBETAN MARK BKA- SHOG GI MGO RGYAN
	
156F	CANADIAN SYLLABICS TTH
	% CANADIAN SYLLABICS ASTERISK
	
178E	KHMER LETTER NNO
	% KHMER LETTER NNA
179E	KHMER LETTER SSO
	% KHMER LETTER SSA
	
1D27	GREEK LETTER SMALL CAPITAL LAMDA
	% GREEK LETTER SMALL CAPITAL LAMBDA
	
1E9E	LATIN CAPITAL LETTER SHARP S
	% LATIN CAPITAL LETTER SZ
	
2018	LEFT SINGLE QUOTATION MARK
	% SINGLE TURNED COMMA QUOTATION MARK
2019	RIGHT SINGLE QUOTATION MARK
	% SINGLE COMMA QUOTATION MARK
201A	SINGLE LOW-9 QUOTATION MARK
	% LOW SINGLE COMMA QUOTATION MARK
201B	SINGLE HIGH-REVERSED-9 QUOTATION MARK
	% SINGLE REVERSED COMMA QUOTATION MARK
201C	LEFT DOUBLE QUOTATION MARK
	% DOUBLE TURNED COMMA QUOTATION MARK
201D	RIGHT DOUBLE QUOTATION MARK
	% DOUBLE COMMA QUOTATION MARK
201E	DOUBLE LOW-9 QUOTATION MARK
	% LOW DOUBLE COMMA QUOTATION MARK
201F	DOUBLE HIGH-REVERSED-9 QUOTATION MARK
	% DOUBLE REVERSED COMMA QUOTATION MARK
	
2039	SINGLE LEFT-POINTING ANGLE QUOTATION MARK
	% SINGLE BACKWARDS-POINTING ANGLE QUOTATION MARK
203A	SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
	% SINGLE FORWARDS-POINTING ANGLE QUOTATION MARK
	
203E	OVERLINE
	% SPACING OVERSCORE
	
2045	LEFT SQUARE BRACKET WITH QUILL
	% OPENING SQUARE BRACKET WITH QUILL
2046	RIGHT SQUARE BRACKET WITH QUILL
	% CLOSING SQUARE BRACKET WITH QUILL

207D	SUPERSCRIPT LEFT PARENTHESIS
	% SUPERSCRIPT OPENING PARENTHESIS
207E	SUPERSCRIPT RIGHT PARENTHESIS
	% SUPERSCRIPT CLOSING PARENTHESIS
208D	SUBSCRIPT LEFT PARENTHESIS
	% SUBSCRIPT OPENING PARENTHESIS
208E	SUBSCRIPT RIGHT PARENTHESIS
	% SUBSCRIPT CLOSING PARENTHESIS
	
20E5	COMBINING REVERSE SOLIDUS OVERLAY
	% COMBINING BACKSLASH OVERLAY
	
2113	SCRIPT SMALL L
	% MATHEMATICAL SYMBOL ELL

2118	SCRIPT CAPITAL P
10	% WEIERSTRASS ELLIPTIC FUNCTION 
	
2150	VULGAR FRACTION ONE SEVENTH
	% FRACTION ONE SEVENTH
2151	VULGAR FRACTION ONE NINTH
	% FRACTION ONE NINTH
2152	VULGAR FRACTION ONE TENTH
	% FRACTION ONE TENTH
2153	VULGAR FRACTION ONE THIRD
	% FRACTION ONE THIRD
2154	VULGAR FRACTION TWO THIRDS
	% FRACTION TWO THIRDS
2155	VULGAR FRACTION ONE FIFTH
	% FRACTION ONE FIFTH
2156	VULGAR FRACTION TWO FIFTHS
	% FRACTION TWO FIFTHS
2157	VULGAR FRACTION THREE FIFTHS
	% FRACTION THREE FIFTHS
2158	VULGAR FRACTION FOUR FIFTHS
	% FRACTION FOUR FIFTHS
2159	VULGAR FRACTION ONE SIXTH
	% FRACTION ONE SIXTH
215A	VULGAR FRACTION FIVE SIXTHS
	% FRACTION FIVE SIXTHS
215B	VULGAR FRACTION ONE EIGHTH
	% FRACTION ONE EIGHTH
215C	VULGAR FRACTION THREE EIGHTHS
	% FRACTION THREE EIGHTHS
215D	VULGAR FRACTION FIVE EIGHTHS
	% FRACTION FIVE EIGHTHS
215E	VULGAR FRACTION SEVEN EIGHTHS
	% FRACTION SEVEN EIGHTHS
	
22A2	RIGHT TACK
	% FORWARDS TACK
22A3	LEFT TACK
	% BACKWARDS TACK

22C9	LEFT NORMAL FACTOR SEMIDIRECT PRODUCT
	% BACKWARDS NORMAL FACTOR SEMIDIRECT PRODUCT
22CA	RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT
	% FORWARDS NORMAL FACTOR SEMIDIRECT PRODUCT
22CB	LEFT SEMIDIRECT PRODUCT
	% BACKWARDS SEMIDIRECT PRODUCT
22CC	RIGHT SEMIDIRECT PRODUCT
	% FORWARDS SEMIDIRECT PRODUCT

2308	LEFT CEILING
	% BEGIN CEILING
2309	RIGHT CEILING
	% END CEILING
230A	LEFT FLOOR
	% BEGIN FLOOR
230B	RIGHT FLOOR
	% END FLOOR

2329	LEFT-POINTING ANGLE BRACKET
	% BACKWARDS-POINTING ANGLE BRACKET
232A	RIGHT-POINTING ANGLE BRACKET
	% FORWARDS-POINTING ANGLE BRACKET

232B	ERASE TO THE LEFT
	% ERASE BACKWARDS

2446	OCR BRANCH BANK IDENTIFICATION
	% MICR TRANSIT SYMBOL
2447	OCR AMOUNT OF CHECK
	% MICR AMOUNT SYMBOL
	
2448	OCR DASH
11	% MICR ON US SYMBOL
2449	OCR CUSTOMER ACCOUNT NUMBER
12	% MICR DASH SYMBOL

2768	MEDIUM LEFT PARENTHESIS ORNAMENT
	% MEDIUM OPENING PARENTHESIS ORNAMENT
2769	MEDIUM RIGHT PARENTHESIS ORNAMENT
	% MEDIUM CLOSING PARENTHESIS ORNAMENT
276A	MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
	% MEDIUM FLATTENED OPENING PARENTHESIS ORNAMENT
276B	MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
	% MEDIUM FLATTENED CLOSING PARENTHESIS ORNAMENT
276C	MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
	% MEDIUM OPENING-POINTING ANGLE BRACKET ORNAMENT
276D	MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
	% MEDIUM CLOSING-POINTING ANGLE BRACKET ORNAMENT
276E	HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
	% HEAVY BACKWARDS-POINTING ANGLE QUOTATION MARK ORNAMENT
276F	HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
	% HEAVY FORWARDS-POINTING ANGLE QUOTATION MARK ORNAMENT
2770	HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
	% HEAVY BACKWARDS-POINTING ANGLE BRACKET ORNAMENT
2771	HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
	% HEAVY FORWARDS-POINTING ANGLE BRACKET ORNAMENT
2772	LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
	% LIGHT OPENING TORTOISE SHELL BRACKET ORNAMENT
2773	LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
	% LIGHT CLOSING TORTOISE SHELL BRACKET ORNAMENT
2774	MEDIUM LEFT CURLY BRACKET ORNAMENT
	% MEDIUM OPENING CURLY BRACKET ORNAMENT
2775	MEDIUM RIGHT CURLY BRACKET ORNAMENT
	% MEDIUM CLOSING CURLY BRACKET ORNAMENT

27C5	LEFT S-SHAPED BAG DELIMITER
	% OPENING S-SHAPED BAG DELIMITER
27C6	RIGHT S-SHAPED BAG DELIMITER
	% CLOSING S-SHAPED BAG DELIMITER

27C8	REVERSE SOLIDUS PRECEDING SUBSET
	% BACKSLASH PRECEDING SUBSET
27C9	SUPERSET PRECEDING SOLIDUS
	% SUPERSET PRECEDING SLASH

27D3	LOWER RIGHT CORNER WITH DOT
	% LOWER CORNER WITH DOT
27D4	UPPER LEFT CORNER WITH DOT
	% UPPER CORNER WITH DOT
27D5	LEFT OUTER JOIN
	% BACKWARDS OUTER JOIN
27D6	RIGHT OUTER JOIN
	% FORWARDS OUTER JOIN
27DC	LEFT MULTIMAP
	% BACKWARDS MULTIMAP
27DD	LONG RIGHT TACK
	% LONG FORWARDS TACK
27DE	LONG LEFT TACK
	% LONG BACKWARDS TACK
27E2	WHITE CONCAVE-SIDED DIAMOND WITH LEFTWARDS TICK
	% WHITE CONCAVE-SIDED DIAMOND WITH BACKWARDS TICK
27E3	WHITE CONCAVE-SIDED DIAMOND WITH RIGHTWARDS TICK
	% WHITE CONCAVE-SIDED DIAMOND WITH FORWARDS TICK
27E4	WHITE SQUARE WITH LEFTWARDS TICK
	% WHITE SQUARE WITH BACKWARDS TICK
27E5	WHITE SQUARE WITH RIGHTWARDS TICK
	% WHITE SQUARE WITH FORWARDS TICK
27E6	MATHEMATICAL LEFT WHITE SQUARE BRACKET
	% MATHEMATICAL OPENING WHITE SQUARE BRACKET
27E7	MATHEMATICAL RIGHT WHITE SQUARE BRACKET
	% MATHEMATICAL CLOSING WHITE SQUARE BRACKET
27E8	MATHEMATICAL LEFT ANGLE BRACKET
	% MATHEMATICAL OPENING ANGLE BRACKET
27E9	MATHEMATICAL RIGHT ANGLE BRACKET
	% MATHEMATICAL CLOSING ANGLE BRACKET
27EA	MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
	% MATHEMATICAL OPENING DOUBLE ANGLE BRACKET
27EB	MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
	% MATHEMATICAL CLOSING DOUBLE ANGLE BRACKET
27EC	MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
	% MATHEMATICAL OPENING WHITE TORTOISE SHELL BRACKET
27ED	MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
	% MATHEMATICAL CLOSING WHITE TORTOISE SHELL BRACKET
27EE	MATHEMATICAL LEFT FLATTENED PARENTHESIS
	% MATHEMATICAL OPENING FLATTENED PARENTHESIS
27EF	MATHEMATICAL RIGHT FLATTENED PARENTHESIS
	% MATHEMATICAL CLOSING FLATTENED PARENTHESIS

2983	LEFT WHITE CURLY BRACKET
	% OPENING WHITE CURLY BRACKET
2984	RIGHT WHITE CURLY BRACKET
	% CLOSING WHITE CURLY BRACKET
2985	LEFT WHITE PARENTHESIS
	% OPENING WHITE PARENTHESIS
2986	RIGHT WHITE PARENTHESIS
	% CLOSING WHITE PARENTHESIS
2987	Z NOTATION LEFT IMAGE BRACKET
	% Z NOTATION OPENING IMAGE BRACKET
2988	Z NOTATION RIGHT IMAGE BRACKET
	% Z NOTATION CLOSING IMAGE BRACKET
2989	Z NOTATION LEFT BINDING BRACKET
	% Z NOTATION OPENING BINDING BRACKET
298A	Z NOTATION RIGHT BINDING BRACKET
	% Z NOTATION CLOSING BINDING BRACKET
298B	LEFT SQUARE BRACKET WITH UNDERBAR
	% OPENING SQUARE BRACKET WITH UNDERBAR
298C	RIGHT SQUARE BRACKET WITH UNDERBAR
	% CLOSING SQUARE BRACKET WITH UNDERBAR
298D	LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
	% OPENING SQUARE BRACKET WITH TICK IN TOP CORNER
298E	RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
	% CLOSING SQUARE BRACKET WITH TICK IN BOTTOM CORNER
298F	LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
	% OPENING SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2990	RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
	% CLOSING SQUARE BRACKET WITH TICK IN TOP CORNER
2991	LEFT ANGLE BRACKET WITH DOT
	% OPENING ANGLE BRACKET WITH DOT
2992	RIGHT ANGLE BRACKET WITH DOT
	% CLOSING ANGLE BRACKET WITH DOT
2993	LEFT ARC LESS-THAN BRACKET
	% OPENING ARC LESS-THAN BRACKET
2994	RIGHT ARC GREATER-THAN BRACKET
	% CLOSING ARC GREATER-THAN BRACKET
2995	DOUBLE LEFT ARC GREATER-THAN BRACKET
	% DOUBLE OPENING ARC GREATER-THAN BRACKET
2996	DOUBLE RIGHT ARC LESS-THAN BRACKET
	% DOUBLE CLOSING ARC LESS-THAN BRACKET
2997	LEFT BLACK TORTOISE SHELL BRACKET
	% OPENING BLACK TORTOISE SHELL BRACKET
2998	RIGHT BLACK TORTOISE SHELL BRACKET
	% CLOSING BLACK TORTOISE SHELL BRACKET

299B	MEASURED ANGLE OPENING LEFT
	% MEASURED ANGLE OPENING BACKWARDS
29A0	SPHERICAL ANGLE OPENING LEFT
	% SPHERICAL ANGLE OPENING BACKWARDS
29A8	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND RIGHT
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND FORWARDS
29A9	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND LEFT
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING UP AND BACKWARDS
29AA	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND RIGHT
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND FORWARDS
29AB	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND LEFT
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING DOWN AND BACKWARDS
29AC	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND UP
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING FORWARDS AND UP
29AD	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND UP
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING BACKWARDS AND UP
29AE	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING RIGHT AND DOWN
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING FORWARDS AND DOWN
29AF	MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING LEFT AND DOWN
	% MEASURED ANGLE WITH OPEN ARM ENDING IN ARROW POINTING BACKWARDS AND DOWN

29B8	CIRCLED REVERSE SOLIDUS
	% CIRCLED BACKSLASH

29C2	CIRCLE WITH SMALL CIRCLE TO THE RIGHT
	% CIRCLE WITH SMALL CIRCLE AFTER
29C3	CIRCLE WITH TWO HORIZONTAL STROKES TO THE RIGHT
	% CIRCLE WITH TWO HORIZONTAL STROKES AFTER

29CE	RIGHT TRIANGLE ABOVE LEFT TRIANGLE
	% FORWARDS TRIANGLE ABOVE BACKWARDS TRIANGLE
29CF	LEFT TRIANGLE BESIDE VERTICAL BAR
	% BACKWARDS TRIANGLE BESIDE VERTICAL BAR

29D0	VERTICAL BAR BESIDE RIGHT TRIANGLE
	% VERTICAL BAR BESIDE FORWARDS TRIANGLE
29D1	BOWTIE WITH LEFT HALF BLACK
	% BOWTIE WITH BACKWARDS HALF BLACK
29D2	BOWTIE WITH RIGHT HALF BLACK
	% BOWTIE WITH FORWARDS HALF BLACK
29D4	TIMES WITH LEFT HALF BLACK
	% TIMES WITH BACKWARDS HALF BLACK
29D5	TIMES WITH RIGHT HALF BLACK
	% TIMES WITH FORWARDS HALF BLACK
29D8	LEFT WIGGLY FENCE
	% BACKWARDS WIGGLY FENCE
29D9	RIGHT WIGGLY FENCE
	% FORWARDS WIGGLY FENCE
29DA	LEFT DOUBLE WIGGLY FENCE
	% BACKWARDS DOUBLE WIGGLY FENCE
29DB	RIGHT DOUBLE WIGGLY FENCE
	% FORWARDS DOUBLE WIGGLY FENCE
29E8	DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK
	% DOWN-POINTING TRIANGLE WITH BACKWARDS HALF BLACK
29E9	DOWN-POINTING TRIANGLE WITH RIGHT HALF BLACK
	% DOWN-POINTING TRIANGLE WITH FORWARDS HALF BLACK
29F5	REVERSE SOLIDUS OPERATOR
	% BACKSLASH OPERATOR
29F6	SOLIDUS WITH OVERBAR
	% SLASH WITH OVERBAR
29F7	REVERSE SOLIDUS WITH HORIZONTAL STROKE
	% BACKSLASH WITH HORIZONTAL STROKE
29F8	BIG SOLIDUS
	% BIG SLASH
29F9	BIG REVERSE SOLIDUS
	% BIG BACKSLASH
29FC	LEFT-POINTING CURVED ANGLE BRACKET
	% BACKWARDS-POINTING CURVED ANGLE BRACKET
29FD	RIGHT-POINTING CURVED ANGLE BRACKET
	% FORWARDS-POINTING CURVED ANGLE BRACKET

2A1E	LARGE LEFT TRIANGLE OPERATOR
	% LARGE BACKWARDS TRIANGLE OPERATOR

2A2D	PLUS SIGN IN LEFT HALF CIRCLE
	% PLUS SIGN IN BACKWARDS HALF CIRCLE
2A2E	PLUS SIGN IN RIGHT HALF CIRCLE
	% PLUS SIGN IN FORWARDS HALF CIRCLE
2A34	MULTIPLICATION SIGN IN LEFT HALF CIRCLE
	% MULTIPLICATION SIGN IN BACKWARDS HALF CIRCLE
2A35	MULTIPLICATION SIGN IN RIGHT HALF CIRCLE
	% MULTIPLICATION SIGN IN FORWARDS HALF CIRCLE
2A83	LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE RIGHT
	% LESS-THAN OR SLANTED EQUAL TO WITH DOT ON TOP
2A84	GREATER-THAN OR SLANTED EQUAL TO WITH DOT ABOVE LEFT
	% GREATER-THAN OR SLANTED EQUAL TO WITH DOT ON TOP
2ACD	SQUARE LEFT OPEN BOX OPERATOR
	% SQUARE BACKWARDS OPEN BOX OPERATOR
2ACE	SQUARE RIGHT OPEN BOX OPERATOR
	% SQUARE FORWARDS OPEN BOX OPERATOR
2ADE	SHORT LEFT TACK
	% SHORT BACKWARDS TACK
2AE2	VERTICAL BAR TRIPLE RIGHT TURNSTILE
	% VERTICAL BAR TRIPLE FORWARDS TURNSTILE
2AE3	DOUBLE VERTICAL BAR LEFT TURNSTILE
	% DOUBLE VERTICAL BAR BACKWARDS TURNSTILE
2AE4	VERTICAL BAR DOUBLE LEFT TURNSTILE
	% VERTICAL BAR DOUBLE BACKWARDS TURNSTILE
2AE5	DOUBLE VERTICAL BAR DOUBLE LEFT TURNSTILE
	% DOUBLE VERTICAL BAR DOUBLE BACKWARDS TURNSTILE
2AE6	LONG DASH FROM LEFT MEMBER OF DOUBLE VERTICAL
	% LONG DASH FROM BACKWARDS MEMBER OF DOUBLE VERTICAL
2E02	LEFT SUBSTITUTION BRACKET
	% OPENING SUBSTITUTION BRACKET
2E03	RIGHT SUBSTITUTION BRACKET
	% CLOSING SUBSTITUTION BRACKET
2E04	LEFT DOTTED SUBSTITUTION BRACKET
	% OPENING DOTTED SUBSTITUTION BRACKET
2E05	RIGHT DOTTED SUBSTITUTION BRACKET
	% CLOSING DOTTED SUBSTITUTION BRACKET
2E09	LEFT TRANSPOSITION BRACKET
	% OPENING TRANSPOSITION BRACKET
2E0A	RIGHT TRANSPOSITION BRACKET
	% CLOSING TRANSPOSITION BRACKET
2E0C	LEFT RAISED OMISSION BRACKET
	% OPENING RAISED OMISSION BRACKET
2E0D	RIGHT RAISED OMISSION BRACKET
	% CLOSING RAISED OMISSION BRACKET
2E1C	LEFT LOW PARAPHRASE BRACKET
	% OPENING LOW PARAPHRASE BRACKET
2E1D	RIGHT LOW PARAPHRASE BRACKET
	% CLOSING LOW PARAPHRASE BRACKET
2E20	LEFT VERTICAL BAR WITH QUILL
	% OPENING VERTICAL BAR WITH QUILL
2E21	RIGHT VERTICAL BAR WITH QUILL
	% CLOSING VERTICAL BAR WITH QUILL
2E22	TOP LEFT HALF BRACKET
	% TOP OPENING HALF BRACKET
2E23	TOP RIGHT HALF BRACKET
	% TOP CLOSING HALF BRACKET
2E24	BOTTOM LEFT HALF BRACKET
	% BOTTOM OPENING HALF BRACKET
2E25	BOTTOM RIGHT HALF BRACKET
	% BOTTOM CLOSING HALF BRACKET
2E26	LEFT SIDEWAYS U BRACKET
	% OPENING SIDEWAYS U BRACKET
2E27	RIGHT SIDEWAYS U BRACKET
	% CLOSING SIDEWAYS U BRACKET
2E28	LEFT DOUBLE PARENTHESIS
	% OPENING DOUBLE PARENTHESIS
2E29	RIGHT DOUBLE PARENTHESIS
	% CLOSING DOUBLE PARENTHESIS

3008	LEFT ANGLE BRACKET
	% OPENING ANGLE BRACKET
3009	RIGHT ANGLE BRACKET
	% CLOSING ANGLE BRACKET
300A	LEFT DOUBLE ANGLE BRACKET
	% OPENING DOUBLE ANGLE BRACKET
300B	RIGHT DOUBLE ANGLE BRACKET
	% CLOSING DOUBLE ANGLE BRACKET
300C	LEFT CORNER BRACKET
	% OPENING CORNER BRACKET
300D	RIGHT CORNER BRACKET
	% CLOSING CORNER BRACKET
300E	LEFT WHITE CORNER BRACKET
	% OPENING WHITE CORNER BRACKET
300F	RIGHT WHITE CORNER BRACKET
	% CLOSING WHITE CORNER BRACKET
3010	LEFT BLACK LENTICULAR BRACKET
	% OPENING BLACK LENTICULAR BRACKET
3011	RIGHT BLACK LENTICULAR BRACKET
	% CLOSING BLACK LENTICULAR BRACKET
3014	LEFT TORTOISE SHELL BRACKET
	% OPENING TORTOISE SHELL BRACKET
3015	RIGHT TORTOISE SHELL BRACKET
	% CLOSING TORTOISE SHELL BRACKET
3016	LEFT WHITE LENTICULAR BRACKET
	% OPENING WHITE LENTICULAR BRACKET
3017	RIGHT WHITE LENTICULAR BRACKET
	% CLOSING WHITE LENTICULAR BRACKET
3018	LEFT WHITE TORTOISE SHELL BRACKET
	% OPENING WHITE TORTOISE SHELL BRACKET
3019	RIGHT WHITE TORTOISE SHELL BRACKET
	% CLOSING WHITE TORTOISE SHELL BRACKET
301A	LEFT WHITE SQUARE BRACKET
	% OPENING WHITE SQUARE BRACKET
301B	RIGHT WHITE SQUARE BRACKET
	% CLOSING WHITE SQUARE BRACKET

3021	HANGZHOU NUMERAL ONE
	% SUZHOU NUMERAL ONE
3022	HANGZHOU NUMERAL TWO
	% SUZHOU NUMERAL TWO
3023	HANGZHOU NUMERAL THREE
	% SUZHOU NUMERAL THREE
3024	HANGZHOU NUMERAL FOUR
	% SUZHOU NUMERAL FOUR
3025	HANGZHOU NUMERAL FIVE
	% SUZHOU NUMERAL FIVE
3026	HANGZHOU NUMERAL SIX
	% SUZHOU NUMERAL SIX
3027	HANGZHOU NUMERAL SEVEN
	% SUZHOU NUMERAL SEVEN
3028	HANGZHOU NUMERAL EIGHT
	% SUZHOU NUMERAL EIGHT
3029	HANGZHOU NUMERAL NINE 
	% SUZHOU NUMERAL NINE
	
309F	HIRAGANA DIGRAPH YORI
	% HIRAGANA LIGATURE YORI
30FF	KATAKANA DIGRAPH KOTO
	% KATAKANA LIGATURE KOTO

A015	YI SYLLABLE WU
13	% YI SYLLABLE ITERATION MARK 

FE18	PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
14	% PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET

FE59	SMALL LEFT PARENTHESIS
	% SMALL OPENING PARENTHESIS
FE5A	SMALL RIGHT PARENTHESIS
	% SMALL CLOSING PARENTHESIS
FE5B	SMALL LEFT CURLY BRACKET
	% SMALL OPENING CURLY BRACKET
FE5C	SMALL RIGHT CURLY BRACKET
	% SMALL CLOSING CURLY BRACKET
FE5D	SMALL LEFT TORTOISE SHELL BRACKET
	% SMALL OPENING TORTOISE SHELL BRACKET
FE5E	SMALL RIGHT TORTOISE SHELL BRACKET
	% SMALL CLOSING TORTOISE SHELL BRACKET
	
FE6B	SMALL COMMERCIAL AT
	% SMALL AT SIGN

FEFF	ZERO WIDTH NO-BREAK SPACE
15	% BYTE ORDER MARK

FF08	FULLWIDTH LEFT PARENTHESIS
	% FULLWIDTH OPENING PARENTHESIS
FF09	FULLWIDTH RIGHT PARENTHESIS
	% FULLWIDTH CLOSING PARENTHESIS

FF20	FULLWIDTH COMMERCIAL AT
	% FULLWIDTH AT SIGN

FF3B	FULLWIDTH LEFT SQUARE BRACKET
	% FULLWIDTH OPENING SQUARE BRACKET
FF3C	FULLWIDTH REVERSE SOLIDUS
	% FULLWIDTH BACKSLASH
FF3D	FULLWIDTH RIGHT SQUARE BRACKET
	% FULLWIDTH CLOSING SQUARE BRACKET
FF5B	FULLWIDTH LEFT CURLY BRACKET
	% FULLWIDTH OPENING CURLY BRACKET
FF5D	FULLWIDTH RIGHT CURLY BRACKET
	% FULLWIDTH CLOSING CURLY BRACKET
FF5F	FULLWIDTH LEFT WHITE PARENTHESIS
	% FULLWIDTH OPENING WHITE PARENTHESIS
FF60	FULLWIDTH RIGHT WHITE PARENTHESIS
	% FULLWIDTH CLOSING WHITE PARENTHESIS
FF62	HALFWIDTH LEFT CORNER BRACKET
	% HALFWIDTH OPENING CORNER BRACKET
FF63	HALFWIDTH RIGHT CORNER BRACKET
	% HALFWIDTH CLOSING CORNER BRACKET
	
1038D	UGARITIC LETTER LAMDA
	% UGARITIC LETTER LAMBDA
	
122D4	CUNEIFORM SIGN SHIR TENU
16	% CUNEIFORM SIGN NU11 TENU
122D5	CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR
17	% CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR
	
1D0C5	BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS
18	% BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS
	
1D13A	MUSICAL SYMBOL MULTI REST
	% MUSICAL SYMBOL DOUBLE WHOLE-REST
	
1D6B2	MATHEMATICAL BOLD CAPITAL LAMDA
	% MATHEMATICAL BOLD CAPITAL LAMBDA
1D6CC	MATHEMATICAL BOLD SMALL LAMDA
	% MATHEMATICAL BOLD SMALL LAMBDA
1D6EC	MATHEMATICAL ITALIC CAPITAL LAMDA
	% MATHEMATICAL ITALIC CAPITAL LAMBDA
1D706	MATHEMATICAL ITALIC SMALL LAMDA
	% MATHEMATICAL ITALIC SMALL LAMBDA
1D726	MATHEMATICAL BOLD ITALIC CAPITAL LAMDA
	% MATHEMATICAL BOLD ITALIC CAPITAL LAMBDA
1D740	MATHEMATICAL BOLD ITALIC SMALL LAMDA
	% MATHEMATICAL BOLD ITALIC SMALL LAMBDA
1D760	MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMBDA
1D77A	MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD SMALL LAMBDA
1D79A	MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMBDA
1D7B4	MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMDA
	% MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL LAMBDA

[...?]
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
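
A listing in this shape (a code point and character name on one line, then a tab-indented "%" line carrying the proposed formal alias) can be read into a mapping with a few lines of Python. This is only a sketch: the function name parse_alias_listing is hypothetical, and it assumes tab-separated fields exactly as shown in the list above, tolerating a short marker in front of the tab.

```python
import re

# A code point (4 to 6 hex digits), a tab, then the character name.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\t(.+?)\s*$")
# Optional short marker, a tab, then "%" and the alias text.
ALIAS = re.compile(r"^[^\t%]*\t%\s*(.+?)\s*$")


def parse_alias_listing(text):
    """Return {code_point: (name, [aliases])} for a NamesList-style listing."""
    entries = {}
    current = None
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            current = int(m.group(1), 16)
            entries[current] = (m.group(2), [])
            continue
        m = ALIAS.match(line)
        if m and current is not None:
            entries[current][1].append(m.group(1))
    return entries
```

Lines that match neither pattern (blank lines, separators) are skipped, so an alias line that lacks its "%" would be dropped silently rather than guessed at.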


Date/Time: Sat May 2 06:52:34 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Corrigendum

In my post of Wed Mar 25 12:54:45 CDT 2015, the tenth code point 
should read U+1E37, not U+01E7.
Sorry.

The following list shows the actual instances in the NamesList where 
what I call “idle ‘@+’” occurs (that is, the instances of NOTICE_LINEs 
in char entries, which are displayed there the same as COMMENT_LINEs):

U+0140
U+0149
U+01A6
U+0268
U+0269
U+0277
U+027C
U+029E
U+0307
U+1E37	(corrected)
U+1E5B
U+2301
U+234A
U+237B
U+237D
U+237E
U+237F
U+2425
U+2426
U+16F27
U+16F32
U+16F52
U+16F53
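
Occurrences like those listed above can be located mechanically. The following Python sketch is an illustration only: it assumes that a character entry starts with a line of the form "code point, tab, name", that a NOTICE_LINE starts with "@+", and that any other line starting with "@" (a header or subheader) ends the current character entry. The function name notices_in_char_entries is hypothetical.

```python
import re

# A character entry starts with 4 to 6 hex digits followed by a tab.
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t")


def notices_in_char_entries(lines):
    """Return code points (as hex strings) whose entries contain an '@+' line."""
    found = []
    current = None
    for line in lines:
        m = NAME_LINE.match(line)
        if m:
            current = m.group(1)
        elif line.startswith("@+"):
            # Only count notices that sit inside a character entry.
            if current is not None and current not in found:
                found.append(current)
        elif line.startswith("@"):
            # A header or subheader closes the current character entry.
            current = None
    return found
```

Notices that follow a header or subheader are ignored, since only those inside character entries are at issue here.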

Mostly these notices mark up information about backward-compatibility 
issues with other standards. They should be converted to 
ordinary comments (annotations) because, without any 
distinctive formatting, nothing is gained by their being 
notices rather than annotations.

Fundamentally, since those issues grow less important as 
the related standards become of no more than historical interest, 
they should not affect the Code Charts’ layout either.

Best regards,
Marcel Schneider

Date/Time: Sat May 2 06:53:47 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: U+026A

Matching:
	0197	LATIN CAPITAL LETTER I WITH STROKE
	[...]	* lowercase is 0268
		* ISO 6438 gives lowercase as 026A, not 0268
with:
	0268	LATIN SMALL LETTER I WITH STROKE
	[...]	* uppercase is 0197
	@+	* ISO 6438 gives lowercase of 0197 as 026A, not 0268

, the COMMENT_LINE for:

	026A	LATIN LETTER SMALL CAPITAL I
	[...]	* uppercase is 0197

should probably be completed with:
		* ISO 6438 gives 0197 as uppercase

Best regards,
Marcel Schneider

Date/Time: Sat May 2 06:59:49 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: About casing information

I’ve already sent some feedback containing comments on 
casing information provided in the Code Charts (Mon Apr 20, 2015). 
On applying them, I discovered that each case is different and leads 
to a different solution.

U+00D0/U+00F0, U+0110/U+0111: The legacy practice of encoding 
the capitals Ð/Đ only once was so confusing that these letters 
fully deserve both casing annotations 
*and* cross-references to the other case.

The same applies to U+00FF/U+0178 because this capital 
had not been encoded at all, even in Latin-1 (and that was 
not its only defect; see œ/Œ). In this case the annotations 
could instead explain why these two are so far away from each other 
(a fact that has been bugging people).

By contrast, U+0110 and U+0111 are so close together that it makes 
little sense to provide xrefs. Nor would comments be needed, were there 
no debt to repair. But that should be done explicitly (even if without 
apologies, in a Code Chart...).

This is why I end up preferring it this way:

00D0	LATIN CAPITAL LETTER ETH
	* lowercase is 00F0
	x (latin small letter eth - 00F0)

00F0	LATIN SMALL LETTER ETH
	* uppercase is 00D0
	x (latin capital letter eth - 00D0)


0110	LATIN CAPITAL LETTER D WITH STROKE
	* lowercase is 0111
0111	LATIN SMALL LETTER D WITH STROKE
	* uppercase is 0110


00FF	LATIN SMALL LETTER Y WITH DIAERESIS
	* uppercase is 0178 (not encoded at 00DF for compatibility with ISO/IEC 8859-1)
	x (latin capital letter y with diaeresis - 0178)

0178	LATIN CAPITAL LETTER Y WITH DIAERESIS
	* lowercase has been encoded at 00FF for compatibility with ISO/IEC 8859-1
	x (latin small letter y with diaeresis - 00FF)


Regarding U+00DF and U+1E9E (my post of Mon Apr 20, 2015), there is a need to
resolve the puzzle of a dedicated capital letter existing while the uppercase is
written with two S. Now that ẞ (uppercase) is encoded and available on keyboards,
things have become simple and the annotations are no longer accurate. I would
also add some explanations for the capital letter (additions are bracketed with
underscores):

00DF	LATIN SMALL LETTER SHARP S
	= Eszett
	* German
	* uppercase is _'0053 0053' or 1E9E_
[...]

1E9E	LATIN CAPITAL LETTER SHARP S
	_= latin capital letter sz_
	_* used to disambiguate the orthography of uppercase names_
	* lowercase is 00DF
	x (latin small letter sharp s - 00DF)
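
The case mappings discussed above can be checked against the Unicode data that ships with Python. This is a small sketch using only built-in string methods; the variable names are illustrative.

```python
# Pairs of (lowercase, uppercase) letters discussed above.
pairs = {
    "\u00f0": "\u00d0",  # eth
    "\u0111": "\u0110",  # d with stroke
    "\u00ff": "\u0178",  # y with diaeresis
}
for lower, upper in pairs.items():
    assert lower.upper() == upper and upper.lower() == lower

# Sharp s is asymmetric: U+00DF uppercases to the two letters "SS",
# while U+1E9E lowercases to U+00DF.
sharp_s_upper = "\u00df".upper()
capital_sharp_s_lower = "\u1e9e".lower()
print(sharp_s_upper, capital_sharp_s_lower)  # SS ß
```

The asymmetry in the last two lines is the puzzle that the proposed annotation "uppercase is '0053 0053' or 1E9E" is meant to spell out.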


Best regards,
Marcel Schneider

Date/Time: Sat May 2 07:03:05 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: LAMDA U+039B/U+03BB

As already posted (Mon Apr 27, 2015), the spelling LAMBDA 
isn’t right either, so giving the authentic spelling LABDA 
as an alias can help override the default of the ISO-originated 
Standardese. I therefore suggest adding an alias line for 
the capital U+039B, and echoing it for the small letter 
by completing the existing one. To blur the issue even more, 
some other alternative spellings may be given, namely for 
MU (MY), NU (NY), and UPSILON (YPSILON).

However, at the time of the merger, Unicode put up some resistance 
against this tampering with a Greek letter name (Unicode 1.0 
respected the canonical spelling LAMBDA, and even today 
there is U+019B LATIN SMALL LETTER LAMBDA WITH STROKE, 
= barred lambda, lambda bar, for Americanist phonetic usage), 
and it was supposedly not without hard discussions that 
Unicode finally resigned itself to complying. The violence exerted 
on the ISO side, presumably under the threat of secession, gives 
an idea of how important naming issues are for scaling domination. 
In the end, at least, we hope truth will prevail.

Best wishes,
Marcel Schneider
_____________________________________
039B	GREEK CAPITAL LETTER LAMDA
	= lambda, labda
03BB	GREEK SMALL LETTER LAMDA
	= lambda, labda
03BC	GREEK SMALL LETTER MU
	= my
03BD	GREEK SMALL LETTER NU
	= ny
03C5	GREEK SMALL LETTER UPSILON
	= ypsilon
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Date/Time: Sat May 2 07:10:09 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: PRI #297: UnicodeXData.txt

As I’ve already posted on Mon Apr 27, 2015, the information about 
formal aliases seems to be out of reach for many users of the software 
they are confronted with when searching for information about characters. 
It therefore seems sensible to make it more readily available. 
The same would apply to informative aliases.

Unicode clearly states in NamesList.txt that “this file should not 
be parsed for machine-readable information”. 
By the way, all the informative aliases Unicode added for 
the information of users, implementers and developers, are lost 
because they seem to be nowhere else in the UCD.
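
For formal aliases specifically, the UCD does carry a machine-readable file, NameAliases.txt, and some libraries expose its contents. As a minimal sketch, Python's standard unicodedata module (since Python 3.3) accepts these aliases in lookup(), while name() still returns the immutable character name. Informative aliases are not covered by this.

```python
import unicodedata

# The character name itself is immutable, misspelling included.
brakcet = unicodedata.name("\uFE18")

# lookup() also accepts the formal aliases listed in NameAliases.txt.
by_alias = unicodedata.lookup(
    "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
)
gha = unicodedata.lookup("LATIN CAPITAL LETTER GHA")

print(brakcet)
print(f"U+{ord(by_alias):04X}", f"U+{ord(gha):04X}")
```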

This is why I suggest launching a new comprehensive datafile 
following the model of UnicodeData.txt, generated by adding 
many fields to UnicodeData and called therefore 
“UnicodeData Extended” or UnicodeXData.txt (X like in XHTML), 
as I already began to suggest on Thu Apr 30 06:44:57 CDT 2015. 
Among the useful fields to be added, there will surely be 
one for the formal alias and 
8 others for the informative aliases, 
one for the Indic syllable category, 
one for the bidi-mirroring glyph, 
one for the version it was encoded in and 
one for the related date, 
16 for cross-references (code point only), 
several for standardized variants, and so on.
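
To make the shape of such a record concrete, here is a purely illustrative Python sketch. No UnicodeXData.txt exists; the class name XRecord, its field names, and the choice of "|" to pack several aliases into one semicolon-separated field are all assumptions made for this example (the proposal above speaks of separate fields instead).

```python
from dataclasses import dataclass, field


@dataclass
class XRecord:
    """One hypothetical line of an extended, UnicodeData-like file."""
    code_point: int
    name: str
    formal_aliases: list = field(default_factory=list)
    informative_aliases: list = field(default_factory=list)
    cross_references: list = field(default_factory=list)  # code points only

    def to_line(self):
        # Semicolon-separated fields, in the manner of UnicodeData.txt.
        fields = [
            f"{self.code_point:04X}",
            self.name,
            "|".join(self.formal_aliases),
            "|".join(self.informative_aliases),
            "|".join(f"{cp:04X}" for cp in self.cross_references),
        ]
        return ";".join(fields)


record = XRecord(
    0xFE18,
    "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET",
    formal_aliases=[
        "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
    ],
)
print(record.to_line())
```

Packing variable-length lists into one field keeps the number of columns fixed; fixed separate columns, as suggested above, would be simpler to read with a plain split but would cap the number of aliases per character.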

Launching this new file opens the way to extensive communication 
aimed at IT, and will surely create a buzz, which should help 
convince developers of the usefulness of aliases, 
whether formal or informative. Together with low-threshold access 
to them, that can help true names come to the fore. 
It is important that Unicode make this effort, because the current 
high-threshold access to data (which needs complex parsing algorithms 
and thus depends on the goodwill of the persons involved) 
is likely to hide the truth from the public.


Best regards,
Marcel Schneider

Date/Time: Mon May 4 07:53:08 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: Corrigendum

I’m sorry to send you a late correction of a former 
(already belated) post of mine.

The formal alias suggestion I sent you on Thu Apr 30 06:44:57 CDT 2015 
for U+0285 LATIN SMALL LETTER SQUAT REVERSED ESH as a part of 
the list, is erroneous. The Code Chart states it must be 
LATIN SMALL LETTER LONG LEG TURNED IOTA WITH RETROFLEX HOOK. 
(UTN #27, which notes “This is actually a reversed fishhook r 
with retroflex hook.”, would thus be updated.)

Best regards,
Marcel Schneider

Date/Time: Mon May 4 07:53:59 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Equalizing locales treatment in the Code Charts

While preparing some files, I became aware of a need for feedback 
that unfortunately I had not covered accurately during 
the beta review period. Please excuse me for sending this 
a whole week late.

Some formal aliases correcting misspellings in character names 
are completed by an annotation that supposedly plays the role 
of offering an apology to the public for having misspelled 
a character name. This feature is found in two character entries, 
namely:

FE18	PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
	% PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
	* misspelling of "BRACKET" in character name is a known defect

1D0C5	BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS
	% BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS
	* misspelling of "FTHORA" in character name is a known defect

However, another character name has had the misfortune of being 
misspelled, and has been given a formal alias to correct it, but 
not the related annotation, while another misspelled character 
name (following UTN #27) has remained without any addition:

0FD0	TIBETAN MARK BSKA- SHOG GI MGO RGYAN
	% TIBETAN MARK BKA- SHOG GI MGO RGYAN
	* used in Bhutan

0A01	GURMUKHI SIGN ADAK BINDI

(Regarding the Greek letter Lambda, already mentioned, 
the ISO spelling “Lamda” was intentional and is therefore 
not to be considered here as a misspelling.)

For consistency, these two characters should likewise be given 
an annotation and, if not already done, a formal alias.

The same would then apply to every misnomer which is not already 
fully covered in the Standard. As proof of good quality, only two 
remain AFAIK:

0709	SYRIAC SUBLINEAR COLON SKEWED RIGHT
	% SYRIAC SUBLINEAR COLON SKEWED LEFT
	* marks the end of a real or rhetorical question

0B83	TAMIL SIGN VISARGA
	= aytham

(Confusingly, the Tamil aytham already has an annotation, 
but it says nothing about the misnomer: “* just as for 
the Tamil pulli, the glyph for aytham may use either dots 
or rings”.)

Regarding the Tibetan mark quoted above, it should be added 
that its counterpart lacks a formal alias (reported 
on Thu Apr 30, 2015):

0F0A	TIBETAN MARK BKA- SHOG YIG MGO
	* petition honorific, used in Bhutan


Consequently, the character entries discussed might IMHO end up 
as listed below (additions bracketed with underscores). 
The more defects a character entry has, 
the more fully it may be annotated, as a kind of reparation.

Best regards,
Marcel Schneider
______________________________________________________

0F0A	TIBETAN MARK BKA- SHOG YIG MGO
	_% TIBETAN MARK ZOU YIK GUI GO_
	_=_ petition honorific
	* used in Bhutan _by an inferior addressing a superior_
	_* name (a misnomer) refers to 0FD0 ("starting flourish for giving a command")_

0FD0	TIBETAN MARK BSKA- SHOG GI MGO RGYAN
	% TIBETAN MARK BKA- SHOG GI MGO RGYAN
	* used in Bhutan _by a superior addressing an inferior_
	_* misspelling of "BKA-" in character name is a known defect_
	_x (tibetan mark bka- shog yig mgo - 0F0A)_

0A01	GURMUKHI SIGN ADAK BINDI
	_% GURMUKHI SIGN ADDAK BINDI_
	_* misspelling of "ADDAK" in character name is a known defect_
	_x (gurmukhi addak - 0A71)_

0709	SYRIAC SUBLINEAR COLON SKEWED RIGHT
	% SYRIAC SUBLINEAR COLON SKEWED LEFT
	* marks the end of a real or rhetorical question
	_* name is a misnomer_

0B83	TAMIL SIGN VISARGA
	_% TAMIL SIGN_ AYTHAM
	* just as for the Tamil pulli, the glyph for aytham may use either dots or rings
	_* character name is a misnomer_

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Date/Time: Mon May 4 07:54:57 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: PRI #297: A new character property for symbols-and-pictographs support

To enhance the support of symbols and pictographs, and 
in extension of my post on Tue Apr 28, 2015, 
a supplemental character property might be added. 
It would express what I call the directionality dynamics, 
and could therefore be named Directionality Dynamics Property 
(shortened to “DirDyn-Prop” or the like).

This will allow implementers and developers to optimize 
programmatically the environment and the bidi-mirroring 
of symbols. For example, the engines shown sideways at U+1F680.. 
may be bidi-mirrored or not, even in left-to-right script, 
depending on whether they should face the reader in 
reading direction (as an expression of obligingness) or 
follow reading direction (as an expression of dynamics or, 
as the ancient Greeks saw it, of victory).

As shown in the Code Charts, this challenge was already 
dealt with when the font designer chose to represent 
U+1F32C WIND BLOWING FACE as “looking” and blowing from left 
to right, making the left-to-right reader feel at ease by 
not facing him in reading direction, while the 
U+1F3A0 CAROUSEL HORSE is “looking” and turning from right to left, 
thus coming on towards left-to-right readers, a gesture that 
is fully consistent with its meaning as an invitation to visit 
the ‘amusement park’ (its alias-verified symbolics).

This relationship underscores the importance of fixing 
the directionality of symbols and pictographs, in order 
to facilitate the work of font designers and implementers 
by means of predictable and customizable settings. 
As a result, optimizing the pictographs’ expression 
would become easy at layout and publishing time. 
Moreover, it will be clear once again that these symbols 
and pictographs *must* become bidi-mirrorable.
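
For reference, the present state of the Bidi_Mirrored property can be checked with a few lines of Python. This is only a sketch of the status quo, not of the proposed property; the SAMPLES table and the mirrored_flags() helper are illustrative names.

```python
import unicodedata

# One character that is bidi-mirrored today and two pictographs that are not.
SAMPLES = {
    "LEFT PARENTHESIS": "\u0028",
    "ROCKET": "\U0001F680",
    "CAROUSEL HORSE": "\U0001F3A0",
}


def mirrored_flags(chars):
    """Map each label to the current Bidi_Mirrored value of its character."""
    return {label: bool(unicodedata.mirrored(ch)) for label, ch in chars.items()}


flags = mirrored_flags(SAMPLES)
for label, value in flags.items():
    print(f"{label}: Bidi_Mirrored={value}")
```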

Best regards,
Marcel Schneider

Date/Time: Tue May 5 05:10:30 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI #297: A definition for Formal Alias?

I have a problem with the definition of the concept of a Formal Alias.
But as I am hurrying to send this belated feedback as well, I fear 
I may be even less clear.
IMHO the definition of a Formal Alias is inconsistent within the UCD.
In the NamesList, it is an alternative name for a misnamed character, 
marked up with a percent sign and output as reference mark U+203B in 
the Code Charts. This is how it is referred to in the NamesList syntax
page and in the Stability Policy. In NameAliases.txt however, 
"The formal name aliases are divided into five types", and the ones 
defined above are just a subset, labelled "correction" or, for U+FEFF BYTE ORDER MARK, 
"alternate".
So my idea is to unify the definitions, probably following the pattern already
well known from the Code Charts, where the control character aliases 
are not considered formal aliases, and the Byte Order Mark is considered 
to have a formal alias because its historical name Zero width no-break
space should fall out of use (however, since U+2060 is not supported 
even in recent systems, U+FEFF must nevertheless stay in use as a ZWNBSP).
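
To make the five types concrete, here is a small Python sketch that groups a handful of alias records by type. The SAMPLE lines follow the code;alias;type layout of NameAliases.txt and are only a short excerpt chosen for this example; group_aliases() is an illustrative helper, not part of any Unicode tool.

```python
from collections import defaultdict

# Short excerpt in the "code;alias;type" layout of NameAliases.txt.
SAMPLE = """\
# code;alias;type
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
000A;LF;abbreviation
0080;PADDING CHARACTER;figment
FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;ZWNBSP;abbreviation
"""


def group_aliases(text):
    """Return {type: [(code point, alias), ...]} for each data line."""
    by_type = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(";")
        by_type[kind].append((int(code, 16), alias))
    return dict(by_type)


groups = group_aliases(SAMPLE)
for kind in sorted(groups):
    print(kind, groups[kind])
```

In this reading, only the "correction" and "alternate" groups correspond to what the NamesList marks with a percent sign.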

Best regards,
Marcel Schneider

Date/Time: Wed May 6 08:03:04 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: PRI #297: feedback on XML files


Dear Unicode Editorial Committee,

I’m sorry to have avoided opening the XML files (because I prefer 
text files and the Code Charts in PDF), and I wish to thank 
Unicode and the persons involved for having made me aware of them. 
This allows me to update some previous feedback that 
Mr Freytag has kindly reviewed, by adding 
a suggestion that would surely (I believe) further improve 
access to the data.

The informative aliases, which I mentioned 
on Sat May 2 07:10:09 CDT 2015, are unfortunately not included 
in the XML files of the UCD, but fortunately the 
<name-alias/> tag allows adding as many aliases as desired, 
for which I suggest creating an “information” type label. 
This powerful markup language would even allow adding the annotations 
(COMMENT_LINEs) as they are provided to the readers of 
the Code Charts, once a <comment/> tag is created. 
Furthermore, each comment may be given a type, 
like “languages”, “casing”, “compatibility” and so on, 
which would allow browsing and displaying them by interest and with colors. 
The block headers, which currently do not seem to be part of 
the XML files either, may be made available with appropriate tags 
as well.

Further, there might be a way to underscore the importance of 
the formal aliases as denominations representing the greatest 
common denominator of the various community preferences by 
highlighting the most suitable name, as Mr Freytag planned 
to do for the Charts. This shift towards immediate readability 
even of formerly misnamed entries should be mirrored in XML 
by moving the formal alias from a value of the ‘alias’ attribute 
of the <name-alias> tag towards a value of a future ‘fa’ attribute 
of the <char> tag, where we find already the NAME (‘na’) attribute 
and the 1.0 NAME (‘na1’).
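
The XML suggestions above might be sketched as follows. Everything marked hypothetical is an assumption of this sketch: the ‘fa’ attribute, the “information” alias type, the <comment> element and its “usage” type do not exist in the UCD XML; the namespace of the real files is omitted for brevity, and display_name() is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: 'fa', type="information" and <comment> are the
# suggested additions, not existing UCD XML features.
SKETCH = """
<repertoire>
  <char cp="0709" na="SYRIAC SUBLINEAR COLON SKEWED RIGHT"
        fa="SYRIAC SUBLINEAR COLON SKEWED LEFT">
    <comment type="usage">marks the end of a real or rhetorical question</comment>
  </char>
  <char cp="0B83" na="TAMIL SIGN VISARGA">
    <name-alias alias="aytham" type="information"/>
  </char>
</repertoire>
"""


def display_name(char):
    """Prefer the hypothetical 'fa' attribute, else fall back to 'na'."""
    return char.get("fa") or char.get("na")


root = ET.fromstring(SKETCH)
names = {c.get("cp"): display_name(c) for c in root.iter("char")}
info_aliases = [a.get("alias") for a in root.iter("name-alias")
                if a.get("type") == "information"]
print(names)
print(info_aliases)
```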

Adding many more formal aliases, at least up to 337 
(please see below for another suggestion) may then be managed 
thanks to a new ‘correction-level’ attribute, which can take 
for example the value ‘mis’ when the aim is to correct 
a misspelling or another oversight (as today’s 
formal aliases do), ‘std’ when eliminating Standardese 
(CARON, LAMDA, VULGAR etc.), ‘bim’ for more respectfulness 
towards bidi-mirroring (the ‘open interval’ notation using 
*reversed* square brackets does not invalidate the principle), 
and so on. I still believe the discordance about *English* names 
may be resolved by opting for the most widely used locale, 
that is American English, because the nouns “slash” and “period” 
are used in all English speaking countries (but *not*, I agree, 
the spellings “labor” and “honor”, while “color” is widely used 
even in England because of its importance in style sheets and so on).


Another problem is about file format. 
Personally I would like a database in text format to paste into 
a spreadsheet, as is possible with the NamesList and UnicodeData (please 
refer to my e-mail to the UTC, which would better have been 
addressed to the Editorial Committee). I took notice of what 
has been said on the Mail List, notably about UIs, and wish 
to thank all persons who were so kind and discussed my e-mails. 
However, as much information is still missing in UIs, my proposal 
of what I already christened “UnicodeXData.txt” probably remains 
interesting because even corrected, the XML files will, I guess, 
not open in a spreadsheet like a plain text file does. 
So if it is permitted to post this wish, I would like to find 
*all* information that is code-point related, in a 
UnicodeData-shaped file for ready access.

More precisely, my suggestion is to add a field that will contain 
a complete names list representing the smallest common denominator 
of all user communities, to cater reasonably for the worldwide 
demand for a complete repertoire of *one* helpful *English* name 
per character. This field will therefore contain *all* useful 
formal aliases, whether they appear in the Code Charts, or not 
(that is, for the sake of graphics, layout and design issues). 
Further, it will contain the most commonly used alias for 
each control character. To finish, it will be completed with 
the identifiers as defined today, which are the default value 
of the field. That will allow getting readily an accurate name 
for *each* character.

For more usefulness, the next field should contain a type label 
like the ones defined in NameAliases.txt, or more consistently 
(please refer to my post of Tue May 5 05:10:30 CDT 2015):
N = NAME (default value): the normal character name, used also as 
	a technical identifier;
CT = CONTROL: the first listed designation (see NameAliases.txt) 
	of a control character;
FA = FORMAL_ALIAS: the designation of a (non-control) character 
	that is not identical with the identifier;
AFA = ADDITIONAL_FORMAL_ALIAS: possibly a kind of 
	“additional” formal alias name, which does not appear 
	in the Code Charts but is present in 
	the above mentioned field.

The next field will contain the abbreviation of 
the control character alias name, otherwise it will be empty. 
And as there are up to three names per control character (U+000A), 
the next four fields may contain these supplemental names and 
their abbreviations. 
(The three unnamed controls U+0080, U+0081, U+0099 may have 
their “figment” in the same field as the third name of U+000A.)

The complete range would look as follows, admitting that 
this will be the first thing to be added to UnicodeData.txt:
Field#	Content
15	Designation: Name|Alias|FormalAlias
16	Type: N|CT|FA|AFA
17	Abbreviation
18	Second Alias
19	Second Abbreviation
20	Third Alias
21	Third Abbreviation

Other fields may contain the informative aliases provided in 
the NamesList/CodeCharts and in the XML files, 
as now suggested above.
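
To show how such an extended line could be read, here is a minimal Python sketch. Fields 0 to 14 are the existing UnicodeData.txt layout; fields 15 to 21 are the hypothetical additions listed in the table above, and the EXTENSION values for U+000A as well as the parse_extended() helper are assumptions made for this example.

```python
# Hypothetical fields 15..21, named after the table above.
EXTRA_FIELDS = {
    15: "designation",
    16: "type",
    17: "abbreviation",
    18: "second_alias",
    19: "second_abbreviation",
    20: "third_alias",
    21: "third_abbreviation",
}

# Existing 15-field record for U+000A, followed by the suggested extension.
BASE = "000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;"
EXTENSION = "LINE FEED;CT;LF;NEW LINE;NL;END OF LINE;EOL"


def parse_extended(line):
    """Split a semicolon-delimited record and name the suggested fields."""
    fields = line.split(";")
    if len(fields) != 22:
        raise ValueError(f"expected 22 fields, got {len(fields)}")
    record = {"code": int(fields[0], 16), "name": fields[1]}
    for index, key in EXTRA_FIELDS.items():
        record[key] = fields[index]
    return record


record = parse_extended(BASE + ";" + EXTENSION)
print(record)
```

Because the record stays semicolon-delimited, it would still open in a spreadsheet like the present UnicodeData.txt.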


To complete my formal alias suggestions list sent 
on Thu Apr 30 06:44:57 CDT 2015, I’m pleased to follow 
Mr Wordingham’s advice of defining Formal Aliases for 
the Devanagari Dandas too, and I opted for the addition 
of “Punctuation”, which is already present with “DANDA” 
in two languages out of twelve, to enhance the relative 
universality of these two punctuation marks and as a mark 
of respect to compensate for the trouble caused until now 
to “Bengali/Tamil etc.” users:

0964	DEVANAGARI DANDA
	% INDIAN PUNCTUATION DANDA
0965	DEVANAGARI DOUBLE DANDA
	% INDIAN PUNCTUATION DOUBLE DANDA


Best regards,
Marcel Schneider

Date/Time: Thu May 7 07:46:59 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: U+0964/5, U+00DF/1E9E; UCDXML; TUS

Dear Unicode Editorial Committee,

Sorry, my suggestion about formal aliases for Danda / double Danda 
should read of course...:
0964	DEVANAGARI DANDA
	% INDIC PUNCTUATION DANDA
0965	DEVANAGARI DOUBLE DANDA
	% INDIC PUNCTUATION DOUBLE DANDA

There is a further point that I unfortunately did not notice sooner. 
It’s about the uppercasing of the German ß. Looking at the properties 
of U+00DF in ucdxml.nounihan.flat.xml, I found only 
uc="0053 0053". In the meantime, German usage begins to shift towards 1E9E, 
as I already reported when suggesting an update to the NamesList and 
Code Charts annotation for this character. IMO there should be 
an application settings checkbox: “☑ ẞ as uppercase for ß”.  
I don’t know if it’s already implemented. However, since U+1E9E 
is now a part of most current fonts and is available on keyboards thanks 
to the new German standard layouts, defining uppercase as uc="1E9E" 
might seem appropriate to avoid losing the ß in text files. 
If the custom setting requires uppercasing U+00DF to double U+0053, 
the cf="0073 0073" value can be used to perform that.
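
The checkbox idea can be sketched in a few lines of Python, whose str.upper() follows the current mapping of ß to SS. The upper_german() function and its use_capital_sharp_s flag are hypothetical names standing in for the suggested setting; this is not an existing API.

```python
SHARP_S = "\u00DF"          # ß
CAPITAL_SHARP_S = "\u1E9E"  # ẞ


def upper_german(text, use_capital_sharp_s=False):
    """Uppercase text, optionally mapping ß to ẞ instead of SS."""
    if use_capital_sharp_s:
        # Substitute first, so the default mapping never sees U+00DF.
        text = text.replace(SHARP_S, CAPITAL_SHARP_S)
    return text.upper()


print(upper_german("Straßer"))                            # default mapping
print(upper_german("Straßer", use_capital_sharp_s=True))  # suggested setting
print(upper_german("Straßer", use_capital_sharp_s=True).lower())
```

With the default mapping, lowercasing the result gives "strasser" and the ß is lost; with the flag set, the round trip returns to "straßer".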

To understand the issue, it is necessary to remember that 
the uppercase Latin letter SZ was created and encoded 
on behalf of the German Standards body DIN to ensure that 
personal data are correctly stored and rendered. Since in German 
the ß is a distinctive part of orthography and is needed in names 
(if a person’s name is Straßer or STRAẞER, writing STRASSER or 
STRASZER is wrong because these are other names, equally borne),
not having an uppercase ß caused much trouble and led to some 
confusion. Today, fortunately, this time is past, and 
the char props may be updated. All that is needed is already 
in the UCD except the new uppercase as a value of the uc property 
for U+00DF.

Therefore I suggest that Unicode takes advice from 
the German Standards body (DIN) whether to set 
this property to its new value. 


As I unfortunately have yet more feedback to send, I would 
suggest completing the XML files with the xrefs too. While, contrary to 
what I suggested yesterday, the casing *annotations* may be omitted in XML 
(but *not* the one for U+00DF, nor the others such as those about 
languages, ancient Standards, font design issues, typographic 
preferences and much more), the XML format offers the opportunity 
of enhancing the cross-reference support. A new tag <xref/> might 
be given two properties: char="" whose value is the code point, 
and rel="" which is new and makes it possible to state the relationship 
between the character and the xref explicitly. This information is a real need, 
and its lack is currently very annoying. I can already suggest a few 
values, such as: 
"case" for casing-related xrefs;
"resemble" for characters resembling by their appearance;
"origin" when the cited character was really used to create the commented one;
"ornamental" if the matter is to suggest some nice alternate chars;
"usage" for characters with similar or opposite usage;
"family" to refer to other related characters (for example, for consistency of font design)
There will surely be many more.
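
As a sketch of how the suggested cross-reference tag could be consumed, assuming the hypothetical xref element with char and rel attributes described above: the rel values assigned to the two cross-references of U+00DF are illustrative choices, the real UCD XML namespace is omitted, and xrefs_by_relation() is a made-up helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical <xref/> elements; the rel values are illustrative only.
SKETCH = """
<char cp="00DF" na="LATIN SMALL LETTER SHARP S">
  <xref char="1E9E" rel="case"/>
  <xref char="03B2" rel="resemble"/>
</char>
"""


def xrefs_by_relation(char_element):
    """Group the cross-referenced code points by their rel value."""
    grouped = defaultdict(list)
    for xref in char_element.iter("xref"):
        grouped[xref.get("rel")].append(int(xref.get("char"), 16))
    return dict(grouped)


relations = xrefs_by_relation(ET.fromstring(SKETCH))
for rel, targets in relations.items():
    print(rel, [f"U+{cp:04X}" for cp in targets])
```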


I believe that more, easy-to-search and readily available information 
can notably enhance user experience, whether directly or by the 
means of better and richer implementations and user interfaces.


In that sense, I also suggest creating, if feasible, an online 
version of the Standard (that is, the chapters of TUS which are 
currently available in PDF). Among the advantages on the user side, 
one might cite the following: 
— better browsing and searchability of the whole text 
— better referencing when “Purple Numbers” are implemented
— easy to show-and-hide fine-levelled numbering
— interactive table of contents with quick access to items
— easy-to-quote text by simpler copy’n’paste  
— formatting settings to enhance accessibility
But that does not mean I would prefer an online version to the PDF.

To improve quotability, I would suggest typesetting the character 
names (which are currently in small caps) in uppercase throughout,
and applying a reduced font size instead, as specified in the style 
sheet of UAX #9 (where, however, redundant formatting both lowercases
and small-caps the uppercase source text at the same time: “span.name { 
text-transform: lowercase; font-variant: small-caps; font-size: 75%; }”).
The result was not convincing as it appeared in UAX #9, section 3.2.

I still believe that there is a way to get a Standard for 
everyone’s use, the only condition being the ability to read English. This allows
everybody to access, as I misspelled it, the full bandwidth 
of the Unicode Standard in real time. “Real time” means in my sense 
that no translation from English is needed, because 
the original version is fully understandable even for people 
who have just learned some English.


Best regards,
Marcel Schneider

Date/Time: Mon May 11 07:33:52 CDT 2015
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #297: Various corrigenda to my feedback

Dear Unicode Editorial Committee,

as I learned you will discuss some details after the UTC meeting, 
I hope this feedback which I shall send you as my last one 
(after several others meant to be the last), will reach you in time, 
because I found a few points to correct.


*** GRANTHA OM

The GRANTHA OM, U+11350, is named consistently with 
the DEVANAGARI OM U+0950, the GUJARATI OM U+0AD0, 
the TAMIL OM U+0BD0 and several other OM signs. 
Thus, my related feedback no longer applies.


*** Character names in accordance with Bidi-mirroring support

The OPENING and CLOSING epithets for brackets are verified as 
accurate in all discussed contexts. A problem has been raised 
about the mathematical notation of an open interval: ]a, b[. 

I was wrong on Wed May 6 when I supposed these were “reversed” brackets. 
IMAO the problem is resolved when considering the first bracket 
as closing the preceding interval, which extends 
from −∞ to a (included), and the second bracket 
as opening the following interval, which extends 
from b (included) to +∞. 

This interpretation is highlighted when considering the notation 
of the two possible half-open intervals [a, b[ and ]a, b]: 
Every time, the bracket is CLOSING when closing an interval 
in numbering/reading direction, and OPENING when closing it 
in the other direction, a fact that BTW should lead to calling 
[a, b[ “half-open upwards” instead of “half-open to the right”, 
when applying once again the *universalization strategy* that 
should be an _unremovable_ part of the framework when standardizing 
a Universal Character Set. As far as it applies to character names, 
this universalization strategy, after having been introduced 
by Unicode, has been removed by ISO.


*** Full stop

If PERIOD is not a suitable name for U+002E FULL STOP, 
one might consider giving it the frequently used DOT or POINT alias. 
However, in TUS, it is mostly referred to as a “period”, and 
this could IMHO be reason enough to prefer it, since 
unifying the way an object is named makes the discourse 
easier to understand. And that should be among the main goals 
of the Unicode Standard, rather than increasing the need for 
time-and-money-intensive translations, which for many languages 
(except French) probably risk remaining undone.


*** NamesList syntax and Code Charts layout

I’m sorry to have suggested (on Mon Apr 27 01:13:55 CDT 2015) 
to turn the NamesList syntax upside down by raising the “%” markup 
to the Names line when the char entry includes a Formal Alias, 
and I prefer recalling the other suggestion I sent 
on Mon Apr 20 03:09:02 CDT 2015: 
“Even simplier, the roles of CharacterName and FormalAlias 
may be inverted at these instances, giving the Formal Alias 
a Code Name status (and the Character Name a True Designation status).” 
I’m glad Mr Freytag agrees on the principle. 

Here, as often, I have not been clear: 
In the Unicode Charts, the name following the code point 
should always be a helpful one, an alias if necessary, 
and in that case, the old identifier can be given the way 
the Formal Alias is given today, with an annotation like 
“this alias is the (misspelled / mistaken / semantically obsolete / 
standardese, but) stable identifier”. 
That would replace part of the current 
“character name is a misnomer”, 
“misspelling of [...] is a known defect” and 
“despite its name [...]” annotations.

For whole series of misnamed or awkwardly named characters, such as:
3021	HANGZHOU NUMERAL ONE  sqq,
00BC	VULGAR FRACTION ONE QUARTER  sqq,
2150	VULGAR FRACTION ONE SEVENTH  sqq,
010C	LATIN CAPITAL LETTER C WITH CARON  sqq,
1D6B2	MATHEMATICAL BOLD CAPITAL LAMDA  sqq,
it could be sufficient to indicate the first former name 
(the identifier) of the series and explain in an annotation 
that the others must be extrapolated accordingly, 
in order to avoid overloading the listings in the Code Charts.


*** Font-size of figures in the Code Charts
(new subject)

By contrast with the figures of the code points in the Code point column, 
the figures of the code points in the next column are inconsistent in font size. 
Decimal digits are slightly smaller than hex letters. 
In most zoom factors this results in a difference of one or two pixels, 
but this may be noticeable even at 100 %.

This is likely to need a comment about the dislike of figures 
in today’s documents. Many things, including human language, 
are converted to figures (character encoding is one example).  
Meanwhile, a kind of what might be called a figure allergy 
seems to have set in, which leads to hide figures and digits 
wherever feasible (and is a part of the following concern).


*** Code-points in the Bookmarks side-pane

Contrary to what I suggested on Wed Apr 1 09:43:56 CDT 2015 
about adding code points to the PDF bookmarks for display 
in the side pane, I understand today that would overload 
this reduced space and make the bookmarks less attractive. 
When language names and figures are put together, the result 
could lead to associations with a ranking, and raise idle questions 
like “Why has this language been encoded before that other one”.

There is however a need of browsing the Code Charts by code points, 
because code point searching tools are not always suitable. 
I suggest therefore adding a bookmark called “Code Point Ranges” 
at the end, which would not automatically expand. 
Once this bookmark is expanded, there would be the list of blocks 
in another format with just range start and range end, 
matching exactly the blockhead list above as regards the targets, 
but displayed apart, avoiding thus the problems related to figures. 
To complete, the blockheads should then probably be grouped together 
in a general bookmark like “Blocks”, which shall expand by default 
to produce the actual display. 

The advantage would be that rather than looking up the block by range 
in Blocks.txt and then searching the side pane for the blockhead, 
the Code Charts reader could search for the range in the side pane 
and get the Chart displayed by clicking the range’s bookmark.

This enhancement would require the Portable Document Format 
to allow multiple bookmark sets. In other words, a given target 
could have several bookmarks in different areas of the bookmarks list.
Another requirement is that the bookmarks can be flagged to expand or 
not to expand by default. The expand-by-default behavior would be nice 
in TUS too, to prevent the side-pane from displaying only the chapter 
head bookmark even when the last settings were saved and the setting 
was to expand the current bookmark and/or to display the main bookmarks. 
This would be even more useful in an all-in-one PDF of the Standard 
(which I have not found yet) because the chapters’ list would display by 
default from opening on as a kind of “obligingness” towards the reader.

IMHO it is very helpful to look up characters in the Code Charts 
and to browse the Charts in PDF, especially the very ergonomic 
all-in-one. To compare glyphs in different blocks, several copies 
may be opened in as many instances of Adobe Reader.


Best regards,
Marcel Schneider

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Date/Time: Mon May 11 10:15:06 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestions for 8.0.0: U+0964/5, U+00DF/1E9E, XML, TUS

Dear Unicode Editorial Committee,

Unfortunately the beta period is over, but not my feedback. 
I am therefore trying to send this and another post on other 
subjects, in case they can still be taken into account.

Sorry, my suggestion about formal aliases for Danda / double Danda 
should read of course...:
0964	DEVANAGARI DANDA
	% INDIC PUNCTUATION DANDA
0965	DEVANAGARI DOUBLE DANDA
	% INDIC PUNCTUATION DOUBLE DANDA

There is a further point that I unfortunately did not notice sooner. 
It’s about the uppercasing of the German ß. Looking at the properties 
of U+00DF in ucdxml.nounihan.flat.xml, I found only 
uc="0053 0053". In the meantime, German usage begins to shift towards 1E9E, 
as I already reported when suggesting an update to the NamesList and 
Code Charts annotation for this character. IMO there should be 
an application settings checkbox: “☑ ẞ as uppercase of ß”.  
I don’t know if it’s already implemented. However, since U+1E9E 
is now a part of most current fonts and is available on keyboards thanks 
to the new German standard layouts, defining uppercase as uc="1E9E" 
might seem appropriate to avoid losing the ß in text files. 
If the custom setting requires uppercasing U+00DF to double U+0053, 
the cf="0073 0073" value can be used to perform that.

To understand the issue, it is necessary to remember that 
the uppercase Latin letter SZ was created and encoded 
on behalf of the German Standards body DIN to ensure that 
personal data are correctly rendered. Since in German the ß 
is a distinctive part of orthography and is needed in names 
(if a person’s name is Straßer or STRAẞER, writing STRASSER or 
STRASZER is wrong because these are other names, equally borne),
not having an uppercase ß caused much trouble and led to some 
confusion. Today, fortunately, this time is past, and 
the char props may be updated. All that is needed is already 
in the UCD except the new uppercase as a value of the uc property 
for U+00DF.

Therefore I suggest that Unicode takes advice from 
the German Standards body (DIN) whether to set 
this property to its new value. 


As I unfortunately have yet more feedback to send, I would 
suggest completing the XML files with the xrefs too. While, contrary to 
what I suggested yesterday, the casing *annotations* may be omitted in XML 
(but *not* the one for U+00DF, nor the others such as those about 
languages, old Standards, font design issues, typographic 
preferences and much more), the XML format offers the opportunity 
of enhancing the cross-reference support. A new tag <xref/> might 
be given two properties: char="" whose value is the code point, 
and rel="" which is new and makes it possible to state the relationship 
between the character and the xref explicitly. This information is a real need, 
and its lack is currently very annoying. I can already suggest a few 
values, such as: 
"case" for casing-related xrefs;
"resemble" for characters resembling by their appearance;
"origin" when the cited character was used to create the commented one;
"ornamental" if the matter is to suggest some nice alternate chars;
"usage" for characters with similar or opposite usage;
"family" to refer to other related characters (for example, for consistency of font design)

I believe that more, easy-to-search and readily available information 
can notably enhance user experience, whether directly or by the 
means of better and richer implementations and user interfaces.

In that sense, I also suggest creating, if feasible, an online 
version of the Standard (that is, the chapters of TUS which are 
currently available in PDF). Among the advantages on the user side, 
one might cite the following: 
— better browsing and searchability of the whole text 
— better referencing when “Purple Numbers” are implemented
— easy to show-and-hide fine-levelled numbering
— interactive table of contents with quick access to items
— easy-to-quote and paste text 
— formatting settings to enhance accessibility
But that does not mean I would prefer an online version to the PDF.

To improve quotability, I would suggest typesetting the character 
names (which are currently in small caps) in uppercase throughout,
and applying a reduced font size instead, as specified in the style 
sheet of UAX #9 (where, however, redundant formatting both lowercases
and small-caps the uppercase source text at the same time: “span.name { 
text-transform: lowercase; font-variant: small-caps; font-size: 75%; }”).
The result was not convincing as it appeared in UAX #9, section 3.2.

I still believe that there is a way to get a Standard for 
everyone’s use, the only condition being the ability to read English. This allows
everybody to access (as I misspelled on the Mail List) the full bandwidth 
of the Unicode information in real time. Real time here means, in my sense, 
that the original documentation is directly usable by almost 
everybody.

Best regards,
Marcel Schneider

Date/Time: Mon May 11 10:17:07 CDT 2015
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestions for 8.0.0: PDF, and some corrigenda


Dear Unicode Editorial Committee,

as I learned you will discuss some details after the UTC meeting, 
I hope this feedback which I shall send you as my last one 
(after several others meant to be the last), will reach you in time, 
because I found a few points to correct.


*** GRANTHA OM

The GRANTHA OM, U+11350, is named consistently with 
the DEVANAGARI OM U+0950, the GUJARATI OM U+0AD0, 
the TAMIL OM U+0BD0 and several other OM signs. 
Thus, my related feedback no longer applies.


*** Character names in accordance with Bidi-mirroring support

The OPENING and CLOSING epithets for brackets are verified as 
accurate in all discussed contexts. A problem has been raised 
about the mathematical notation of an open interval: ]a, b[. 

I was wrong on Wed May 6 when I supposed these were “reversed” brackets. 
IMAO the problem is resolved when considering the first bracket 
as closing the preceding interval, which extends 
from −∞ to a (included), and the second bracket 
as opening the following interval, which extends 
from b (included) to +∞. 

This interpretation is highlighted when considering the notation 
of the two possible half-open intervals [a, b[ and ]a, b]: 
Every time, the bracket is CLOSING when closing an interval 
in numbering/reading direction, and OPENING when closing it 
in the other direction, a fact that BTW should lead to calling 
[a, b[ “half-open upwards” instead of “half-open to the right”, 
when applying once again the *universalization strategy* that 
should be an _unremovable_ part of the framework when standardizing 
a Universal Character Set. As far as it applies to character names, 
this universalization strategy, after having been introduced 
by Unicode, has been removed by ISO.


*** Full stop

If PERIOD is not a suitable name for U+002E FULL STOP, 
one might consider giving it the frequently used DOT or POINT alias. 
However, in TUS, it is mostly referred to as a “period”, and 
this could IMHO be reason enough to prefer it, since 
unifying the way an object is named makes the discourse 
easier to understand. And that should be among the main goals 
of the Unicode Standard, rather than increasing the need for 
time-and-money-intensive translations, which for many languages 
(except French) probably risk remaining undone.


*** NamesList syntax and Code Charts layout

I’m sorry to have suggested (on Mon Apr 27 01:13:55 CDT 2015) 
to turn the NamesList syntax upside down by raising the “%” markup 
to the Names line when the char entry includes a Formal Alias, 
and I prefer recalling the other suggestion I sent 
on Mon Apr 20 03:09:02 CDT 2015: 
“Even simplier, the roles of CharacterName and FormalAlias 
may be inverted at these instances, giving the Formal Alias 
a Code Name status (and the Character Name a True Designation status).” 
I’m glad Mr Freytag agrees on the principle. 

Here, as often, I have not been clear: 
In the Unicode Charts, the name following the code point 
should always be a helpful one, an alias if necessary, 
and in that case, the old identifier can be given the way 
the Formal Alias is given today, with an annotation like 
“this alias is the (misspelled / mistaken / semantically obsolete / 
standardese, but) stable identifier”. 
That would replace part of the current 
“character name is a misnomer”, 
“misspelling of [...] is a known defect” and 
“despite its name [...]” annotations.

For whole series of misnamed or awkwardly named characters, such as:
3021	HANGZHOU NUMERAL ONE  sqq,
00BC	VULGAR FRACTION ONE QUARTER  sqq,
2150	VULGAR FRACTION ONE SEVENTH  sqq,
010C	LATIN CAPITAL LETTER C WITH CARON  sqq,
1D6B2	MATHEMATICAL BOLD CAPITAL LAMDA  sqq,
it could be sufficient to indicate the first former name 
(the identifier) of the series and explain in an annotation 
that the others must be extrapolated accordingly, 
in order to avoid overloading the listings in the Code Charts.


*** Font-size of figures in the Code Charts
(new subject)

By contrast with the figures of the code points in the Code point column, 
the figures of the code points in the next column are inconsistent in font size. 
Decimal digits are slightly smaller than hex letters. 
At most zoom factors this results in a difference of one or two pixels, 
but this may be noticeable even at 100 %.

This probably calls for a comment on the dislike of figures 
in today’s documents. Many things, including human language, 
are converted to figures (character encoding is one example). 
Meanwhile, what might be called a figure allergy 
seems to have set in, which leads to hiding figures and digits 
wherever feasible (and is a part of the following concern).


*** Code-points in the Bookmarks side-pane

Contrary to what I suggested on Wed Apr 1 09:43:56 CDT 2015 
about adding code points to the PDF bookmarks for display 
in the side pane, I understand today that this would overload 
this limited space and make the bookmarks less attractive. 
When language names and figures are put together, the result 
could suggest a ranking and raise idle questions 
like “Why has this language been encoded before that other one”.

There is however a need to browse the Code Charts by code point, 
because code point search tools are not always suitable. 
I therefore suggest adding a bookmark called “Code Point Ranges” 
at the end, which would not expand automatically. 
Once this bookmark is expanded, it would show the list of blocks 
in another format, with just range start and range end, 
pointing to exactly the same targets as the block header list above 
but displayed separately, thus avoiding the problems related to figures. 
To complete this, the block headers should then probably be grouped 
under a general bookmark like “Blocks”, which would expand by default 
to produce the current display. 

The advantage would be that rather than looking up the block by range 
in Blocks.txt and then searching the side pane for the block header, 
the Code Charts reader could search for the range in the side pane 
and get the Chart displayed by clicking the range’s bookmark.
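
The two-step lookup described above (code point to block via Blocks.txt, 
then block to range label) can be sketched as follows. This is only an 
illustration: the sample lines imitate the "start..end; Block Name" layout 
of Blocks.txt, and the names parse_blocks, find_block and range_label are 
hypothetical helpers, not part of any Unicode tool.

```python
import re
from bisect import bisect_right

# A few lines in the Blocks.txt layout ("start..end; Block Name").
# Excerpt for illustration only; the real file lists every block.
SAMPLE_BLOCKS = """\
# sample excerpt
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0E00..0E7F; Thai
0E80..0EFF; Lao
1A20..1AAF; Tai Tham
"""

LINE_RE = re.compile(r"^([0-9A-F]{4,6})\.\.([0-9A-F]{4,6});\s*(.+?)\s*$")


def parse_blocks(text):
    """Return a sorted list of (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            start, end, name = match.groups()
            blocks.append((int(start, 16), int(end, 16), name))
    blocks.sort()
    return blocks


def find_block(blocks, code_point):
    """Return the block containing code_point, or None if there is none."""
    starts = [block[0] for block in blocks]
    index = bisect_right(starts, code_point) - 1
    if index >= 0 and blocks[index][0] <= code_point <= blocks[index][1]:
        return blocks[index]
    return None


def range_label(block):
    """Format a block as a range-only label, as a range bookmark might show it."""
    start, end, _name = block
    return f"{start:04X}..{end:04X}"


blocks = parse_blocks(SAMPLE_BLOCKS)
thai = find_block(blocks, 0x0E30)
print(range_label(thai), thai[2])
```

A "Code Point Ranges" bookmark list would in effect offer the output of 
range_label directly in the side pane, so that the reader does not have to 
perform the first step by hand.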

This enhancement would require that the Portable Document Format 
allow multiple bookmark sets. In other words, a given target 
could have several bookmarks in different areas of the bookmarks list.
Another requirement is that bookmarks can be flagged to expand or 
not to expand by default. The expand-by-default behavior would be nice 
in TUS too, to prevent the side pane from displaying only the chapter 
head bookmark even when the last settings were saved and the setting 
was to expand the current bookmark and/or to display the main bookmarks. 
This would be even more useful in an all-in-one PDF of the Standard 
(which I have not found yet), because the list of chapters would display 
by default upon opening, as a kind of courtesy towards the reader.

IMHO it is very helpful to look up characters in the Code Charts 
and to browse the Charts in PDF, especially the very ergonomic 
all-in-one. To compare glyphs in different blocks, several copies 
may be opened in as many instances of Adobe Reader.


Best regards,
Marcel Schneider


Date/Time: Mon May 4 12:36:00 CST 2015
Name: Asmus Freytag
Report Type: Error Report
Opt Subject: Review of Marcel Schneider's feedback on Unicode 8.0 beta

Feedback on "idle" @+ notices.

The nameslist is not intended for machine processing, other than by the code
chart layout tool. In that context the use of @+ is motivated and not idle.


Proposed resolution of feedback: not accepted.

---

Feedback on bookmarks

There are other tools for searching characters by code point value.


Proposed resolution of feedback: not accepted

---

Feedback on numbering levels in the standard

While fewer levels make for a more attractive book design, the difficulties in
citing material in the standard are real.


Proposed resolution of feedback: forward to ed committee

---

Feedback on showing mirrored glyphs

There are cases where the mirrored forms are not perfect mirror images (cube
root 221B, for example, where the "3" does not mirror). In such cases, the
mirrored shape might perhaps be documented as "alternate glyph". In cases of
unusual mirroring behavior an annotation like that for FD3E and FD3F should be
sufficient. One might consider adding a note on mirroring behavior at the
block level for arrows and mathematical symbols, pointing out that arrows are
not mirrored.


Proposed resolution of feedback: forward to ed committee

---

Feedback on extended datafile

The purpose of that is served by the XML version of the UCD.


Proposed resolution of feedback: not accepted

---

Feedback on character naming policy and stability (various)

There is little to be gained by abandoning the existing interpretation of
character names as identifiers or by abandoning the corresponding stability
policy. The negatives on the other hand are huge.


Proposed resolution of feedback: not acceptable

---

Feedback on details of text in the standard and annotations to the nameslist (various)

These are too detailed to review in the full UTC, but some appear useful.


Proposed resolution of feedback: forward to ed committee

---

Feedback on CODENAME (mentioned in two sections)

In general, adding more syntax to the nameslist would make it unnecessarily
complex. However, it might be useful to invert the display of name aliases,
esp. where the original name is a misnomer or typo. In other words, the ed comm
might look into whether it's feasible to show the most suitable "alias" in the
formal "NAME" line of the code charts (and the original name as annotation
using the syntax for alias). For the nameslist, tracking which is which is
perhaps less important than de-emphasizing mistakes.

Rather than moving the % annotation (which would break some deeply embedded
assumptions in the code for the layout processor) the idea would be to
redefine what the contents of the nameslist are. The % annotation would then
no longer indicate "the" formal alias for a character name, but treat both
names and alias as "aliases" throughout, with the first listed one becoming
the "preferred" alias (instead of, as before, the UnicodeData "name"). This
would be most useful in cases of actual "corrections".

It's at least useful enough to have the ed committee have a look at this.

For people needing to track which alias has what status (original, vs. formal
alias) the data files give that answer.
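
As a rough sketch of the inverted display, and not a description of the
actual chart layout tool: the sample lines below follow the
"code;alias;type" layout of NameAliases.txt, the character names are those
from UnicodeData, and parse_aliases and display_entry are hypothetical
helper names. A "correction" alias, where one exists, is shown first and
the original name is kept as the secondary entry.

```python
# Original character names (as in UnicodeData) for three sample characters.
SAMPLE_NAMES = {
    0x01A2: "LATIN CAPITAL LETTER OI",
    0xFE18: "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET",
    0xFEFF: "ZERO WIDTH NO-BREAK SPACE",
}

# Lines in the NameAliases.txt layout: code;alias;type
SAMPLE_ALIASES = """\
01A2;LATIN CAPITAL LETTER GHA;correction
FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;correction
FEFF;BYTE ORDER MARK;alternate
"""


def parse_aliases(text):
    """Map each code point to its list of (alias, type) pairs."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        code_hex, alias, kind = line.split(";")
        aliases.setdefault(int(code_hex, 16), []).append((alias, kind))
    return aliases


def display_entry(code_point, names, aliases):
    """Return (preferred, secondary) for the name line of a chart entry.

    Assumption of this sketch: only aliases of type "correction" displace
    the original name; every other alias type leaves the entry unchanged.
    """
    original = names[code_point]
    corrections = [alias for alias, kind in aliases.get(code_point, [])
                   if kind == "correction"]
    if corrections:
        return corrections[0], original
    return original, None


aliases = parse_aliases(SAMPLE_ALIASES)
for cp in sorted(SAMPLE_NAMES):
    preferred, secondary = display_entry(cp, SAMPLE_NAMES, aliases)
    print(f"{cp:04X}", preferred, "|", secondary)
```

The status of each name stays recoverable from the data files, since the
type field is never discarded; only the display order changes.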

Proposed resolution of feedback: forward to ed committee

---

Feedback on using aliases to "improve" character names (various)

This is ultimately a losing proposition. Many characters are used in multiple
ways, or have diverse names in different user communities. Attempting to
improve even those names where a consensus alternative could be found would
only result in raising the expectations on character names and make the
intractable cases stand out even more.

Even OPENING and CLOSING cannot be assigned uniquely (cf. mathematical use like ]a,b[).

Proposed resolution of feedback: not accepted