L2/18-009

Comments on Public Review Issues
(October 13, 2017 - January 24, 2018)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of January 24, 2018, since the previous cumulative document was issued prior to UTC #153 (October 2017). Some items in the Table of Contents do not have feedback here.

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 24, 2018.

Issue Name Feedback Link
366 Feedback on Additional repertoire for Amendment 1 (DAM1) to ISO/IEC 10646:2017 (5th edition) (feedback)
365 Feedback on draft additional repertoire for Amendment 2.2 (PDAM) to ISO/IEC 10646:2017 (5th edition) (feedback)
364 Unicode Emoji 11.0 Beta (feedback)
363 Proposed Update UTS #37, Unicode Ideographic Variation Database (feedback)
362 Proposed Update UAX #45 U-source Ideographs (feedback)
361 Proposed Update UAX #38 Unicode Han Database (Unihan) (feedback)
360 Proposed Update UAX #11, East Asian Width (feedback) No feedback to date
359 Proposed Draft UTR #53, Unicode Arabic Mark Rendering (feedback)
358 Proposed Update UTS #10, Unicode Collation Algorithm (feedback) No feedback to date
357 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback to date
356 Proposed Update UTS #51, Unicode Emoji (feedback)
355 Proposed Update UAX #29 Unicode Text Segmentation (feedback)

The links below go to locations in this document for feedback.

Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports

Note: The section of Feedback on Encoding Proposals this time includes:
L2/98-045  L2/02-388  L2/11-175R  L2/14-153  L2/15-004R  L2/17-077  L2/17-106R  L2/17-340  L2/17-345  L2/18-004  L2/18-010  L2/18-015  L2/18-017  L2/18-018  L2/18-019  L2/18-020  L2/18-025  L2/18-041 

 


Feedback to UTC / Encoding Proposals

Date/Time: Sat Nov 11 19:06:19 CST 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal (L2/17-345)
Opt Subject: MIAO SIGN CONSONANT MODIFIER BAR name


The name of this sign does not indicate its function clearly and the fact
that this sign is a combining character is not expressed. A better name
would be MIAO SIGN COMBINING CONTRAST OF ARTICULATION. If the committee does
not believe that a spacing version of this character will be proposed, then
one could drop the term "combining" to get MIAO SIGN CONTRAST OF
ARTICULATION.

Date/Time: Thu Dec 14 08:17:43 CST 2017
Contact: srinidhi.pinkpetals24@gmail.com
Name: Srinidhi A
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/17-340


The document makes four requests. While requests 2, 3 and 4 are
acceptable, I feel that request 1 is not necessary.

Request  1
The chart for Common Indic Number Forms needs to add following to U+A830, U+A830,
and U+A830:
● Used in Malayalam also

Indic fractions are used in more than 15 Indic scripts. Since they are used
in many scripts, annotating only for Malayalam may not be appropriate.

Currently, the characters are named NORTH INDIC; however, they are not
limited to Northern India. At the start of the code chart, below 'Number
forms', it could be mentioned that 'Fractions are also used in several
scripts of South India', as indicated on page 802 of the Core Specification.

Date/Time: Sun Jan 7 21:41:59 CST 2018
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal (L2/18-010)
Opt Subject: On the digits for the Khwarezmian script

It is argued in the proposal L2/18-010 that separate encoding of the numbers
one, two, three and four is merited because other similar Middle Eastern
scripts use them. However, the case can be made that those scripts needed
the separate encoding because the individual instances of the digit one are
so close together that one would need kerning to get the right glyph in
digital contexts. In cases like Old Sogdian, the glyphs fuse even though it
is not a joining script. That is not to say that I would necessarily be
against encoding the digits two, three and four separately. If it is true
that the numbers 5-9 are represented in groups of 2, 3 and 4, then it would
be convenient for implementers to have them separate, because they would not
have to deal with a somewhat inconsistent set of rules for introducing the
space. Then again, the author only represents such numbers using repetitions
of one and spaces.

Date/Time: Wed Jan 10 17:54:00 CST 2018
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Inconsistency in L2/18-010

L2/18-010 “Proposal to encode the Khwarezmian script in Unicode” contradicts
itself about how to encode numbers: section 4.2 uses dedicated code points
for 2 through 4 and section 5.3 uses repeated instances of the number 1.

Date/Time: Thu Jan 11 08:30:48 CST 2018
Name: Ken Lunde
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback for L2/18-004 & L2/18-017

This report is feedback for L2/18-004 and L2/18-017.

With regard to the six emphasized modern hangul syllables in the "Emphazised
Hangul syllables" section, one work-around in lieu of encoding is to
implement them via OpenType 'ccmp' GSUB feature using chaining contextual
substitutions, as demonstrated using the code below that is in AFDKO
"feature" file syntax:

# Single substitutions: map each syllable to its emphasized alternate glyph.
lookup DPRK_SUPREME_LEADERS {
  substitute uniAE40 by uniAE40.emphasis;
  substitute uniC131 by uniC131.emphasis;
  substitute uniC740 by uniC740.emphasis;
  substitute uniC77C by uniC77C.emphasis;
  substitute uniC815 by uniC815.emphasis;
} DPRK_SUPREME_LEADERS;

feature ccmp {
  # Chaining contextual substitutions: apply the lookup to each marked glyph
  # only when the three syllables occur together in one of these sequences.
  substitute uniAE40' lookup DPRK_SUPREME_LEADERS uniC77C' lookup DPRK_SUPREME_LEADERS uniC131' lookup DPRK_SUPREME_LEADERS;
  substitute uniAE40' lookup DPRK_SUPREME_LEADERS uniC815' lookup DPRK_SUPREME_LEADERS uniC77C' lookup DPRK_SUPREME_LEADERS;
  substitute uniAE40' lookup DPRK_SUPREME_LEADERS uniC815' lookup DPRK_SUPREME_LEADERS uniC740' lookup DPRK_SUPREME_LEADERS;
} ccmp;

The 'ccmp' GSUB feature is broadly implemented, is on by default, and cannot
be toggled off.
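
To make the three contexts in the feature code above easier to read, the
following minimal Python sketch decodes the glyph names into the syllables
they stand for. It assumes the common convention that a glyph named uniXXXX
maps to code point U+XXXX; the helper name `decode` is hypothetical and is
not part of AFDKO.

```python
# Each chaining rule in the 'ccmp' feature above matches three glyphs in sequence.
RULES = [
    ["uniAE40", "uniC77C", "uniC131"],
    ["uniAE40", "uniC815", "uniC77C"],
    ["uniAE40", "uniC815", "uniC740"],
]


def decode(glyph_name):
    """Hypothetical helper: turn a 'uniXXXX' glyph name into its character."""
    return chr(int(glyph_name[3:], 16))


# Join the decoded syllables of each rule into one string.
sequences = ["".join(decode(g) for g in rule) for rule in RULES]

for rule, text in zip(RULES, sequences):
    print(" ".join(rule), "->", text)
```

Because the rules are contextual, a syllable such as U+AE40 on its own is
left untouched; only the full three-syllable sequences are substituted.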

With regard to the character in the "Enclosed postal mark symbol" section,
it is present in Supplement 4 of Adobe's Adobe-Japan1-6 character collection
(aka Japanese glyph set) as CID+12180. See the "Adobe-Japan1.6.pdf" PDF file
here:

https://github.com/adobe-type-tools/Adobe-Japan1/

Its source is Morisawa's glyph sets, MOR-CODE and MOR-CODE 2. Morisawa is
Japan's leading type foundry.

Date/Time: Thu Jan 11 12:54:23 CST 2018
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on Bopomofo i variants

According to L2/18-020 “Proposal to define Standardized Variation Sequences
for BOPOMOFO LETTER I”, “The vertical bar form was used for the initial(聲母),
and the horizontal stroke form was used for the final(韻母) at the early usage
[...] Therefore, these two forms are different and should be distinguished
for the early usage or the study on the reform of the Chinese Hanzi.” That
distinction should not be encoded by a variation sequence. A variation
sequence would be appropriate for just the modern usage, but not for the
early usage where the two glyphs contrasted.

Date/Time: Fri Jan 12 21:40:11 CST 2018
Contact: nobody_uses@outlook.com
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Name of newly proposed Malayalam character (L2/18-015)

From the attestations provided and the description given, the MALAYALAM END
OF TEXT MARK does not appear to be used consistently at the end of documents
like its name would imply, but rather at the end of sections (including the
final one) so a more fitting name would be MALAYALAM SECTION MARK.

Date/Time: Tue Jan 16 05:38:37 CST 2018
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/18-018 Chess Emoji

The proposal [L2/18-018] is asking for a single emoji to represent the game
of chess, but is rather unclear about exactly what this should look like
(e.g. a checkered board) and why. Most other games and sports are
represented as emojis by one or two characteristic utensils or, if that
didn't seem viable, by a human-like figure exercising this activity with the
tools required (e.g. Handball Player because a prototypical Handball is not
sufficiently distinct from a Football or Volleyball). Some sports have two
or more applicable emojis, e.g. Golf.

In particular, the comparable game of Mahjong is represented by a single
piece U+1F004 🀄 and all card games are represented by either a card back
U+1F3B4 🎴, by a representative card face U+1F0CF 🃏 or by symbols of the four
standard card suits ♣️♠️♥️♦️. The Joker Card and the Red Dragon Mahjong
Piece are parts of larger sets of characters in their respective blocks
which can be used for game notation and diagrams. Chess pieces have also
long been available in Unicode for movement notation U+2654..F ♔♕♖♗♘♙♚♛♜♝♞♟.

On Samsung devices, Unicode chess figure characters are rendered with emoji
glyphs (i.e. centered within a square), which are more appropriate for board
diagrams (in combination with emojis U+2B1B/C ⬛⬜ for empty board tiles) than
for game notation. According to
<http://cgi.wap2.jp/emoji/ezweb/?act=new_pict>, some (Sanyo/Kyocera etc.)
phones distributed by KDDI in Japan from 2008 had the 12 standard chess
pieces as emojis #577..588 (perhaps PUA U+F11F..F12B), but apparently not
directly accessible for user input. It seems as if they have not been
considered at all during the original "Emoji 4 Unicode" process. It is
unclear whether that has been a deliberate choice or mere oversight.

Given the Mahjong precedent in particular, someone at UTC might suggest
assigning the `Emoji` property to one of the twelve existing chess figures
(not counting upcoming fairy chess additions) and making this the general
emoji for Chess. I must strongly advise against this option in advance. This would
surely disrupt the normal use of chess piece characters in some cases
because variation selectors are often inserted by input methods in a way
opaque to the user, or are ignored by output systems altogether (e.g. on
Twitter), forcing emoji display if at all possible.

If all standard chess pieces were emojified, however, users could pick one
(or a larger selection) of them to represent the game itself, much like they
can do with card suit emojis for arbitrary card games. They could also use
them to draw board diagrams almost as desired and described by Michael
Everson in [L2/17-077] but with VS-16 instead of VS-1/2 and without
explicitly encoding the color of the underlying board tile. A basic diagram
fits within a tweet on Twitter and people have been doing this with other
emojis as stand-ins. Physical and virtual chess boards also frequently
feature alternative designs for the pieces. @emojichess is a bot that
generates bogus positions, but consider [Ocean Chess], [Plant Chess] or
[Food Chess]. People use emojis to picture or even play related board games
as well, e.g. [Checkers].
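
The board-diagram idea can be sketched in a few lines of Python. This is only
an illustration under the assumption that all twelve chess pieces accept
VS-16 for emoji presentation, which is the scenario argued for here rather
than guaranteed behaviour; the names `piece_row` and `empty_row` are
hypothetical.

```python
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation

WHITE_BACK = "\u2656\u2658\u2657\u2655\u2654\u2657\u2658\u2656"  # R N B Q K B N R
BLACK_BACK = "\u265C\u265E\u265D\u265B\u265A\u265D\u265E\u265C"
WHITE_PAWN, BLACK_PAWN = "\u2659", "\u265F"
DARK, LIGHT = "\u2B1B", "\u2B1C"  # emojis used for empty board tiles


def piece_row(pieces):
    """Append VS-16 to every piece so that it is requested as an emoji."""
    return [p + VS16 for p in pieces]


def empty_row(rank):
    """Alternate tile colours; a1 (rank 1, first file) is a dark square."""
    return [DARK if (rank + f) % 2 == 1 else LIGHT for f in range(8)]


# Starting position, listed from rank 8 down to rank 1.
board = [
    piece_row(BLACK_BACK),
    piece_row(BLACK_PAWN * 8),
    empty_row(6),
    empty_row(5),
    empty_row(4),
    empty_row(3),
    piece_row(WHITE_PAWN * 8),
    piece_row(WHITE_BACK),
]

diagram = "\n".join("".join(row) for row in board)
print(diagram)
```

Occupied squares do not show the tile colour underneath, which matches the
simplification described above (no explicit encoding of the colour of the
underlying board tile).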

I believe this complete emojification would ultimately be the best option
for chess players and other emoji users. The other pair of valid options is
either accepting or rejecting the proposal for a dedicated chess emoji
separate from other chess-related characters in Unicode.

  [Ocean Chess]: https://twitter.com/i/moments/929378210776797184 
  [Plant Chess]: https://twitter.com/IHStreet/status/929421079998758912 
  [Food Chess]: https://twitter.com/queeryeverythng/status/929941452146331649 
  [Checkers]: https://twitter.com/MissLilyRowan/status/928765637714989056 
  [L2/17-077]: http://www.unicode.org/L2/L2017/17077r-n4793r-chessboard.pdf 
  [L2/18-018]: http://www.unicode.org/L2/L2018/18018-chess-emoji.pdf 

Date/Time: Fri Jan 19 12:39:54 CST 2018
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comparison of L2/17-106R and L2/18-041

L2/18-041 “Request to Add Thai Characters to ISO/IEC 10646/Unicode” proposes
some characters for Tai Noi in the Thai block, but they should be unified
with Lao. The glyphs look more like Lao than Thai, which makes sense as Lao
is a simplification of Tai Noi, with obsolete characters removed. However,
if L2/17-106R “Revised Proposal to Encode Lao Characters for Pali” is
accepted, the Lao block will contain most of the characters needed by ISO
20674-1. Therefore, Tai Noi should be proposed as an extension to the Lao
block.

Since the Lao characters proposed in L2/17-106R are apparently not used only
for Pali and Sanskrit, their character names should not include “PALI” and
“SANSKRIT”.

Date/Time: Tue Jan 23 01:52:00 CST 2018
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/18-019 Poison Emoji

I believe this should be an emojification of U+2620 SKULL AND CROSSBONES.  
It narrows the semantics (excludes pirates, e.g.), but that is a minor matter.

Date/Time: Tue Jan 23 17:32:33 CST 2018
Name: Richard Wordingham
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/18-041 (Encoding Tai Noi in Thai Script)

L2/18-041 "Request to Add Thai Characters (WG2 N4927)" is a proposal to
encode Tai Noi (= Lao Buhan), the old form of the Lao script, as part of the
Thai script.  It should be read in conjunction with L2/18-042 to see the
complete encoding.  Most Tai Noi characters are identified with the cognate
Thai character.

0. Non-Linguistic Background Notes

The transliteration scheme is an extension of ISO 11940.  I think it is
worth noting that under ISO 11940, ฯ U+0E2F THAI CHARACTER PAIYANNOI is
transliterated differently according to whether it is serving as an
abbreviation mark (which is what the Unicode name designates) or as a minor
section break ("angkhandeaw") in contrast to the major section break ๚
U+0E5A THAI CHARACTER ANGKHANKHU.  It is therefore not essential for the
transliteration scheme that Unicode make the distinction that the
transliteration does.

1. Which Script?

If it is appropriate to encode Tai Noi as part of any existing script,
and I think it is, it is appropriate to encode it as part of the Lao
script, and revivalist/popular antiquarian enthusiasts I was aware of
were using the Lao script.  If it is appropriate to encode it as part
of the Thai script, then it is also appropriate to encode it as part of
the Lao script.  If Unicode decides to encode it as part of the Thai
script, I believe we should also encode it as part of the Lao script.

There might be good *political* reasons to encode it twice.

2. Character Identity

The transliteration scheme makes three simple character distinctions 
without justifying them.  Referring to Table 6, the distinctions are:

14 v. 69, transliterated as t̄h v. t̄h′.  The former is identified 
with Thai 'ถ', while a new character *U+0E63 THAI NOI CHARACTER THA5 is
proposed for the latter.  We should ask for evidence that these are
different characters.

28 v. 73, transliterated as l v. l′.  The former is identified with 
Thai 'ล', though it is the latter that is closer to modern Thai ล and 
Lao ລ.  A new character *U+0E67 THAI NOI CHARACTER LA3 is proposed for the latter.

32 v. 74 transliterated as s̄ v. s̄′.  The former is identified with 
Thai 'ส', though it is the latter that is closer to modern Thai ส and 
Lao ສ.  A new character *U+0E68 THAI NOI CHARACTER SA6 is proposed for the latter.

I do not believe we should encode the characters with '+' in the
suggested names.  They are ligatures, and would better be encoded as,
for Thai:

 <[U+0E2B THAI CHARACTER HO HIP / U+0E02 THAI CHARACTER KHO KHAI/
   U+0E04 THAI CHARACTER KHO KHWAI / U+0E16 THAI CHARACTER THO THUNG
   / U+0E2A THAI CHARACTER SO SUA ]>,
 U+200D ZERO WIDTH JOINER, 
 [U+0E19 THAI CHARACTER NO NU / U+0E21 THAI CHARACTER MO MA]

This was the conclusion of the analysis at 
https://linux.thai.net/~thep/esaan-scripts/tn-issues/tn-encoding.html .
There was some discussion of the encoding in the general list thread
starting at https://unicode.org/mail-arch/unicode-ml/y2014-m03/0202.html .

Similar conclusions apply for Lao, except that U+0EDC LAO HO NO and U+0EDD
LAO HO MO are already encoded.  However, while new characters are not
merited, we should make these ligatures named sequences.

Glyph 78, transliterated as x′y, is proposed as the basis of *U+0E6C  THAI
NOI CHARACTER O+YA.  It corresponds to Lao ຢ U+0EA2 LAO LETTER YO and
functionally to the syllable-initial Thai sequence อย.  If Tai Noi is to be
encoded in the Thai script, this character should be encoded.

Glyph 79, transliterated as xy′, is proposed as the basis of *U+0E6D THAI NOI
CHARACTER O+YA.  It corresponds to Lao ຽ in its obsolete rôle as both vowel
and final consonant, and is functionally equivalent to the Thai rime อย.
The transliteration standard refers to them as allographs because they have
the same Thai transliteration, but as characters they are as different as
'c' and 'g'.

We need evidence that glyph 48, the proposed *U+0E3D THAI NOI CHARACTER YA2,
transliterated as ỵ, and glyph 71, the proposed *U+0E65 THAI NOI CHARACTER
YA3, transliterated as ỵ′, are distinct characters.  The former corresponds
to one glyph of modern Lao ຽ.  Note that both glyphs 48 and 79 correspond to
the same encoded Lao character; we need evidence that the glyphs correspond
to different characters in Tai Noi.  If it is forthcoming, we may be faced
with the unpleasant desirability of disunifying the two glyphs of U+0EBD LAO
SEMIVOWEL SIGN NYO.  We also have to consider the relationship with the
sequence <U+1A60 TAI THAM SIGN SAKOT, U+1A3F TAI THAM LETTER LOW YA>.

The requested *U+0E6E THAI NOI SARA A2 appears to be a glyph variant of
U+0E30 THAI CHARACTER SARA A / U+0EB0 LAO VOWEL SIGN A.

3. Other Subscript Consonants

Proposed *U+0E69 THAI NOI CHARACTER SA7 is the sequence <U+1A60 TAI THAM
SIGN SAKOT, U+1A47 TAI THAM LETTER HIGH SSA>.  Khun Theppitak's analysis
mentioned above records many more.  How we encode them is debatable.
Perhaps we should even create a set of final consonant marks to be shared
between Thai, Lao and Tai Tham that will enable the USE to render Tai Tham
happily!  (I, for one, don't like that solution.)  A countervailing
principle is the separation of scripts.

Date/Time: Wed Jan 24 12:00:08 CST 2018
Name: Theppitak Karoonboonyanan
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/18-041 "Request to Add Thai Characters"

As Tai Noi (aka Lao Buhan) is an old Lao script which has evolved into the
contemporary Lao script, I think it is more appropriate to add the
characters to the Lao block than to the Thai block. Doing so could save many
code points which are already encoded there, such as:

- U+0E3B THAI NOI MAI KONG ~ U+0EBB LAO VOWEL SIGN MAI KON
- U+0E3C THAI NOI CHARACTER LA2 ~ U+0EBC LAO SEMIVOWEL SIGN LO
- U+0E3D THAI NOI CHARACTER YA2 ~ U+0EBD LAO SEMIVOWEL SIGN NYO
- U+0E5C THAI NOI CHARACTER HA1+NA ~ U+0EDC LAO HO NO
- U+0E5D THAI NOI CHARACTER HA1+MA ~ U+0EDD LAO HO MO
- U+0E63 THAI NOI CHARACTER THA5 ~ U+0E96 LAO LETTER THO SUNG
- U+0E65 THAI NOI CHARACTER YA3 ~ U+0EBD LAO SEMIVOWEL SIGN NYO
- U+0E67 THAI NOI CHARACTER LA3 ~ U+0EA5 LAO LETTER LO LOOT
- U+0E68 THAI NOI CHARACTER SA6 ~ U+0EAA LAO LETTER SO SUNG
- U+0E6C THAI NOI CHARACTER O+YA ~ U+0EA2 LAO LETTER YO
- U+0E6E THAI NOI SARA A2 ~ U+0EB0 LAO VOWEL SIGN A

The "~" sign above means the former character is already encoded as the
latter, while some character pairs may have different shapes due to the
evolution.

Note that the proposed U+0E3D THAI NOI CHARACTER YA2 and U+0E65 THAI NOI
CHARACTER YA3 are considered the same character with different styles.

With this number of duplications, it should be obvious that Tai Noi should
belong in Lao block rather than in Thai.

It is worth noting that U+0E6D THAI NOI CHARACTER O+YA is analogous to
U+1A6D TAI THAM VOWEL SIGN OY. Probably, it deserves a more sensible name
like "THAI NOI VOWEL SIGN OY" or "THAI NOI SARA OY".

I have collected more issues found while reading Tai Noi manuscripts here:

https://linux.thai.net/~thep/esaan-scripts/tn-issues/tn-encoding.html

Yours sincerely,
Theppitak Karoonboonyanan.

Feedback on UTRs / UAXes

Date/Time: Tue Nov 28 13:12:38 CST 2017
Name: Solra Bizna
Report Type: Error Report
Opt Subject: UAX #9, rule X8 could be more clear

> X8. All explicit directional embeddings, overrides and isolates are
> completely terminated at the end of each paragraph. Paragraph separators are
> not included in any embedding, override or isolate, and are thus assigned
> the paragraph embedding level.

This rule specifies an important, but easy to miss, behavior. In every other
usage of "end of paragraph" in the document, straightforwardly including
paragraph separators as part of the preceding paragraph results in the
correct behavior. However, if this definition is used when applying X8, the
paragraph separator might end up being assigned the current embedding level
rather than the paragraph embedding level, or not being assigned any
embedding level at all.

The second sentence of the rule does in fact explicitly state that paragraph
separators are assigned the paragraph embedding level. However, it's phrased
in a way that makes it seem like it's clarifying consequences of existing
rules, rather than specifying a new rule. It also does not refer to
paragraph separators as `B`, which means that someone who has just read rule
X6 (which excludes `B`) and is now searching the document for a rule that
assigns an embedding level to `B` may very well not find rule X8.

Perhaps the following wording would be better:

> X8. All explicit directional embeddings, overrides and isolates are
> completely terminated upon encountering a paragraph separator (B) or the end
> of the paragraph. Since this prevents paragraph separators from being
> included in any embedding, override, or isolate, they are thus always
> assigned the paragraph embedding level.
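
The single decision that the proposed wording makes explicit can be sketched
in Python. This is a minimal illustration, not an implementation of the
Bidirectional Algorithm; the function `resolved_explicit_level` and its
parameters are hypothetical, and only `unicodedata.bidirectional` comes from
the standard library.

```python
import unicodedata


def is_paragraph_separator(ch):
    """True if ch has Bidi_Class B (paragraph separator)."""
    return unicodedata.bidirectional(ch) == "B"


def resolved_explicit_level(ch, current_level, paragraph_level):
    """Hypothetical helper following the proposed wording of X8.

    A B character always gets the paragraph embedding level, regardless of
    any embedding, override or isolate that is still open when it is reached.
    """
    if is_paragraph_separator(ch):
        return paragraph_level
    return current_level


# Inside an open right-to-left embedding (level 1) in a level-0 paragraph:
print(resolved_explicit_level("b", 1, 0))       # ordinary letter keeps level 1
print(resolved_explicit_level("\u2029", 1, 0))  # PARAGRAPH SEPARATOR gets level 0
```

Testing for `B` by name, as the sketch does, is what a reader coming from
rule X6 would be searching for.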

Error Reports

Date/Time: Tue Oct 10 06:46:20 CDT 2017
Name: Srinidhi A,Sridatta A
Report Type: Error Report
Opt Subject: Indic syllabic category of Kharoshthi virama

In Indic_Syllabic_Category

*Virama is assigned only to characters that can act both as visible
killer viramas and as consonant stackers.

*Pure_Killer is assigned to characters that can only act as pure killers
(killing the inherent vowel in a consonant sequence, with no consonant
stacking behavior).

*Invisible_Stacker is assigned to characters that can only act as consonant stackers.

KHAROSHTHI VIRAMA is currently assigned the property value Invisible_Stacker.

Unlike other Indic scripts, Kharoshthi does not have any visible form of
virama that acts as a halanta.

When not followed by a consonant, the virama causes the preceding consonant
to be written as subscript to the left of the letter preceding it. If
followed by another consonant, the virama will trigger a combined form
consisting of two or more consonants. (From the Kharoshthi section of the
Core Specification, pages 564-565.)

Since the virama in Kharoshthi can both act as a halanta (killing the
inherent vowel) and form consonant conjuncts, the current property value is
incorrect. What is the appropriate property value? Should it be changed to
Indic_Syllabic_Category=Virama?

Date/Time: Mon Oct 30 09:45:21 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Indic categories of U+20F0 COMBINING ASTERISK ABOVE


U+20F0 COMBINING ASTERISK ABOVE has scx={Deva Gran Latn} because it is used
as a svara marker. It should also have
Indic_Syllabic_Category=Cantillation_Mark and Indic_Positional_Category=Top.

Date/Time: Sun Nov 12 16:22:17 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Underspecified Soyombo vowel signs

A Soyombo consonant may take multiple vowel signs, all of which have ccc=0
and Indic_Syllabic_Category=Vowel_Dependent. The Unicode Standard does not
specify the order. The proposal (L2/15-004R) recommends the order V_sign
M_length V_diphthong. The standard should specify this order explicitly,
instead of the vague wording in the “Vowels and Diphthongs” section.

This goes for Zanabazar Square too.
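
The practical consequence of ccc=0 can be checked with a short Python sketch:
normalization never reorders such marks, so two different orders of the same
vowel signs stay distinct strings. The sketch assumes the interpreter's
Unicode data is version 10.0 or later (so that Soyombo is present) and that
the vowel signs are named "SOYOMBO VOWEL SIGN ..."; the helper `vowel_signs`
is hypothetical.

```python
import unicodedata


def vowel_signs(first, last, script):
    """Collect characters in first..last named '<script> VOWEL SIGN ...'."""
    found = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), "")
        if name.startswith(script + " VOWEL SIGN"):
            found.append(chr(cp))
    return found


# Soyombo block: U+11A50..U+11AAF
soyombo = vowel_signs(0x11A50, 0x11AAF, "SOYOMBO")
ccc_values = {unicodedata.combining(ch) for ch in soyombo}
print("canonical combining classes:", ccc_values)

# Two vowel signs in either order: NFC leaves both sequences as they are.
a, b = soyombo[0], soyombo[1]
same_after_nfc = (
    unicodedata.normalize("NFC", a + b) == unicodedata.normalize("NFC", b + a)
)
print("orders unified by NFC:", same_after_nfc)
```

The same helper could presumably be pointed at the Zanabazar Square block
with the corresponding name prefix.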

Date/Time: Mon Nov 13 13:42:08 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Typo in the section on Kayah Li

In the Kayah Li section, two of the vowels are written ⟨o’⟩ and ⟨u’⟩. 
They should be ⟨ơ⟩ and ⟨ư⟩.

Date/Time: Sun Nov 19 11:52:38 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Indic_Syllabic_Category of U+0C80

According to L2/14-153, U+0C80 KANNADA SIGN SPACING CANDRABINDU can be
followed by U+0C82 KANNADA SIGN ANUSVARA, so its Indic_Syllabic_Category
should be Bindu.

Date/Time: Sun Nov 19 12:30:42 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Script_Extensions of U+1CF2 VEDIC SIGN ARDHAVISARGA

According to L2/11-175R, U+1CF2 VEDIC SIGN ARDHAVISARGA is used in Tirhuta,
so its Script_Extensions should include Tirhuta.

Date/Time: Sun Nov 19 12:45:32 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Typo in the Meetei Mayek Extensions chart

The header for Meetei Mayek Extensions includes the word “Manupuri”, which
should be “Manipuri”.

Date/Time: Sun Nov 19 13:47:08 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Unspecified order of Meetei Mayek vowel signs

The Unicode Standard 10.0, chapter 13, page 541 says that in a Meetei Mayek
abbreviation, a consonant may have multiple vowel signs. The order of the
vowel signs is not specified. Is it pronunciation order or visual
left–top–bottom–right order?

That section also says that “[i]n such cases, the vowel matra may occur at
the end of a word”, implying that in other cases, the vowel matra may not
occur at the end of a word, which is not true.

Date/Time: Tue Nov 21 13:47:38 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Indic_Syllabic_Category of U+A8B4 SAURASHTRA CONSONANT SIGN HAARU

U+A8B4 SAURASHTRA CONSONANT SIGN HAARU is a modifier letter, kind of like a
spacing nukta. It can be followed by a vowel sign or virama. Therefore, its
Indic_Syllabic_Category should not be Consonant_Final. I suggest
Consonant_Medial, although that may not be the best choice as it is not a
consonant per se.

Date/Time: Wed Nov 22 13:50:41 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Syloti Nagri dvisvara and anusvara

The standard’s introduction to Syloti Nagri says the script has 27
consonants, 5 independent vowels, 5 dependent vowels, and 2 “proper
diacritics” (anusvara and hasanta), but nowhere does it mention U+A802
SYLOTI NAGRI SIGN DVISVARA. U+A802 is, in general, underspecified.

U+A802 is a dependent vowel that can cooccur with other dependent vowels. It
has no Indic_Syllabic_Category or Indic_Positional_Category. It seems
appropriate for it to be in the categories Vowel_Dependent and Top.

L2/02-388 says U+A802 and U+A80B SYLOTI NAGRI SIGN ANUSVARA can each appear
before (rarely) or after (usually) the vowel sign a. This is unusual for an
Indic script in Unicode. It complicates the usual model, where a top vowel
precedes a post-base vowel and a post-base vowel precedes a bindu, and any
other order is an error. The standard should therefore explicitly explain
these exceptions.

Alternatively, if it is determined that the behavior in the proposal is not
to be promoted, the standard should say that dvisvara and anusvara may each
appear in two positions relative to the vowel sign a, but that either way
the *svara is encoded after the vowel sign.

Date/Time: Tue Nov 28 07:24:38 CST 2017
Name: Songchyuan Liou
Report Type: Error Report
Opt Subject: Tai Viet(AA80–AADF)'s character names

I request that the following character names be reconsidered.
These errors are obvious if you compare the
Tai Viet letters to the related Thai and Lao letters.

("LOW" and "HIGH" are reversed.)
TAI VIET LETTER LOW KO → TAI VIET LETTER HIGH KO
TAI VIET LETTER HIGH KO → TAI VIET LETTER LOW KO
TAI VIET LETTER LOW KHO → TAI VIET LETTER HIGH KHO
TAI VIET LETTER HIGH KHO → TAI VIET LETTER LOW KHO
TAI VIET LETTER LOW KHHO → TAI VIET LETTER HIGH KHHO
TAI VIET LETTER HIGH KHHO → TAI VIET LETTER LOW KHHO
TAI VIET LETTER LOW GO → TAI VIET LETTER HIGH GO
TAI VIET LETTER HIGH GO → TAI VIET LETTER LOW GO
TAI VIET LETTER LOW NGO → TAI VIET LETTER HIGH NGO
TAI VIET LETTER HIGH NGO → TAI VIET LETTER LOW NGO
TAI VIET LETTER LOW CO → TAI VIET LETTER HIGH CO
TAI VIET LETTER HIGH CO → TAI VIET LETTER LOW CO
TAI VIET LETTER LOW CHO → TAI VIET LETTER HIGH CHO
TAI VIET LETTER HIGH CHO → TAI VIET LETTER LOW CHO
TAI VIET LETTER LOW SO → TAI VIET LETTER HIGH SO
TAI VIET LETTER HIGH SO → TAI VIET LETTER LOW SO
TAI VIET LETTER LOW NYO → TAI VIET LETTER HIGH NYO
TAI VIET LETTER HIGH NYO → TAI VIET LETTER LOW NYO
TAI VIET LETTER LOW DO → TAI VIET LETTER HIGH DO
TAI VIET LETTER HIGH DO → TAI VIET LETTER LOW DO
TAI VIET LETTER LOW TO → TAI VIET LETTER HIGH TO
TAI VIET LETTER HIGH TO → TAI VIET LETTER LOW TO
TAI VIET LETTER LOW THO → TAI VIET LETTER HIGH THO
TAI VIET LETTER HIGH THO → TAI VIET LETTER LOW THO
TAI VIET LETTER LOW NO → TAI VIET LETTER HIGH NO
TAI VIET LETTER HIGH NO → TAI VIET LETTER LOW NO
TAI VIET LETTER LOW BO → TAI VIET LETTER HIGH BO
TAI VIET LETTER HIGH BO → TAI VIET LETTER LOW BO
TAI VIET LETTER LOW PO → TAI VIET LETTER HIGH PO
TAI VIET LETTER HIGH PO → TAI VIET LETTER LOW PO
TAI VIET LETTER LOW PHO → TAI VIET LETTER HIGH PHO
TAI VIET LETTER HIGH PHO → TAI VIET LETTER LOW PHO
TAI VIET LETTER LOW FO → TAI VIET LETTER HIGH FO
TAI VIET LETTER HIGH FO → TAI VIET LETTER LOW FO
TAI VIET LETTER LOW MO → TAI VIET LETTER HIGH MO
TAI VIET LETTER HIGH MO → TAI VIET LETTER LOW MO
TAI VIET LETTER LOW YO → TAI VIET LETTER HIGH YO
TAI VIET LETTER HIGH YO → TAI VIET LETTER LOW YO
TAI VIET LETTER LOW RO → TAI VIET LETTER HIGH RO
TAI VIET LETTER HIGH RO → TAI VIET LETTER LOW RO
TAI VIET LETTER LOW LO → TAI VIET LETTER HIGH LO
TAI VIET LETTER HIGH LO → TAI VIET LETTER LOW LO
TAI VIET LETTER LOW VO → TAI VIET LETTER HIGH VO
TAI VIET LETTER HIGH VO → TAI VIET LETTER LOW VO
TAI VIET LETTER LOW HO → TAI VIET LETTER HIGH HO
TAI VIET LETTER HIGH HO → TAI VIET LETTER LOW HO
TAI VIET LETTER LOW O → TAI VIET LETTER HIGH O
TAI VIET LETTER HIGH O → TAI VIET LETTER LOW O

Date/Time: Fri Dec 1 08:14:41 CST 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Default ignorability of U+1D159 MUSICAL SYMBOL NULL NOTEHEAD

Should U+1D159 MUSICAL SYMBOL NULL NOTEHEAD be default ignorable?

This question came up last year on the mailing list. The response
(http://unicode.org/pipermail/unicode/2016-September/003953.html) was that
it shouldn’t, because “it is essentially just a base for applying the
various combining stems and flags for a display without showing a particular
notehead, analogous to applying a generic combining mark to a NBSP to show
that combining mark in isolation.” That sounds reasonable, but it is not
what the standard says.

The standard only mentions it when discussing the musical format characters:
“In some exceptional cases, beams are left unclosed on one end. This status
can be indicated with a U+1D159 MUSICAL SYMBOL NULL NOTEHEAD character if no
stem is to appear at the end of the beam.” The proposal (L2/98-045) says the
same: “In some exceptional cases, beams are left-unclosed on one end. This
can be indicated with a "null note" (0001 xx92 WESTERN MUSICAL SYMBOL NULL
NOTEHEAD) character if no stem is to appear at the end of the beam.”

If U+1D159 is only meant to be used with the control characters, which are
default ignorable, then it too should be default ignorable. If it is meant
to be a spacing invisible notehead that can take combining marks, the
standard or the code chart should say so. As it is, it is not clear whether
it should be zero-width or not.
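
The mismatch can be seen by comparing General_Category values in Python. This
is only a rough proxy: the standard `unicodedata` module does not expose
Default_Ignorable_Code_Point, so the sketch assumes that the gc=Cf value of
the musical format characters is a fair stand-in for their default-ignorable
status.

```python
import unicodedata

NULL_NOTEHEAD = "\U0001D159"
# The eight musical format characters U+1D173..U+1D17A.
FORMAT_CONTROLS = [chr(cp) for cp in range(0x1D173, 0x1D17B)]

notehead_category = unicodedata.category(NULL_NOTEHEAD)
print(f"U+{ord(NULL_NOTEHEAD):04X}", unicodedata.name(NULL_NOTEHEAD), notehead_category)

for ch in FORMAT_CONTROLS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

control_categories = {unicodedata.category(ch) for ch in FORMAT_CONTROLS}
```

The null notehead is categorized as an ordinary symbol while the characters
it is described alongside are format controls.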

Date/Time: Sun Dec 31 01:15:34 CST 2017
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: Mentioning the alternative name of U+06A9

U+06A9 ARABIC LETTER KEHEH currently says "Persian, Arabic, ..." in its
description. This character is also used in Sindhi and it's worth mentioning
this, especially since the name "keheh" comes from the Sindhi name.

This letter is typically called the "Kaf Mashkula"
(https://en.wikipedia.org/wiki/Kaph#Arabic_k%C4%81f). This information
should be in the description as well so that it can be found by that name.

Date/Time: Wed Jan 3 07:45:32 CST 2018
Name: Huáng Jùnliàng
Report Type: Error Report
Opt Subject: Unicode Space Characters Table 6-2 should not contain U+180E

Table 6-2, Unicode Space Characters, on page 268 of the Unicode 10.0.0
specification lists U+180E as one of the space characters.

As is stated, “The space characters in the Unicode Standard can be
identified by their General Category, [gc=Zs]”. However, U+180E
Mongolian Vowel Separator changed General Category from Zs to Cf in
Unicode 6.3.0 [1][2]. We should respect this change and remove U+180E from
the table.

[1] http://www.unicode.org/L2/L2013/13004-vowel-sep-change.pdf 
[2] http://www.unicode.org/reports/tr44/tr44-14.html 
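
The claim can be checked with a short sketch, assuming a Python build whose unicodedata module is based on Unicode 6.3 or later. It collects every code point with gc=Zs and looks up U+180E separately.

```python
import sys
import unicodedata

# Every code point whose General_Category is Zs (space separator).
space_separators = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

mvs_category = unicodedata.category("\u180e")  # MONGOLIAN VOWEL SEPARATOR

print([f"U+{cp:04X}" for cp in space_separators])
print("U+180E category:", mvs_category)
```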

Date/Time: Mon Jan 15 17:18:05 CST 2018
Name: Behnam Esfahbod
Report Type: Error Report
Opt Subject: Error in TUS Table 18-1. Blocks Containing Han Ideographs, 7th Row

Unicode Blocks are defined as "a uniquely named, continuous, non-overlapping
range of code points, containing a multiple of 16 code points".
[https://www.unicode.org/glossary/#block]

"CJK Unified Ideographs Extension F" is defined as `2CEB0..2EBEF`. 
[http://ftp.unicode.org/Public/10.0.0/ucd/Blocks.txt]

In The Unicode Standard, Table 18-1. Blocks Containing Han Ideographs has
the "Range" column which appears to be showing the ranges used to define
Unicode Blocks, named in the first column, "Block".

The "Range" column has the correct value for all the Blocks listed, except
"CJK Unified Ideographs Extension F", which is shown as "2CEB0–2EBE0",
instead of "2CEB0–2EBEF" (notice the "F" instead of "0" as LSD of the
range's end codepoint).

It's true that U+2EBE0 is the last *assigned* codepoint of this Block, but
from the context, it doesn't look like that's what the column represents.
Similarly, Block "CJK Unified Ideographs Extension A" has U+4DB5 as the last
*assigned* codepoint, but the correct range end value (`4DBF`) is listed in
the table.

Suggested Correction: Update the 7th row of the table to:
```
| CJK Unified Ideographs Extension F | 2CEB0–2EBEF | Rare, historic |
```
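
The arithmetic behind this report can be sketched as follows. The helper name is hypothetical; it only applies the glossary's "multiple of 16 code points" rule to the two candidate end points.

```python
def is_valid_block_range(first: int, last: int) -> bool:
    """Check that a range starts on a 16-boundary and spans a multiple of 16."""
    size = last - first + 1
    return first % 16 == 0 and size % 16 == 0

# Range as given in Blocks.txt for CJK Unified Ideographs Extension F
print(is_valid_block_range(0x2CEB0, 0x2EBEF))
# Range as printed in Table 18-1 (ends at the last assigned code point)
print(is_valid_block_range(0x2CEB0, 0x2EBE0))
```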

Date/Time: Sat Jan 20 12:50:17 CST 2018
Name: Andrew West
Report Type: Error Report
Opt Subject: Underspecified Zanabazar Square vowel signs

Whilst designing and implementing a font for the Zanabazar Square script
encoded in Unicode 10.0 I have encountered the following issue (see also
report by David Corbett on 12 November 2017).

Zanabazar Square consonants may take multiple vowel signs and a vowel length
mark (U+11A0A), but all have ccc=0 so different (but visually identical)
sequences of consonant plus multiple vowel signs and length mark do not
normalize to a canonical order. The Unicode Standard does not specify a
correct order of vowel signs, and Anshuman Pandey's proposal to encode
Zanabazar Square script
(http://www.unicode.org/L2/L2015/15337-zanabazar-square.pdf)
is inconsistent on the order of vowel signs and the placement of
the vowel length mark, e.g. on pages 5-6 the vowel length mark is placed
after the vowel signs I, UE, U, E, OE, O, and Reversed I, but before the
vowel signs AI and AU, and between two vowel signs in some sequences.

I find it odd to put the vowel length mark after a vowel sign (and even
odder to put it between two vowel signs) because the mark corresponds to
Tibetan a-chung (U+0F71) which is placed between consonant and vowel signs.
Furthermore the Zanabazar length mark attaches (ligates) to the preceding
consonant so I feel intuitively that it belongs after the consonant and
before any vowel signs.

From a font implementation point of view it is much easier to deal with the
length mark using OpenType substitutions if it is not separated from the
preceding consonant by one or more vowel signs. I would like the Unicode
Standard to specify the encoding order of vowel signs and vowel length mark
so that end users and implementers can have a common understanding of how to
write the Zanabazar Square script. In particular I would like to define the
vowel length mark as coming before any vowel sign, i.e. <consonant>
[<subjoiner> <consonant>]* [<cluster final letter>] [<length mark>] [<vowel
sign>]*.
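
A minimal sketch of the normalization problem, assuming U+11A0B is ZANABAZAR SQUARE LETTER KA and U+11A01 is ZANABAZAR SQUARE VOWEL SIGN I (the length mark U+11A0A is as given above). Because both marks have ccc=0, normalization leaves the two orders distinct.

```python
import unicodedata

KA = "\U00011A0B"            # assumed: ZANABAZAR SQUARE LETTER KA
VOWEL_SIGN_I = "\U00011A01"  # assumed: ZANABAZAR SQUARE VOWEL SIGN I
LENGTH_MARK = "\U00011A0A"   # ZANABAZAR SQUARE VOWEL LENGTH MARK

mark_after_vowel = KA + VOWEL_SIGN_I + LENGTH_MARK
mark_before_vowel = KA + LENGTH_MARK + VOWEL_SIGN_I

# Canonical combining classes of the two marks
ccc = [unicodedata.combining(c) for c in (VOWEL_SIGN_I, LENGTH_MARK)]

# With ccc=0 nothing is reordered, so the two spellings never converge.
same_after_nfc = (
    unicodedata.normalize("NFC", mark_after_vowel)
    == unicodedata.normalize("NFC", mark_before_vowel)
)

print(ccc, same_after_nfc)
```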

Other Reports

Date/Time: Sat Nov 4 04:59:34 CDT 2017
Name: Marlen
Report Type: Other Question, Problem, or Feedback
Opt Subject: I with a dot, I without a dot

Please consider the introduction of additional symbols associated with the
variants of "I i" with a dot and without a dot. In some Turkic languages,
the symbols "I i" are used in two variants: "I ı" (I without a dot) and "İ
i" (I with a dot).

At the abstract level, there are 3 pairs of different symbols: I standard (I
i), I with a dot (İ i), I without a dot (I ı). But in Unicode there are
additional characters only for "ı" (lowercase I without a dot) and "İ"
(uppercase I with a dot).

When using case-insensitive software, this can lead to serious problems and
misunderstandings. This problem could be solved this way.

The introduction of an additional "I" (uppercase I without a period) as the
uppercase for the symbol "ı" (lowercase I without a dot), with a different
code than the standard "I".

And the introduction of an additional "i" (lowercase I with a period) as a
lower case for the "I" symbol, which differs from the standard "i" code.
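
The behavior described in this report can be reproduced with a short sketch using Python's built-in, locale-independent case mappings. It is only an illustration of the default mappings, not of any Turkic-tailored implementation.

```python
DOTLESS_SMALL_I = "\u0131"    # ı LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"   # İ LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default uppercase of dotless ı is the plain ASCII I.
dotless_upper = DOTLESS_SMALL_I.upper()

# Default lowercase of İ is ASCII i followed by U+0307 COMBINING DOT ABOVE.
dotted_lower = DOTTED_CAPITAL_I.lower()

# Round trip: ı -> I -> i, so the dotless letter is lost.
round_trip = DOTLESS_SMALL_I.upper().lower()

print(repr(dotless_upper), repr(dotted_lower), repr(round_trip))
```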

Date/Time: Fri Nov 10 17:16:01 CST 2017
Report Type: Error Report
Opt Subject: Minor error in UAX #14

Section 5 of "UAX #14: Unicode Line Breaking Algorithm" refers 
to LineBreak.txt as tab-delimited. LineBreak.txt is, in fact,
semicolon-delimited.

(I did say it was minor. :) )
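
For illustration, a minimal parser for the semicolon-delimited layout might look like the sketch below. The sample lines are written in the style of LineBreak.txt for this example and are not copied from a release; the function name is hypothetical.

```python
SAMPLE = """\
# Illustrative lines in the LineBreak.txt layout (not copied from a release)
0009;BA # Cc       <control-0009>
0020;SP # Zs       SPACE
0030..0039;NU # Nd  [10] DIGIT ZERO..DIGIT NINE
"""

def parse_line_break(text: str) -> dict:
    """Map each code point to its line break class."""
    table = {}
    for raw in text.splitlines():
        data = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        code_range, lb_class = (field.strip() for field in data.split(";"))
        first, _, last = code_range.partition("..")
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            table[cp] = lb_class
    return table

line_break = parse_line_break(SAMPLE)
print(line_break[0x20], line_break[0x35])
```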

Date/Time: Mon Dec 18 15:30:34 CST 2017
Name: Umihotaru Sasea
Report Type: Feedback on an Encoding Proposal
Opt Subject: MAYAN vs MAYA

The term "Mayan Numerals" should be changed to "Maya Numerals" (to harmonize
with the term "Maya Hieroglyphs" found in the roadmap table), as scholars
use MAYA rather than MAYAN for an adjective of Maya.

See:
http://www.osea-cite.org/program/maya_or_mayans.php 
https://www.thoughtco.com/ancient-maya-mayans-most-accepted-term-171569 
https://www.belize.com/maya-or-mayan 

Date/Time: Wed Dec 20 05:35:36 CST 2017
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: French typesetting and Unicode

Hello,

below is a piece of formal personal feedback, replacing my 
previous post that was an information request but would have 
been handled as general feedback for the next UTC meeting.

Thank you for bringing this to the attention of the UTC so 
that it hopefully can be settled just in time when French 
keyboard layouts must be defined for public release.

Best regards,

Marcel
_______________________________________________________________

• French typesetting not natively supported

U+202F NARROW NO-BREAK SPACE is kind of recommended to us for 
typesetting of a set of French punctuations only since 2014 
and version 7.0 of the Standard, when the section on spaces 
in chapter 6 was edited. How was French supposed to be 
typeset before?

Unicode specifies almost all space characters to be 
breakable. That spec is disruptive, as the em, two-per-em and 
four-per-em spaces were non-breakable in phototypesetting. 
Thus, the four-per-em space was to be used to surround 
punctuations on the no-break side. This practice has been 
disrupted by Unicode, as nobody can input interoperable text 
any longer. Note however that word processing software stays 
handling these spaces as non-breakable, to conform to user 
expectations inherited from pre-Unicode practice.

Further, the Unicode narrow no-break space U+202F was encoded 
for Mongolian and used in Phags-pa too, as being close in 
typesetting practice and requirements. That took place no 
sooner than in version 3.0 published in September, 1999, 
seven years after v1.1. Supposing that French implementers are 
eager to spread best practices, how were we supposed to 
typeset French text before being able to use the only 
fixed-width no-break space in Unicode, that seems to be 
U+202F? 

However, pre-Unicode typesetting seems to have been unusable 
already, as one needs a justifying no-break space as well, 
not only a bunch of fixed-width ones. Unicode specifies 
U+00A0 as being justifying. That in turn makes it unfit for 
French punctuation typesetting. Word processors work around 
by providing that space as fixed-width, while publishing 
software provides two no-break spaces of same default width, 
one justifying, the other fixed-width. Unicode seems not to 
have encoded the latter, so that the goal of empowering 
people to get interoperable text is unreached, be it by 
design, or by mistake.


• Interoperable abbreviation typesetting unsupported

As far as space characters are involved, Unicode applies the 
universality design principle by allowing re-use of U+202F 
for French, where its width is doubled to fit its new 
purpose. On the other hand, Unicode prohibits re-use of 
superscript Latin letters for interoperable French 
abbreviation typesetting, urging people not to use characters 
otherwise than intended. Other languages like English, 
Italian, Portuguese and Spanish are concerned as well, 
however not to the same extent.

Simultaneously, another Unicode design principle, stipulating 
that significant differences in appearance or behavior must 
be handled by different characters, not different fonts, is 
contradicted by endorsing that an approx. nine..twelve-per-em 
space, as is U+202F in Mongolian, is used in other scripts as 
an equivalent of a four..six-per-em space. On the other hand, 
re-use of a subset of the spacing modifier letters encoded 
partly for phonetic transcription, partly for medievist 
usage, is outlawed, despite the new purpose of robust 
abbreviation typesetting is entirely coherent with the 
originally intended use, as well in practice (use of 
superscript in abbreviations goes back to medieval Latin 
handwriting), as in glyph shapes (superscript modifier 
letters must be evenly shaped by specification, as more than 
the whole Latin base alphabet may be used in phonetics) and 
in character properties. 


Now, should the narrow no-break space encoded for Mongolian 
be used for French as well, and the superscript Latin letters 
encoded for phonetics should not? Unicode is known to be 
eager to support legacy practice and make for round-trip 
conversion between the new and the old standards. All and any 
legacy characters presented during its first years made their 
way into the Standard.

1) How could it happen that spaces like four-per-em space 
were encoded as breakable? Making them non-breakable means 
surrounding them with word-joiners, so that we get three 
characters instead of one single character. In the same 
spirit, Unicode could have encoded all diacriticized letters 
as combining sequences only.

2) Why did Unicode pick only one of the two no-break spaces 
used in desktop publishing practice, and left the fixed-width 
one to private use? As a consequence, interoperable plain 
text cannot be exported from that software, as both spaces 
are merged into one single character. That contradicts the 
Unicode design principle of allowing all basic semantics to 
be represented in plain text.

3) What made Unicode overlook that users must be enabled to 
robustly typeset not only Italian, Spanish and Portuguese 
ordinal indicators, but also English and French ones, as well 
as other abbreviations? Whenever the best plain-text 
representation is an ugly and non-conformant fallback, 
Unicode is not supporting plain text representation of that 
language. 
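
Point 1 above can be illustrated with a small sketch, assuming Python's unicodedata module. It shows the three-character sequence needed to keep a four-per-em space from breaking, and the decomposition tags that distinguish the no-break spaces from the breakable one.

```python
import unicodedata

FOUR_PER_EM_SPACE = "\u2005"
WORD_JOINER = "\u2060"
NO_BREAK_SPACE = "\u00a0"
NARROW_NO_BREAK_SPACE = "\u202f"

# A breakable space made non-breaking by surrounding it with word joiners.
glued_space = WORD_JOINER + FOUR_PER_EM_SPACE + WORD_JOINER

# Decomposition mappings as recorded in the Unicode Character Database.
decompositions = {
    unicodedata.name(ch): unicodedata.decomposition(ch)
    for ch in (FOUR_PER_EM_SPACE, NO_BREAK_SPACE, NARROW_NO_BREAK_SPACE)
}

print(len(glued_space))
for name, mapping in decompositions.items():
    print(name, "->", mapping)
```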



Things would be different today if Unicode had supported 
French from the beginning on, specifying what space to use 
for typesetting French punctuation, like it did specify 
peculiar features for many languages, and encoding what is 
needed for robust representation of the language in plain 
text, as it did encode many many special letters and format 
characters to correctly represent in plain text every single 
language it tackled to support. Several important pieces of 
information are missing for a streamlined Unicode education. 
Knowing about the whys and hows would make documentation much 
more straightforward. Making Unicode more conformant to its 
design principles would make the Standard better usable.

Thatʼs what we expect. It basically requires to correct one 
biased policy so that non-conformant fonts can be rejected, 
and to assume that the industry needed to hack the repertoire 
to get French correctly supported. There is still a lot of 
trouble telling people a quarter of a century after Unicode 
has been brought to us, that we are now coming up with the 
right space for our punctuation. French people could assume 
that it was in Unicode from the beginning on. 
In fact, it wasnʼt.



Further reading:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0119.html 
Please complete with:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m04/0278.html 

_______________________________________________________________

Date/Time: Sat Jan 20 06:22:41 CST 2018
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Angle brackets

Reviewing L2/18-025 drew my attention to angle brackets.

Iʼm unable to retrieve the rationale of the canonical equivalence of 
non-CJK angle brackets U+2329 U+232A with CJK angle brackets U+3008 U+3009.

TUS refers to this canonical equivalence in the past tense, which implies 
that it no longer holds. However, due to the stability guarantee for 
canonical equivalence, non-CJK angle brackets have been encoded a second 
time to recover the use of angle brackets in non-CJK contexts (U+27E8 U+27E9).

We need to document the point of using “mathematical” angle brackets in 
ordinary text. That is not done by pointing to a canonical equivalence 
without documenting that canonical equivalence itself. 

Presumably, making these characters canonically equivalent was an encoding 
error, and should be declared as such for transparency.
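
The practical effect of the equivalence can be seen in a short sketch, assuming Python's unicodedata module: normalization replaces U+2329 and U+232A with the CJK brackets, while the mathematical angle brackets survive unchanged.

```python
import unicodedata

legacy_brackets = "\u2329\u232a"  # LEFT-/RIGHT-POINTING ANGLE BRACKET
math_brackets = "\u27e8\u27e9"    # MATHEMATICAL LEFT/RIGHT ANGLE BRACKET

legacy_nfc = unicodedata.normalize("NFC", legacy_brackets)
math_nfc = unicodedata.normalize("NFC", math_brackets)

# U+2329/U+232A normalize to U+3008/U+3009; U+27E8/U+27E9 are stable.
print([f"U+{ord(c):04X}" for c in legacy_nfc])
print([f"U+{ord(c):04X}" for c in math_nfc])
```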

Moreover, TUS uses the term “angle brackets” when referring to the ASCII 
chevrons LESS-THAN and GREATER-THAN. That is confusing.

According to Wikipedia, these are mainly called “pointy brackets”:

https://en.wikipedia.org/wiki/Bracket

Date/Time: Sat Jan 20 12:24:10 CST 2018
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Typographic dashes disambiguation

[[NOTE: This feedback supersedes the one previously sent with the same subject.]]

Reviewing L2/18-025 drew my attention to U+2015 and U+2012.

U+2015

Properly designed fonts like Cambria give U+2015 a four-per-three-em 
width. The confusion among users is due to improperly designed fonts 
that give it the same length as the em-dash, making it useless in practice.
But that is just a flaw in fonts, fueled by the confusing names in the 
Unicode Standard suggesting that there was no means to give U+2015 a name 
like those of U+2013 and U+2014, as if the difference is only in semantics, 
not in typography. For consistency, U+2015 should have been given a name based 
on advance width, or length, like U+2013 and U+2014.
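
For reference, the naming inconsistency discussed above can be listed with a short sketch, assuming Python's unicodedata module.

```python
import unicodedata

# Character names of the four dashes U+2012..U+2015
dash_names = {
    f"U+{cp:04X}": unicodedata.name(chr(cp))
    for cp in range(0x2012, 0x2016)
}

for code, name in dash_names.items():
    print(code, name)
```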

This has already been taken into account in the draft French translation, 
where on 2017-12-26 it had been renamed to TIRET TROIS QUARTS DE CADRATIN, 
that translates to FOUR-PER-THREE-EM DASH, according to the English-style 
naming. Short form would be 4/3M, or more arithmetically, 3/4 M, or ¾M.

HORIZONTAL BAR is confusing also in that it suggests a tie to U+007C 
VERTICAL BAR. The HORIZONTAL BAR label is likely to be a last-resort 
choice, while the actual length of this dash was still flawed by fonts.
Hence, Unicode added the informative alias “quotation dash.”

An annotation should be added to U+2015 preventing further confusion.

When an em-dash is too long, and a half-em dash (en dash) is too short, 
the three-quarter-em dash is right.


U+2012

FIGURE DASH seems to be part of a set taken over from legacy typesetting 
of figure tables: U+2007 FIGURE SPACE, U+2008 PUNCTUATION SPACE, and 
U+2012 FIGURE DASH. These three allow for roundtrip compatibility with 
older standards from the time when figure (function) tables were already 
computed while typesetting was still done in hot metal, and would have 
been useful when data was output for Linotype. 

Anyhow, the Unicode Standard recommends the use of U+2013 EN DASH to denote
intervals. An alternate recommendation is found on technicalauthoring.com:

http://www.technicalauthoring.com/wiki/index.php/Figure_dash

Wikipedia indicates the reason why the figure dash is preferred for intervals:

https://en.wikipedia.org/wiki/Dash#Similar_Unicode_characters

It is the same rationale as for hyphen vs minus sign: The latter is centered 
on uppercase digits, the former on lowercase letters.

That leads to reconsider the Unicode recommendation of U+2013 for intervals
in ordinary text (as opposed to technical notation using two dots).

Based on the Unicode Standard, Iʼve taken U+2012 off the keyboard layout, 
but this new evidence might lead to remap it instead of the en dash in the 
numbers level.

What is actual/new Unicode Policy as of proper representation of intervals?