CLDR Ticket #8512 (accepted data)

Opened 2 years ago

Last modified 22 months ago

Produce tailored grapheme-cluster break

Reported by: mark
Owned by: pedberg
Component: segmentation
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:
Xref:

Description

The TAG characters are being un-deprecated for use in subdivision flags. That usage is not yet approved, but we need to start preparing for it. I suggest we go a bit broader, in the direction we'd like to see the UTC take in 9.0, and review the default_ignorable characters to decide which should go into Extend, which into Prepend, and which should be left alone.

Many are already in Extend. See http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3Adi%3A%5D-%5B%3Acn%3A%5D%0D%0A&g=gcb+gc
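
For readers who cannot open the link, the query it carries can be read straight out of the URL. The sketch below only decodes the query string; my reading of the parameters (`a` as the UnicodeSet expression, `g` as the properties to group by) and of the aliases (`di` = Default_Ignorable_Code_Point, `cn` = unassigned, `gcb` = Grapheme_Cluster_Break, `gc` = General_Category) is an assumption based on the usual property short names, not something stated in this ticket.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the ticket description.
URL = ("http://unicode.org/cldr/utility/list-unicodeset.jsp"
       "?a=%5B%3Adi%3A%5D-%5B%3Acn%3A%5D%0D%0A&g=gcb+gc")

# parse_qs percent-decodes the values and turns '+' into a space.
params = parse_qs(urlsplit(URL).query)

# 'a' ends in an encoded CR LF, so strip the trailing whitespace.
unicode_set = params["a"][0].strip()
group_by = params["g"][0].split()

print(unicode_set)  # [:di:]-[:cn:]
print(group_by)     # ['gcb', 'gc']
```

Read that way, the set is "default-ignorable code points minus unassigned ones", grouped first by grapheme-cluster-break value and then by general category.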

The following are not in Extend (we also need to look at the 8.0 characters):

General_Category=Format items: 136

Latin 1 Supplement — Latin-1 punctuation and symbols items: 1

U+00AD ( ) SOFT HYPHEN

Arabic — Format character items: 1

U+061C ( ) ARABIC LETTER MARK

Mongolian — Format controls items: 1

U+180E ( ) MONGOLIAN VOWEL SEPARATOR

General Punctuation — Format character items: 13

U+200B ( ) ZERO WIDTH SPACE
U+200E ( ‎ ) LEFT-TO-RIGHT MARK
U+200F ( ‎‏‎ ) RIGHT-TO-LEFT MARK
U+202A ( ) LEFT-TO-RIGHT EMBEDDING
U+202B ( ) RIGHT-TO-LEFT EMBEDDING
U+202C ( ) POP DIRECTIONAL FORMATTING
U+202D ( ) LEFT-TO-RIGHT OVERRIDE
U+202E ( ) RIGHT-TO-LEFT OVERRIDE
U+2060 ( ) WORD JOINER
U+2066 ( ) LEFT-TO-RIGHT ISOLATE
U+2067 ( ) RIGHT-TO-LEFT ISOLATE
U+2068 ( ) FIRST STRONG ISOLATE
U+2069 ( ) POP DIRECTIONAL ISOLATE

General Punctuation — Invisible operators items: 4

U+2061 ( ) FUNCTION APPLICATION
U+2062 ( ) INVISIBLE TIMES
U+2063 ( ) INVISIBLE SEPARATOR
U+2064 ( ) INVISIBLE PLUS

General Punctuation — Deprecated items: 6

U+206A ( ) INHIBIT SYMMETRIC SWAPPING
U+206B ( ) ACTIVATE SYMMETRIC SWAPPING
U+206C ( ) INHIBIT ARABIC FORM SHAPING
U+206D ( ) ACTIVATE ARABIC FORM SHAPING
U+206E ( ) NATIONAL DIGIT SHAPES
U+206F ( ) NOMINAL DIGIT SHAPES

Arabic Presentation Forms B — Special items: 1

U+FEFF ( ) ZERO WIDTH NO-BREAK SPACE

Shorthand Format Controls — Shorthand format controls items: 4

U+1BCA0 ( ) SHORTHAND FORMAT LETTER OVERLAP
U+1BCA1 ( ) SHORTHAND FORMAT CONTINUING OVERLAP
U+1BCA2 ( ) SHORTHAND FORMAT DOWN STEP
U+1BCA3 ( ) SHORTHAND FORMAT UP STEP

Musical Symbols — Beams and slurs items: 8

U+1D173 ( ) MUSICAL SYMBOL BEGIN BEAM
U+1D174 ( ) MUSICAL SYMBOL END BEAM
U+1D175 ( ) MUSICAL SYMBOL BEGIN TIE
U+1D176 ( ) MUSICAL SYMBOL END TIE
U+1D177 ( ) MUSICAL SYMBOL BEGIN SLUR
U+1D178 ( ) MUSICAL SYMBOL END SLUR
U+1D179 ( ) MUSICAL SYMBOL BEGIN PHRASE
U+1D17A ( ) MUSICAL SYMBOL END PHRASE

Tags — Tag identifiers items: 1

U+E0001 ( ) LANGUAGE TAG

Tags — Tag components items: 96

U+E0020 ( ) TAG SPACE
U+E0021 ( ) TAG EXCLAMATION MARK
U+E0022 ( ) TAG QUOTATION MARK
U+E0023 ( ) TAG NUMBER SIGN
U+E0024 ( ) TAG DOLLAR SIGN
U+E0025 ( ) TAG PERCENT SIGN
U+E0026 ( ) TAG AMPERSAND
U+E0027 ( ) TAG APOSTROPHE
U+E0028 ( ) TAG LEFT PARENTHESIS
U+E0029 ( ) TAG RIGHT PARENTHESIS
U+E002A ( ) TAG ASTERISK
U+E002B ( ) TAG PLUS SIGN
U+E002C ( ) TAG COMMA
U+E002D ( ) TAG HYPHEN-MINUS
U+E002E ( ) TAG FULL STOP
U+E002F ( ) TAG SOLIDUS
U+E0030 ( ) TAG DIGIT ZERO
U+E0031 ( ) TAG DIGIT ONE
U+E0032 ( ) TAG DIGIT TWO
U+E0033 ( ) TAG DIGIT THREE
U+E0034 ( ) TAG DIGIT FOUR
U+E0035 ( ) TAG DIGIT FIVE
U+E0036 ( ) TAG DIGIT SIX
U+E0037 ( ) TAG DIGIT SEVEN
U+E0038 ( ) TAG DIGIT EIGHT
U+E0039 ( ) TAG DIGIT NINE
U+E003A ( ) TAG COLON
U+E003B ( ) TAG SEMICOLON
U+E003C ( ) TAG LESS-THAN SIGN
U+E003D ( ) TAG EQUALS SIGN
U+E003E ( ) TAG GREATER-THAN SIGN
U+E003F ( ) TAG QUESTION MARK
U+E0040 ( ) TAG COMMERCIAL AT
U+E0041 ( ) TAG LATIN CAPITAL LETTER A
U+E0042 ( ) TAG LATIN CAPITAL LETTER B
U+E0043 ( ) TAG LATIN CAPITAL LETTER C
U+E0044 ( ) TAG LATIN CAPITAL LETTER D
U+E0045 ( ) TAG LATIN CAPITAL LETTER E
U+E0046 ( ) TAG LATIN CAPITAL LETTER F
U+E0047 ( ) TAG LATIN CAPITAL LETTER G
U+E0048 ( ) TAG LATIN CAPITAL LETTER H
U+E0049 ( ) TAG LATIN CAPITAL LETTER I
U+E004A ( ) TAG LATIN CAPITAL LETTER J
U+E004B ( ) TAG LATIN CAPITAL LETTER K
U+E004C ( ) TAG LATIN CAPITAL LETTER L
U+E004D ( ) TAG LATIN CAPITAL LETTER M
U+E004E ( ) TAG LATIN CAPITAL LETTER N
U+E004F ( ) TAG LATIN CAPITAL LETTER O
U+E0050 ( ) TAG LATIN CAPITAL LETTER P
U+E0051 ( ) TAG LATIN CAPITAL LETTER Q
U+E0052 ( ) TAG LATIN CAPITAL LETTER R
U+E0053 ( ) TAG LATIN CAPITAL LETTER S
U+E0054 ( ) TAG LATIN CAPITAL LETTER T
U+E0055 ( ) TAG LATIN CAPITAL LETTER U
U+E0056 ( ) TAG LATIN CAPITAL LETTER V
U+E0057 ( ) TAG LATIN CAPITAL LETTER W
U+E0058 ( ) TAG LATIN CAPITAL LETTER X
U+E0059 ( ) TAG LATIN CAPITAL LETTER Y
U+E005A ( ) TAG LATIN CAPITAL LETTER Z
U+E005B ( ) TAG LEFT SQUARE BRACKET
U+E005C ( ) TAG REVERSE SOLIDUS
U+E005D ( ) TAG RIGHT SQUARE BRACKET
U+E005E ( ) TAG CIRCUMFLEX ACCENT
U+E005F ( ) TAG LOW LINE
U+E0060 ( ) TAG GRAVE ACCENT
U+E0061 ( ) TAG LATIN SMALL LETTER A
U+E0062 ( ) TAG LATIN SMALL LETTER B
U+E0063 ( ) TAG LATIN SMALL LETTER C
U+E0064 ( ) TAG LATIN SMALL LETTER D
U+E0065 ( ) TAG LATIN SMALL LETTER E
U+E0066 ( ) TAG LATIN SMALL LETTER F
U+E0067 ( ) TAG LATIN SMALL LETTER G
U+E0068 ( ) TAG LATIN SMALL LETTER H
U+E0069 ( ) TAG LATIN SMALL LETTER I
U+E006A ( ) TAG LATIN SMALL LETTER J
U+E006B ( ) TAG LATIN SMALL LETTER K
U+E006C ( ) TAG LATIN SMALL LETTER L
U+E006D ( ) TAG LATIN SMALL LETTER M
U+E006E ( ) TAG LATIN SMALL LETTER N
U+E006F ( ) TAG LATIN SMALL LETTER O
U+E0070 ( ) TAG LATIN SMALL LETTER P
U+E0071 ( ) TAG LATIN SMALL LETTER Q
U+E0072 ( ) TAG LATIN SMALL LETTER R
U+E0073 ( ) TAG LATIN SMALL LETTER S
U+E0074 ( ) TAG LATIN SMALL LETTER T
U+E0075 ( ) TAG LATIN SMALL LETTER U
U+E0076 ( ) TAG LATIN SMALL LETTER V
U+E0077 ( ) TAG LATIN SMALL LETTER W
U+E0078 ( ) TAG LATIN SMALL LETTER X
U+E0079 ( ) TAG LATIN SMALL LETTER Y
U+E007A ( ) TAG LATIN SMALL LETTER Z
U+E007B ( ) TAG LEFT CURLY BRACKET
U+E007C ( ) TAG VERTICAL LINE
U+E007D ( ) TAG RIGHT CURLY BRACKET
U+E007E ( ) TAG TILDE
U+E007F ( ) CANCEL TAG

General_Category=Other_Letter items: 2

Hangul Compatibility Jamo — Special character items: 1

U+3164 ( ) HANGUL FILLER

Halfwidth And Fullwidth Forms — Halfwidth Hangul variants items: 1

U+FFA0 ( ) HALFWIDTH HANGUL FILLER
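
The per-block counts in the list above can be cross-checked with a short script. This is only a sketch: the ranges are transcribed by hand from the list, and the General_Category lookup relies on whatever Unicode version ships with the local Python `unicodedata` module, which may differ from the version the ticket was written against.

```python
import unicodedata

# Code point ranges (inclusive) transcribed from the Format list above.
CF_RANGES = [
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x061C, 0x061C),    # ARABIC LETTER MARK
    (0x180E, 0x180E),    # MONGOLIAN VOWEL SEPARATOR
    (0x200B, 0x200B), (0x200E, 0x200F), (0x202A, 0x202E),
    (0x2060, 0x2060), (0x2066, 0x2069),  # General Punctuation format chars (13)
    (0x2061, 0x2064),    # invisible operators (4)
    (0x206A, 0x206F),    # deprecated (6)
    (0xFEFF, 0xFEFF),    # ZERO WIDTH NO-BREAK SPACE
    (0x1BCA0, 0x1BCA3),  # shorthand format controls (4)
    (0x1D173, 0x1D17A),  # musical beams and slurs (8)
    (0xE0001, 0xE0001),  # LANGUAGE TAG
    (0xE0020, 0xE007F),  # tag components (96)
]
# The two Other_Letter fillers.
LO_RANGES = [(0x3164, 0x3164), (0xFFA0, 0xFFA0)]


def expand(ranges):
    """Flatten inclusive (lo, hi) ranges into a list of code points."""
    return [cp for lo, hi in ranges for cp in range(lo, hi + 1)]


format_chars = expand(CF_RANGES)
letter_chars = expand(LO_RANGES)

# Code points whose category in the local database disagrees with the list.
mismatches = [
    f"U+{cp:04X}"
    for cp, expected in [(c, "Cf") for c in format_chars]
    + [(c, "Lo") for c in letter_chars]
    if unicodedata.category(chr(cp)) != expected
]

print(len(format_chars), len(letter_chars))  # 136 2
print(mismatches)
```

The two counts match the "items: 136" and "items: 2" headings. With a reasonably recent Unicode database I would expect `mismatches` to be empty, but that depends on the installed Python.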

Change History

comment:1 Changed 2 years ago by mark

  • Summary changed from Produced tailored grapheme-cluster break to Produce tailored grapheme-cluster break

comment:2 Changed 2 years ago by mark

  • Cc pedberg added

comment:4 Changed 2 years ago by roozbeh

From the set Mark provided in the original report, these should be safe to add to Extend:

  • U+1D174 ( ) MUSICAL SYMBOL END BEAM
  • U+1D176 ( ) MUSICAL SYMBOL END TIE
  • U+1D178 ( ) MUSICAL SYMBOL END SLUR
  • U+1D17A ( ) MUSICAL SYMBOL END PHRASE
  • The tag characters we plan to use for flags.

The rest would need more careful consideration.
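
To make the effect of that suggestion concrete, here is a toy sketch of what treating those characters as Extend would do to cluster boundaries. It is not the UAX #29 algorithm and not CLDR's actual tailoring: it applies only the "do not break before Extend" rule, it approximates the existing Extend class with General_Category Mn/Me, and it assumes that "the tag characters we plan to use for flags" means the whole U+E0020..U+E007F range. The names `TAILORED_EXTEND`, `is_extend` and `clusters` are hypothetical.

```python
import unicodedata

# Hypothetical tailoring set based on comment:4. The full tag range is an
# assumption; the comment does not say exactly which tag characters.
TAILORED_EXTEND = set(range(0xE0020, 0xE0080)) | {
    0x1D174,  # MUSICAL SYMBOL END BEAM
    0x1D176,  # MUSICAL SYMBOL END TIE
    0x1D178,  # MUSICAL SYMBOL END SLUR
    0x1D17A,  # MUSICAL SYMBOL END PHRASE
}


def is_extend(ch):
    """Rough stand-in for Grapheme_Cluster_Break=Extend plus the tailoring."""
    return (ord(ch) in TAILORED_EXTEND
            or unicodedata.category(ch) in ("Mn", "Me"))


def clusters(text):
    """Split text into clusters: break everywhere except before Extend."""
    out = []
    for ch in text:
        if out and is_extend(ch):
            out[-1] += ch   # attach to the preceding cluster
        else:
            out.append(ch)  # start a new cluster
    return out


# A subdivision-flag style sequence: WAVING BLACK FLAG, the tag letters
# for "gbsct", then CANCEL TAG.
flag = ("\U0001F3F4"
        + "".join(chr(0xE0000 + ord(c)) for c in "gbsct")
        + "\U000E007F")

print(len(flag))                       # 7 code points
print(len(clusters("a" + flag + "b"))) # 3 clusters: "a", the flag, "b"
```

Without the tailoring set, each tag character would start its own cluster in this toy model, since the tag characters are General_Category Cf rather than Mn or Me.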

comment:5 Changed 2 years ago by emmons

  • Status changed from new to accepted
  • Component changed from unknown to segmentation
  • Priority changed from assess to medium
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 28
  • Owner changed from anybody to pedberg
  • Type changed from unknown to data

comment:6 Changed 2 years ago by pedberg

  • Milestone changed from 28 to 29

comment:7 Changed 22 months ago by emmons

  • Milestone changed from 29 to upcoming