CLDR Ticket #5707 (closed task: fixed)

Opened 2 years ago

Last modified 20 months ago

document basic collation syntax characters

Reported by: markus
Owned by: markus
Component: xxx-spec
Version: svn
Load:
Data Locale:
Phase:
Review: emmons
Weeks: 0.2
Data Xpath:
Xref:

Description

Document the special characters in basic collation syntax. As far as I can tell from the code and various pieces of documentation, we have more or less the following. Something like this should go into the LDML spec.

Java 6 RuleBasedCollator syntax characters

[\u0021-\u002F \u003A-\u0040 \u005B-\u0060 \u007B-\u007E]
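The set above covers every printable ASCII character that is not a letter, a digit, or the space. A minimal sketch of a membership test follows; the class and method names are hypothetical helpers for illustration and are not part of the JDK or ICU.

```java
// Hypothetical helper (not a JDK or ICU API): mirrors the four ranges
// [\u0021-\u002F \u003A-\u0040 \u005B-\u0060 \u007B-\u007E] listed above.
final class CollationSyntaxChars {
    private CollationSyntaxChars() {}

    /** Returns true if the code point lies in one of the four ASCII ranges. */
    static boolean isSyntaxChar(int c) {
        return (c >= 0x21 && c <= 0x2F)
            || (c >= 0x3A && c <= 0x40)
            || (c >= 0x5B && c <= 0x60)
            || (c >= 0x7B && c <= 0x7E);
    }

    /** Counts how many ASCII code points are syntax characters. */
    static int countSyntaxChars() {
        int n = 0;
        for (int c = 0; c < 0x80; c++) {
            if (isSyntaxChar(c)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(isSyntaxChar('&'));   // true
        System.out.println(isSyntaxChar('a'));   // false
        System.out.println(countSyntaxChars());  // 32
    }
}
```

The count of 32 matches the number of characters enumerated in the list that follows.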

In initial position:

U+0021 ( ! ) EXCLAMATION MARK
    JDK: Turns on Thai/Lao vowel-consonant swapping.
    ICU: Ignored
U+0022 ( " ) QUOTATION MARK
U+0023 ( # ) NUMBER SIGN
    ICU: Starts a comment until the end of the line.
    TODO: unused, seems unnecessary, remove?
U+0024 ( $ ) DOLLAR SIGN
U+0025 ( % ) PERCENT SIGN
U+0026 ( & ) AMPERSAND
    Reset
U+0027 ( ' ) APOSTROPHE
U+0028 ( ( ) LEFT PARENTHESIS
U+0029 ( ) ) RIGHT PARENTHESIS
U+002A ( * ) ASTERISK
U+002B ( + ) PLUS SIGN
U+002C ( , ) COMMA
    Tertiary difference
    TODO: Add ,*
U+002D ( - ) HYPHEN-MINUS
U+002E ( . ) FULL STOP
    TODO: Propose for primary difference
U+002F ( / ) SOLIDUS
U+003A ( : ) COLON
U+003B ( ; ) SEMICOLON
    Secondary difference
    TODO: Add ;*
U+003C ( < ) LESS-THAN SIGN
    Primary difference
    ICU:
      << secondary difference
      <<< tertiary difference
      <* primary differences, compact syntax
      <<* secondary differences, compact syntax
      <<<* tertiary differences, compact syntax
U+003D ( = ) EQUALS SIGN
    No difference
    ICU:
      =* no differences, compact syntax
U+003E ( > ) GREATER-THAN SIGN
U+003F ( ? ) QUESTION MARK
U+0040 ( @ ) COMMERCIAL AT
    Turns on backwards sorting of accents (secondary differences).
U+005B ( [ ) LEFT SQUARE BRACKET
    ICU: starts options syntax (ends with ']')
U+005C ( \ ) REVERSE SOLIDUS
U+005D ( ] ) RIGHT SQUARE BRACKET
U+005E ( ^ ) CIRCUMFLEX ACCENT
U+005F ( _ ) LOW LINE
    Note: Not Pattern_Syntax
U+0060 ( ` ) GRAVE ACCENT
U+007B ( { ) LEFT CURLY BRACKET
U+007C ( | ) VERTICAL LINE
U+007D ( } ) RIGHT CURLY BRACKET
U+007E ( ~ ) TILDE

TODO: Add syntax for quaternary difference.
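The basic relation and reset characters from the list above can be tried with the JDK's java.text.RuleBasedCollator. The sketch below is illustrative only: the rule strings and the class and helper names are made up for this example, and it uses only the JDK operators (`<`, `;`, `,`, `&`), not the ICU-only forms such as `<<` or `<*`.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Illustrative only: class name, helper names, and rule strings are assumptions.
final class BasicRelationsDemo {
    private BasicRelationsDemo() {}

    /** Builds a collator, converting the checked ParseException to an unchecked one. */
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    /** Compares a and b at the given strength and returns -1, 0, or 1. */
    static int compareAt(RuleBasedCollator c, int strength, String a, String b) {
        c.setStrength(strength);
        return Integer.signum(c.compare(a, b));
    }

    public static void main(String[] args) {
        // x and y differ at the secondary level, y and z at the tertiary level,
        // and w is a primary difference after all of them.
        RuleBasedCollator c = build("< x ; y , z < w");
        System.out.println(compareAt(c, Collator.PRIMARY, "x", "y"));   // 0
        System.out.println(compareAt(c, Collator.SECONDARY, "x", "y")); // -1
        System.out.println(compareAt(c, Collator.SECONDARY, "y", "z")); // 0
        System.out.println(compareAt(c, Collator.TERTIARY, "y", "z"));  // -1
        System.out.println(compareAt(c, Collator.PRIMARY, "x", "w"));   // -1

        // Reset: "& a < b" places b immediately after a, so the order is a < b < c.
        RuleBasedCollator r = build("< a < c & a < b");
        System.out.println(compareAt(r, Collator.TERTIARY, "b", "c"));  // -1
    }
}
```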

At start of reset position text:

U+005B ( [ ) LEFT SQUARE BRACKET
    ICU: starts name of special reset position (ends with ']')

At start of other text:

U+005B ( [ ) LEFT SQUARE BRACKET
    ICU: starts [variable top], which sets the variable top

Text syntax:

U+0027 ( ' ) APOSTROPHE
    Quoting
U+005C ( \ ) REVERSE SOLIDUS
    ICU: single-character escape
U+002F ( / ) SOLIDUS
    ICU: separates tailoring string from expansion
U+007C ( | ) VERTICAL LINE
    ICU: separates prefix from tailoring string
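Apostrophe quoting can be checked against the JDK parser. This is a minimal sketch with made-up rule strings and hypothetical helper names; it covers only the apostrophe, since the list above marks the backslash escape and the prefix separator as ICU syntax.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Illustrative only: class name, helper names, and rule strings are assumptions.
final class QuotingDemo {
    private QuotingDemo() {}

    /** Returns true if the JDK parser accepts the rule string. */
    static boolean parses(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    /** Compares a and b under the given rules and returns -1, 0, or 1. */
    static int compare(String rules, String a, String b) {
        try {
            return Integer.signum(new RuleBasedCollator(rules).compare(a, b));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // An unquoted syntax character inside text is rejected.
        System.out.println(parses("< a < # < b"));              // false
        // Quoted with apostrophes, the same kind of character is literal text.
        System.out.println(parses("< a < '&' < b"));            // true
        System.out.println(compare("< a < '&' < b", "a", "&")); // -1
        System.out.println(compare("< a < '&' < b", "&", "b")); // -1
    }
}
```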

In compact syntax:

U+002D ( - ) HYPHEN-MINUS
    ICU: a-z range abbreviation
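To illustrate what the compact syntax abbreviates, the following sketch expands a compact character list, including `a-z` style ranges, into the long form. It is a simplified, hypothetical helper and not ICU's parser: it assumes a range runs over consecutive code points, and it does not handle quoting, escapes, or white space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an ICU API): expands e.g. "<* a-cx" into long form.
final class CompactSyntaxSketch {
    private CompactSyntaxSketch() {}

    /**
     * Expands a compact character list into one relation per character.
     * relation is the operator without the trailing '*', for example "<" or "<<".
     */
    static String expand(String relation, String chars) {
        int[] cps = chars.codePoints().toArray();
        List<String> out = new ArrayList<>();
        int prev = -1; // last emitted code point, or -1 if none yet
        for (int i = 0; i < cps.length; i++) {
            int cp = cps[i];
            if (cp == '-' && prev >= 0 && i + 1 < cps.length) {
                // Range: emit every code point after prev up to and including the end.
                int end = cps[++i];
                for (int c = prev + 1; c <= end; c++) {
                    out.add(relation + " " + new String(Character.toChars(c)));
                }
                prev = end;
            } else {
                out.add(relation + " " + new String(Character.toChars(cp)));
                prev = cp;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(expand("<", "a-cx")); // < a < b < c < x
        System.out.println(expand("<<", "ab"));  // << a << b
    }
}
```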

Many places: Non-quoted, non-escaped Pattern_White_Space is ignored, even inside contractions.

TODO: Consider tightening a little, only ignoring Pattern_White_Space between syntax elements.
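The white space behavior can be observed in the JDK's RuleBasedCollator, whose documentation lists only the common white space characters (U+0009 to U+000D and U+0020) rather than all of Pattern_White_Space. A minimal sketch with made-up rule strings and hypothetical helper names:

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Illustrative only: class name, helper name, and rule strings are assumptions.
final class WhiteSpaceDemo {
    private WhiteSpaceDemo() {}

    /** Compares a and b under the given rules and returns -1, 0, or 1. */
    static int compare(String rules, String a, String b) {
        try {
            return Integer.signum(new RuleBasedCollator(rules).compare(a, b));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // White space between syntax elements does not change the result.
        System.out.println(compare("<a<b<c", "a", "c"));             // -1
        System.out.println(compare(" < a\n < b\t< c ", "a", "c"));   // -1

        // White space inside text is ignored too: "a b" defines the contraction "ab",
        // which here sorts after c even though a alone sorts before c.
        System.out.println(compare("< a < b < c", "ab", "c"));       // -1
        System.out.println(compare("< a < b < c < a b", "ab", "c")); // 1
    }
}
```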

Change History

comment:1 Changed 2 years ago by emmons

  • Owner changed from anybody to markus
  • Priority changed from assess to medium
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 23

comment:2 Changed 2 years ago by markus

  • Milestone changed from 23 to 24

comment:3 Changed 20 months ago by markus

  • Cc mark, yoshito, pedberg, emmons added
  • Status changed from assigned to accepted
  • Review set to emmons

comment:4 Changed 20 months ago by emmons

  • Status changed from accepted to closed
  • Resolution set to fixed