[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #11232 (closed: fixed)

Opened 5 months ago

Last modified 6 weeks ago

Relationship between UTS #35 and UTS #18

Reported by: yoshito
Owned by: mark
Component: other
Data Locale:
Phase: spec-beta
Review: andy
Weeks:
Data Xpath:
Xref:

Description

I'm filing this ticket on behalf of Makoto Murata. He could not create a ticket in this system because the spam checker blocked him (why?). He contacted Mark and me by e-mail.

=========

UTS #18 is a guideline document, although it defines detailed syntax and semantics. UTS #35 defines the UnicodeSet notation, which is based on a subset of the features found in UTS #18, but UTS #35 also defines detailed syntax and semantics.

Does UTS #35 define a profile of UTS #18? In other words, is every syntactically correct description in the UnicodeSet notation usable by any conformant implementation of UTS #18? Or is UTS #35 a standalone specification that provides syntax beyond UTS #18 and semantics slightly different from UTS #18?

Change History

comment:1 Changed 5 months ago by yoshito

Mark's response:

Many regex engines use different syntax conventions, so the syntax given in UTS #18 is illustrative rather than a requirement. I believe that UnicodeSet in UTS #35 (and the implementation in ICU) does implement all of the suggested semantics in UTS #18 Levels 1 and 2 that are relevant to determining sets of characters. (Some clauses are not relevant to that, such as support for "\b" or line boundaries.)
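To illustrate the point that the syntax differs while the set semantics agree, here is a minimal Python sketch. It is neither ICU nor a parser for either notation. It directly computes the sets that UnicodeSet writes as [\p{L}-[a-z]] (difference) and [\p{L}&[a-z]] (intersection). The illustrative UTS #18 syntax writes the same sets with "--" and "&&". The helper name letters is hypothetical. Results depend on the Unicode version bundled with the Python build.

```python
import sys
import unicodedata

def letters() -> set[int]:
    """Code points whose General_Category is Lu, Ll, Lt, Lm or Lo, i.e. \\p{L}."""
    return {cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)).startswith("L")}

all_letters = letters()
ascii_lower = set(range(ord("a"), ord("z") + 1))  # [a-z]

# UnicodeSet (UTS #35): [\p{L}-[a-z]]    UTS #18 (illustrative): [\p{L}--[a-z]]
difference = all_letters - ascii_lower

# UnicodeSet (UTS #35): [\p{L}&[a-z]]    UTS #18 (illustrative): [\p{L}&&[a-z]]
intersection = all_letters & ascii_lower

print(len(difference), len(intersection))
```

Whichever surface syntax is used, both expressions denote the same set of code points. That is the sense in which the semantics can match even though the notation does not.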

There is one exception: http://unicode.org/reports/tr18/#RL2.6 (Wildcards in Property Values). That can be supported fairly easily in ICU by implementing a "hook", as is done in https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{name=/APPLE/}. But if you find any discrepancies, please let Andy and me know.
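The "hook" for RL2.6 amounts to computing a set by testing each code point's property value against a pattern. The sketch below does this for the Name property using Python's standard unicodedata module. It is not ICU's hook, and the function name name_matches is hypothetical. It assumes unanchored (search) matching, which is what the \p{name=/APPLE/} example appears to use. Python's unicodedata.name does not consult name aliases, so results can differ from a full implementation.

```python
import re
import sys
import unicodedata

def name_matches(pattern: str) -> set[int]:
    """Code points whose Unicode Name contains a match for pattern (re.search)."""
    rx = re.compile(pattern)
    result = set()
    for cp in range(sys.maxunicode + 1):
        # unicodedata.name returns the default ("") for code points without a
        # name, such as unassigned code points, surrogates and most controls.
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            result.add(cp)
    return result

apples = name_matches("APPLE")  # roughly \p{name=/APPLE/}
for cp in sorted(apples):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Because the match is unanchored, names such as PINEAPPLE are included along with RED APPLE and GREEN APPLE.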

comment:2 Changed 5 months ago by yoshito

Murata-san's response:

I think that the UnicodeSet notation in #35 should be defined by normatively referencing UTS #18 and then imposing restrictions, rather than by defining everything from scratch. (A non-normative summary is certainly useful.) In its current form, it is too difficult to understand the relationship between #35 and #18.

comment:3 Changed 5 months ago by mark

  • Status changed from new to accepted
  • Priority changed from assess to major
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 34
  • Owner changed from anybody to mark
  • type changed from unknown to spec

I don't think we can normatively reference notation that is just illustrative, but we can improve the situation.

comment:4 Changed 3 months ago by mark

  • Component changed from unknown to other

comment:5 Changed 3 months ago by mark

  • Phase changed from rc to spec-beta

comment:6 Changed 7 weeks ago by mark

  • Status changed from accepted to reviewing
  • Review set to andy

comment:7 Changed 7 weeks ago by andy

Review Comments:

Line 6024 looks like it was intended to be a note for reviewers, but is not tagged as such and renders as unchanged text in a browser.

<p>I believe that UnicodeSet in UTS #35 (and the implementation in ICU) ...


HTML Tidy flags a number of lint issues, although none of them are related to this change.

line 4913 column 21 - Warning: missing <td>
line 5062 column 93 - Warning: nested emphasis <em>
line 290 column 67 - Warning: <a> escaping malformed URI reference
line 2555 column 25 - Warning: <a> escaping malformed URI reference
line 2820 column 41 - Warning: <a> anchor "PRIVATE_USE" already defined
line 6107 column 397 - Warning: <a> converting backslash in URI to slash
line 1130 column 17 - Warning: trimming empty <p>
line 1858 column 17 - Warning: trimming empty <p>
line 2873 column 108 - Warning: trimming empty <strong>
line 5668 column 72 - Warning: trimming empty <i>
line 5668 column 69 - Warning: trimming empty <b>
line 2555 column 25 - Warning: <a> cannot copy name attribute to id

The new text itself looks good.

comment:8 Changed 7 weeks ago by andy

  • Status changed from reviewing to reviewfeedback

comment:9 Changed 6 weeks ago by mark

  • Status changed from reviewfeedback to reviewing

Removed the bogus paragraph <p>I believe that UnicodeSet..., and made minor wording changes in the following sentence.

comment:10 Changed 6 weeks ago by mark

FYI, didn't do the tidy fixes; we can do those after the release as discussed.

comment:11 Changed 6 weeks ago by andy

  • Status changed from reviewing to closed
  • Resolution set to fixed