This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Tue Apr 1, 2014
Name: Asmus Freytag
Opt Subject: UAX #9 (PRI 274)
Add as an implementation note to UAX#9:

During line breaking, if a line is broken at the location of a SHY, the text around the line break may change. A common case is the replacement of the invisible SHY by a visible HYPHEN, but see Section x.x in the Unicode Standard. For the purposes of the Bidi Algorithm, apply steps .. to .. after any substitutions have been made, using the directional classes for the substituted characters, instead of a single BN for the SHY character. [example]

Note: no special action need be taken for a SHY character in the middle of a line, unless it is rendered as a visible glyph in a "show hidden character" mode. In the latter case, the recommendation would be to treat the visible symbol substituted for the SHY as having bidi class ON.
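The substitution described in this note can be sketched roughly as follows. This is a minimal illustration, not text from UAX#9: the function name `bidi_classes_for_line` and the choice of U+2010 HYPHEN as the default replacement are assumptions made for the example, and it only covers the common case of a SHY at the very end of a broken line.

```python
import unicodedata

SHY = "\u00AD"     # SOFT HYPHEN, bidi class BN
HYPHEN = "\u2010"  # HYPHEN, bidi class ON (assumed default replacement)

def bidi_classes_for_line(line, replacement=HYPHEN):
    """Return (displayed_text, bidi_classes) for one line after line breaking.

    Hypothetical helper: if the line was broken at a SHY, substitute the
    visible replacement first, then look up bidi classes on the result.
    """
    if line.endswith(SHY):
        line = line[:-1] + replacement
    classes = [unicodedata.bidirectional(ch) for ch in line]
    return line, classes

# A SHY at the break is replaced and classified as the substituted character.
broken_text, broken_classes = bidi_classes_for_line("ab" + SHY)
# A SHY in the middle of a line is left alone and stays BN.
mid_text, mid_classes = bidi_classes_for_line("a" + SHY + "b")
```

Note that the class seen by the Bidi Algorithm depends on which character is substituted: passing `replacement="-"` (HYPHEN-MINUS) would yield ES rather than ON.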
Date/Time: Mon Apr 21 20:36:33 CDT 2014
Name: James Clark
Report Type: Error Report
Opt Subject: Unclear wording in UAX#9, rule W6
Rule W6 in UAX#9 (http://www.unicode.org/reports/tr9/tr9-29.html#W6) says: "Otherwise, separators and terminators change to Other Neutral". It wasn't immediately clear to me whether "separators" here was intended to include type S (Segment Separators). I suspect it is not, because the title of the section is "Resolving Weak Types" and type S is neutral rather than weak. I suggest this should be made explicit.
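Under the reading the commenter suspects, where "separators and terminators" means only the weak types ES, ET, and CS, W6 could be sketched as below. The function name `apply_w6` and the list-of-strings representation are assumptions for illustration; the sketch presumes the earlier weak rules have already consumed the separators and terminators that sit next to numbers.

```python
# Weak types covered by W6 under this reading: European separator,
# European terminator, common separator.
W6_TYPES = {"ES", "ET", "CS"}

def apply_w6(types):
    """Change any remaining ES, ET, CS to ON; leave everything else alone."""
    return ["ON" if t in W6_TYPES else t for t in types]

# Segment separators (S) and paragraph separators (B) are untouched.
result = apply_w6(["L", "CS", "S", "ET", "B", "EN"])
```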
Date/Time: Tue Apr 22 13:52:26 CDT 2014
Name: Asmus Freytag
Report Type: Error Report
Opt Subject: Obfuscating language in BD16 and N0 of UAX#9
UAX#9 uses an unnecessarily involved algorithmic description for a paired bracket. This makes that part of the bidi algorithm difficult to understand; in particular, authors who are not programmers will not be able to predict which brackets will be handled correctly. This also affects text assembled programmatically. Accordingly, the unclear language should be replaced as follows:

---------

BD16a A bracket pair is a pair of an opening paired bracket and a closing paired bracket characters such that the Bidi_Paired_Bracket property value of the former character or its canonical equivalent equals the latter character or its canonical equivalent.

BD16b A resolved bracket pair is a bracket pair that has been selected from among possible bracket pairs in an isolating run sequence. Note: for the PBA this selection is performed according to Rx (below).

Rx For each isolating run sequence, bracket characters are selected into resolved bracket pairs as follows: Starting at the beginning of the run sequence, when a closing bracket character is encountered, find the nearest preceding opening character that forms a bracket pair, but is not already part of a resolved bracket pair, and not ignored for bracket pair selection. If one exists, resolve the pair, and mark any enclosed opening brackets of any kind as not part of a bracket pair and ignored for further bracket pair selection. Otherwise, if no pair can be selected, mark the closing bracket as not part of a pair and ignored for further pair selection.

Note: the outcome of Rx is a list of resolved pairs and their locations. Selected pairs can nest, but can't otherwise overlap. The rule prefers the closest pair for matching as opposed to attempting to select for the most hierarchical set of nested pairs. (See examples).

------------

What I have called Rx here would become N0a, with the part of N0 that is the second bullet numbered N0b.
I would move the existing examples from BD16 into the rules section, not leave them in the definitions as they are today. Rule N0 (second bullet) would change from "For each bracket-pair element in the list of pairs of text positions..." to "For each resolved bracket pair...".
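The selection procedure proposed as Rx can be sketched in a few lines. This is an illustrative reading of the proposal, not the normative text: the small `PAIRS` table is a hypothetical stand-in for the Bidi_Paired_Bracket property, canonical equivalence is ignored, and the input is assumed to be the text of a single isolating run sequence.

```python
# Hypothetical stand-in for the Bidi_Paired_Bracket property data.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def resolve_bracket_pairs(text):
    """Return resolved bracket pairs as sorted (open_index, close_index) tuples."""
    candidates = []  # openers that are neither resolved nor ignored
    resolved = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            candidates.append(i)
        elif ch in CLOSERS:
            # Search backwards for the nearest opener that pairs with ch.
            for k in range(len(candidates) - 1, -1, -1):
                j = candidates[k]
                if PAIRS[text[j]] == ch:
                    resolved.append((j, i))
                    # Drop the matched opener and every opener it encloses;
                    # the enclosed ones are ignored from now on.
                    del candidates[k:]
                    break
            # If no opener matched, the closer is simply left unpaired.
    return sorted(resolved)

overlapping = resolve_bracket_pairs("a(b[c)d]")
nested = resolve_bracket_pairs("([a])")
extra_opener = resolve_bracket_pairs("((a)")
```

In the overlapping case only the parentheses are resolved: the enclosed `[` is marked as ignored when `)` is matched, so the later `]` finds no partner. This reflects the preference for the closest pair described in the note on Rx.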
Date/Time: Fri Apr 25 09:56:00 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject:
This is an editing suggestion for UAX#9; it does not intend to change anything in the behavior or results of applying the UBA to bidirectional text. I agree with Asmus Freytag that the definitions in BD16 are too complicated and involve an algorithm as part of the definition. I suggest the following alternative text for BD16:

A bracket pair is a pair of an opening paired bracket and a closing paired bracket characters within the same isolating run sequence, such that the Bidi_Paired_Bracket property value of the former character or its canonical equivalent equals the latter character or its canonical equivalent, and provided that a closing bracket is matched to the closest match candidate, disregarding any candidates that either already have a closer match, or are enclosed in a matched pair of two other bracket characters.
Date/Time: Fri Apr 25 10:01:07 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Unclear wording in paragraph 3.1.2 of UAX#9
These comments are of a purely editorial nature, and do not intend to change anything in the behavior or results of the UBA. The UBA has this sentence in paragraph 3.1.2, near its very end:

As rule X10 will specify, an isolating run sequence is the unit to which the rules following it are applied, and the last character of one level run in the sequence is considered to be immediately followed by the first character of the next level run in the sequence during this phase of the algorithm.

The "rules following it" part is a bad referent. I suggest replacing it with the following unambiguous reference:

As rule X10 will specify, an isolating run sequence is the unit to which the rules following X10 are applied, ...
Date/Time: Fri Apr 25 10:06:44 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Usage of "current embedding level" in paragraph 3.3.2 of UAX#9
This change suggestion is of a purely editorial nature, and doesn't intend to change any behavior. In rule X6 of the UBA, we have this language:

X6. For all types besides B, BN, RLE, LRE, RLO, LRO, PDF, RLI, LRI, FSI, and PDI:
• Set the current character’s embedding level to the embedding level of the last entry on the directional status stack.
• Whenever the directional override status of the last entry on the directional status stack is not neutral, reset the current character type according to the directional override status of the last entry on the directional status stack.

In other words, if the directional override status of the last entry on the directional status stack is neutral, then characters retain their normal types: Arabic characters stay AL, Latin characters stay L, spaces stay WS, and so on. If the directional override status is right-to-left, then characters become R. If the directional override status is left-to-right, then characters become L. Note that the current embedding level is not changed by this rule.

Note the last sentence. Its reference to the "current embedding level" is unclear and confusing, more so because the previous text mentions "the current character's embedding level", which _is_ changed by this rule. I believe the intent was to say that the embedding level of the last entry on the directional status stack is not changed by X6. If so, I suggest saying that explicitly.
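The reading proposed in this comment can be made concrete with a small sketch. The `StatusEntry` class, the string encoding of the override status, and the function name `apply_x6` are all assumptions for illustration; the point is only that X6 reads the last stack entry and assigns a level and type to the character, without modifying the stack.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StatusEntry:
    """Hypothetical model of one directional status stack entry."""
    level: int
    override: str  # "neutral", "L", or "R"

# Types that X6 does not apply to.
EXCLUDED = {"B", "BN", "RLE", "LRE", "RLO", "LRO", "PDF",
            "RLI", "LRI", "FSI", "PDI"}

def apply_x6(char_type, stack):
    """Return (embedding_level, resolved_type) for one character."""
    if char_type in EXCLUDED:
        raise ValueError("X6 does not apply to type " + char_type)
    top = stack[-1]  # read only; the entry itself is never changed
    resolved = char_type if top.override == "neutral" else top.override
    return top.level, resolved

stack = [StatusEntry(0, "neutral"), StatusEntry(1, "R")]
before = list(stack)
overridden = apply_x6("L", stack)
plain = apply_x6("AL", [StatusEntry(0, "neutral")])
```

After the call, `stack` is identical to `before`, which is what the commenter believes the final sentence of X6 was meant to convey.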
Date/Time: Fri Apr 25 10:10:35 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Unclear language in rule X10 of UAX#9
This comment is of an editorial nature and doesn't request any changes in behavior. Rule X10 of UAX#9 includes this bullet:

Apply rules W1–W7, N0–N2, and I1–I2, in the order in which they appear below, to each of the isolating run sequences, applying one rule to all the characters in the sequence in the order in which they occur in the sequence before applying another rule to any part of the sequence. The order that one isolating run sequence is treated relative to another does not matter.

This says nothing at all about the order of applying the rules W1-W7, N0-N2, and I1-I2 between the different isolates. I suggest the following clearer rewording:

Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run sequences. For each sequence, completely apply each rule in the order in which they appear below. The order that one isolating run sequence is treated relative to another does not matter.
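The suggested rewording corresponds to a simple nested loop: the outer loop walks the isolating run sequences, the inner loop walks the rules, and each rule sees the whole sequence before the next rule starts. The sketch below is only a shape for that loop. The two rule functions are toy stand-ins (the second one, `neutral_to_l`, is not an actual UBA rule), and all names are assumptions for illustration.

```python
def w6_like(types):
    """Toy stand-in resembling W6: separators and terminators become ON."""
    return ["ON" if t in {"ES", "ET", "CS"} else t for t in types]

def neutral_to_l(types):
    """Toy stand-in for a later rule; NOT a real UBA rule."""
    return ["L" if t == "ON" else t for t in types]

def apply_rules(sequences, rules):
    """Apply each rule completely to a sequence before moving to the next rule.

    Returns a trace of (sequence_index, rule_name) in application order.
    """
    trace = []
    for index, sequence in enumerate(sequences):
        for name, rule in rules:
            sequence[:] = rule(sequence)  # whole sequence, one rule at a time
            trace.append((index, name))
    return trace

sequences = [["L", "CS", "L"], ["R", "ET"]]
rules = [("W6-like", w6_like), ("stand-in", neutral_to_l)]
trace = apply_rules(sequences, rules)
```

Because each sequence is processed independently, swapping the order of the outer loop would change the trace but not the resulting types, which matches the statement that the relative order of sequences does not matter.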
Date/Time: Fri Apr 25 10:18:44 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Ambiguous language in rule N0 of UAX#9
This comment is of a purely editorial nature; it does not require any changes in behavior. Rule N0 of the UBA says, among other things:

For each bracket-pair element in the list of pairs of text positions
a. Inspect the bidirectional types of the characters enclosed within the bracket pair.
b. If any strong type (either L or R) matching the embedding direction is found, set the type for both brackets in the pair to match the embedding direction.

But there's no explanation of what is meant by matching a strong type, L or R, to the embedding direction. I think this requires some specific definition to become clear. Likewise, table 3 in paragraph 3.1.4 talks about "text ordering [...] that matches the embedding level direction (even or odd)", but never explains what such a match means. I suggest stating in both these places that L matches an even embedding level, whereas R matches an odd embedding level.
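The matching relation the commenter asks to have spelled out is small enough to write down directly. The function names below are assumptions for illustration. The sketch also folds in the statement from N0 itself that EN and AN are treated as R within the scope of that rule.

```python
def embedding_direction(level):
    """Even embedding levels are left-to-right (L), odd ones right-to-left (R)."""
    return "L" if level % 2 == 0 else "R"

def encloses_matching_strong(enclosed_types, level):
    """Step b of N0: is there a strong type matching the embedding direction?

    Within N0, EN and AN count as R.
    """
    direction = embedding_direction(level)
    normalized = ["R" if t in {"EN", "AN"} else t for t in enclosed_types]
    return direction in normalized

match_odd = encloses_matching_strong(["ON", "R", "WS"], 1)
match_even = encloses_matching_strong(["ON", "R"], 0)
```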
Date/Time: Fri Apr 25 10:22:57 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Obfuscated definition of "isolating run sequence" in UAX#9
This change suggestion is of a purely editorial nature. UAX#9 defines an "isolating run sequence" in BD13 in a way that is unnecessarily complex and hard to understand. In a nutshell, it says little beyond giving an algorithm to compute the set of all isolating run sequences for a paragraph. I suggest the following formal definition of an isolating run sequence to be included in BD13:

An isolating run sequence is the maximal sequence of level runs of the same embedding level that can be obtained by removing all the characters between an isolate initiator and its matching PDI (or paragraph end, if there is no matching PDI) within those level runs.
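For comparison with the proposed definition, the procedural style that BD13 currently uses can be sketched as follows. This is a simplified illustration under stated assumptions: the inputs are parallel lists of bidi types and already-computed embedding levels for one paragraph, characters removed by X9 are not modeled, and all function names are hypothetical.

```python
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}

def match_isolates(types):
    """Map each isolate initiator index to its matching PDI index, if any."""
    stack, matches = [], {}
    for i, t in enumerate(types):
        if t in ISOLATE_INITIATORS:
            stack.append(i)
        elif t == "PDI" and stack:
            matches[stack.pop()] = i
    return matches

def level_runs(levels):
    """Split into maximal runs of equal level, as (start, end) half-open ranges."""
    runs, start = [], 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i
    return runs

def isolating_run_sequences(types, levels):
    """Chain level runs across isolate initiator / matching PDI boundaries."""
    matches = match_isolates(types)
    matched_pdis = set(matches.values())
    runs = level_runs(levels)
    run_of = {i: run for run in runs for i in range(*run)}
    sequences = []
    for run in runs:
        if run[0] in matched_pdis:
            continue  # this run continues a sequence started earlier
        sequence = [run]
        # While the last run ends with an initiator that has a matching PDI,
        # append the run containing that PDI.
        while sequence[-1][1] - 1 in matches:
            sequence.append(run_of[matches[sequence[-1][1] - 1]])
        sequences.append(sequence)
    return sequences

# "a LRI b PDI c": the isolate's content sits at a higher level.
types = ["L", "LRI", "R", "PDI", "L"]
levels = [0, 0, 2, 0, 0]
sequences = isolating_run_sequences(types, levels)
```

In the example, the outer text forms one sequence of two level runs joined across the isolate, and the isolated content forms its own sequence. That is the same result the proposed definition describes as removing the characters between the initiator and its matching PDI.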