I note that there's a confusion in the introduction of UAX#9:
"On web pages, the explicit directional formatting characters (of all types
– embedding, override, and isolate) should be replaced by using the dir
attribute and the elements BDI and BDO."
The suggested replacements do not match the order of the listed types.
- embedding (with LRE/PDF or RLE/PDF) just uses the dir="ltr/rtl" attribute
on any element (except BDI and BDO)
- override (with LRO/PDF or RLO/PDF) uses BDO with
the dir="ltr/rtl" attribute
- explicit isolate (with LRI/PDI or RLI/PDI) uses BDI with
the dir="ltr/rtl" attribute
- "automatic" isolate (with FSI/PDI) uses BDI without any dir attribute
Two implicit directional characters (LRM or RLM) are also convertible to
overrides as an empty BDO element with dir="ltr/rtl". Only ALM has no
equivalent.
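
As a rough sketch of the mapping listed above, the following Python function rewrites the explicit formatting characters of a plain-text string into the corresponding HTML markup. It is only an illustration, not an official conversion: the name controls_to_html is hypothetical, SPAN is an arbitrary choice for "any element" carrying the dir attribute, and LRM, RLM and ALM are deliberately left untouched.

```python
import html

# Explicit directional formatting characters (code points from UAX #9).
LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"
LRO, RLO = "\u202D", "\u202E"
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

# Opening character -> (element name, dir value or None), following the list above.
# Using SPAN for embeddings is an assumption; any element except BDI/BDO would do.
OPENERS = {
    LRE: ("span", "ltr"), RLE: ("span", "rtl"),
    LRO: ("bdo", "ltr"), RLO: ("bdo", "rtl"),
    LRI: ("bdi", "ltr"), RLI: ("bdi", "rtl"),
    FSI: ("bdi", None),
}


def controls_to_html(text):
    """Replace explicit bidi formatting characters with HTML elements."""
    out = []
    stack = []  # names of the elements currently open
    for ch in text:
        if ch in OPENERS:
            tag, direction = OPENERS[ch]
            attr = f' dir="{direction}"' if direction else ""
            out.append(f"<{tag}{attr}>")
            stack.append(tag)
        elif ch == PDF:
            # PDF ends the most recent embedding or override, never an isolate.
            if stack and stack[-1] != "bdi":
                out.append(f"</{stack.pop()}>")
        elif ch == PDI:
            # PDI ends the most recent isolate and anything opened inside it.
            if "bdi" in stack:
                while True:
                    tag = stack.pop()
                    out.append(f"</{tag}>")
                    if tag == "bdi":
                        break
        else:
            out.append(html.escape(ch))
    # Close whatever the text left open, so the markup stays well-formed.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

Unmatched PDF or PDI characters are simply dropped, and anything left open at the end of the string is closed there.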
----
But in most cases, HTML documents should simply not use embeddings or overrides at all. Isolates with BDI are much preferred, and are in fact simpler to manage than what section 6.4 suggests. That suggestion (placing RLM or LRM before the separating punctuation) does not work reliably, because it implies that you can predict the implicit reading direction of the whole list, whose ordering normally depends on the context or on the document containing the list. It is much simpler to isolate each list element and then assemble the list with unmarked punctuation.

An example of this is found on international wikis that must display an inter-language bar to navigate to other translated versions of the same page. The same template is used on all pages, and the list of languages cannot be predicted and may evolve over time: it contains LTR or RTL language names at unpredictable positions anywhere in the list, formatted with the same separator within a single inline span, in a paragraph that starts with a translatable introductory heading. You cannot predict which language name will occur after a given separator.

Using BDI (without even needing any dir="ltr/rtl" attribute) or FSI/PDI to isolate each language name works much better than unconditionally placing a static RLM or LRM before the separating punctuation. Note also that there is no such punctuation at the start of the list, so the ordering of the first element is not set correctly unless an RLM or LRM is also placed before that first element, which may then render incorrectly.

The best and most flexible solution is to use "automatic" isolates for each list item (with FSI/PDI in plain-text documents, or BDI elements without any dir attribute in HTML documents). The same is also true when inserting quotations (including when giving the title of another document, or the name of an author) or when formatting translatable text containing "placeholder variables" whose content will be generated separately.
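
To make the list case concrete, here is a minimal Python sketch that isolates each item and joins the items with an unmarked separator. The function names and the default separator are hypothetical, chosen only for illustration. The first variant uses FSI/PDI for plain text, the second uses BDI elements without any dir attribute for HTML.

```python
import html

FSI, PDI = "\u2068", "\u2069"  # FIRST STRONG ISOLATE, POP DIRECTIONAL ISOLATE


def join_isolated_plain(items, separator=", "):
    """Plain text: wrap every item in FSI/PDI, leave the separator unmarked."""
    return separator.join(f"{FSI}{item}{PDI}" for item in items)


def join_isolated_html(items, separator=", "):
    """HTML: wrap every item in a BDI element that has no dir attribute."""
    return html.escape(separator).join(
        f"<bdi>{html.escape(item)}</bdi>" for item in items
    )
```

Neither function needs to know the direction of any item or of the surrounding paragraph, so the same template can be reused unchanged on every page.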
BDI elements without any dir attribute can efficiently replace SPAN elements, and can still have their own optional formatting styles (colors, font families, font size, line height, font styles and weight, visual effects...), or title attributes (to give hints to readers about what the isolated value is used for), or an identifier (useful to generate stable anchors that work across all translations of the document). There are also CSS styles using the unicode-bidi property, but they should be completely avoided in HTML (these styles are better inferred from BDI elements).

2016-09-19 2:16 GMT+02:00 Ken Whistler <kenwhistler_at_att.net>:

>
> On 9/17/2016 10:26 AM, Deepak Jois wrote:
>
>> I now need to make the updates to support the changes in Unicode 8.0,
>> and I am finding it a bit hard to grok the changes in C at a glance.
>>
>>
> The UBA 7.0 --> UBA 8.0 changes were rather subtle. They did not change
> much about the gross behavior of the algorithm, but there were some fixes
> for edge cases in a couple rules. Also, the specification of behavior on
> stack overflow became exact, rather than implementation-defined.
>
> The C bidi reference code is a bit complicated, because it supports *all*
> UBA versions from 6.2 through 8.0, which means it has to special case rule
> processing by versions when the specification itself changes.
>
> If you diff the 7.0 version of brrule.c and the 8.0 version of brrule.c
> you'll find the heart of the differences there, along with explanations in
> comments for the changes. The new function br_SetBracketPairBC handles an
> edge case for combining marks following a bracket. The code using a new
> flag testONisNotRequired deals with an edge case for the current Bidi_Class
> of brackets being tested for pairing. Changes in br_PushBracketStack are
> involved in the need to keep the pre-8.0 behavior as it was for earlier
> versions of bidiref, but allowing for explicit behavior for stack overflow
> for 8.0.
>
> It may also help to compare the 7.0 and 8.0 versions of UAX #9 itself, so
> you can see the textual changes in the specification of the rules. Try
> diffing:
>
> http://www.unicode.org/reports/tr9/tr9-31.html (7.0)
> http://www.unicode.org/reports/tr9/tr9-33.html (8.0)
>
> The significant changes there are in BD11, BD14, BD15, BD16, and in rules
> X5a, X5b, X6a, and N0. (The rest of the changes in the updated document are
> cosmetic.)
>
> --Ken
>

Received on Mon Sep 19 2016 - 01:29:59 CDT