Eric Muller asked:
> Is it correct that the sequences U+x U+0360 U+y and U+x U+034F U+y
> U+0303 should display the same? Would it be worth putting some words
> about those situations in section 13.2 of PDUTR #28?
I think that that should be the case, given the current definitions.
In particular, if U+x = U+006E "n" and U+y = U+0067 "g", you would
get the following three possibilities for writing the Tagalog ng-tilde:
1. <U+006E, U+0360, U+0067>
2. <U+006E, U+FE22, U+0067, U+FE23>
3. <U+006E, U+034F, U+0067, U+0303>
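For reference, here is a small Python sketch that spells out the three sequences and lists each code point with its character name, using only the standard `unicodedata` module. The `SEQUENCES` table and the `describe` helper are illustrative names, not part of any proposal.

```python
import unicodedata

# The three candidate encodings of Tagalog ng-tilde listed above.
SEQUENCES = {
    1: "n\u0360g",        # double-diacritic tilde
    2: "n\uFE22g\uFE23",  # compatibility double-tilde halves
    3: "n\u034Fg\u0303",  # grapheme joiner + ordinary combining tilde
}

def describe(seq):
    """Return 'U+XXXX NAME' for every code point in seq."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in seq]

for number, seq in SEQUENCES.items():
    print(number)
    for line in describe(seq):
        print("   ", line)
```

Running it makes the difference visible at a glance: sequence 1 has three code points, while 2 and 3 have four each.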
1. uses the double-diacritic tilde, which nominally applies merely to
the U+006E, but would be designed to lie over the top of a following
base character on display.
2. uses the compatibility combining double-tilde halves. These occur
in legacy bibliographic data records. In principle, 2 should display
in the same way as 1, but would be recommended only for interoperating
with the legacy data.
3. uses the grapheme joiner to create a "grapheme cluster", which in
this case would be the digraph "ng". A rendering engine savvy to
grapheme cluster status should then attempt to apply a following
combining mark, in this case a regular combining tilde, to the entire
grapheme cluster, rather than simply to the preceding base character.
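To make the attachment behaviour concrete, here is a toy model in Python. `attach_marks` is a hypothetical helper, not a real shaping or rendering API: it only models the grapheme-joiner behaviour described above (a base character after U+034F joins the preceding unit, and later combining marks apply to the whole unit). It treats any character of general category M* as a combining mark and ignores everything else a real rendering engine or segmentation algorithm would do.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

def attach_marks(text):
    """Hypothetical toy: split text into (base cluster, attached marks) units.

    Assumption: a base character following CGJ joins the previous unit,
    and every combining mark attaches to whatever unit precedes it.
    """
    units = []          # list of [base_string, list_of_marks]
    join_next = False
    for ch in text:
        if ch == CGJ:
            # CGJ is itself category Mn, so test for it before the mark check.
            join_next = True
            continue
        if unicodedata.category(ch).startswith("M"):
            if units:
                units[-1][1].append(ch)
            else:
                units.append(["", [ch]])  # mark with no preceding base
            continue
        if join_next and units:
            units[-1][0] += ch  # extend the cluster, e.g. "n" -> "ng"
        else:
            units.append([ch, []])
        join_next = False
    return [(base, "".join(marks)) for base, marks in units]

print(attach_marks("n\u0360g"))        # sequence 1
print(attach_marks("n\uFE22g\uFE23"))  # sequence 2
print(attach_marks("n\u034Fg\u0303"))  # sequence 3
```

Under this model, sequence 3 yields a single unit whose base is the digraph "ng" carrying the tilde, while sequences 1 and 2 keep "n" and "g" as separate units, each with its own marks.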
While these are three alternative ways of representing the "same thing",
we aren't talking about canonical equivalences here. 3 creates a
grapheme cluster (which could have implications for other processing),
while 1 and 2 do not. For example, if I added U+0301 (combining acute)
after each of the above sequences, 1 would put the acute on the "g"
(and might result in overlap with the right half of the double tilde);
2 would put the acute over the right-half tilde on the "g"; 3 should
put the acute midships over the stretched tilde applying to the digraph.
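The claim that these are not canonical equivalents can be checked mechanically: two strings are canonically equivalent exactly when their NFD forms are identical. A short Python check using the standard `unicodedata.normalize` function (the `canonically_equivalent` helper is just an illustrative wrapper) shows that no pair of the three sequences matches, and that normalization leaves each one unchanged.

```python
import unicodedata

SEQUENCES = [
    "n\u0360g",        # 1: double-diacritic tilde
    "n\uFE22g\uFE23",  # 2: compatibility double-tilde halves
    "n\u034Fg\u0303",  # 3: grapheme joiner + combining tilde
]

def canonically_equivalent(a, b):
    # Canonical equivalence holds iff the NFD (canonical decomposition) forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

for i in range(len(SEQUENCES)):
    for j in range(i + 1, len(SEQUENCES)):
        print(i + 1, j + 1, canonically_equivalent(SEQUENCES[i], SEQUENCES[j]))
```

Every pair prints `False`: any visual sameness has to come from the rendering engine, not from normalization.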
2 is intended for interoperating with legacy
bibliographic data, while 1 and 3 are not. And there are quite likely
to be other small formatting differences between the three options. In the
real world it is unlikely that you will run into a "perfect" rendering
engine that would produce exactly the same image from each of the
sequences.
The combining grapheme joiner is the best answer that Unicode currently
has for the extensibility problem for unusual accent placements over
(or under) groups of letters, where the existing compatibility answers
(U+0360..U+0362 for double diacritics; U+FE20..U+FE23 for diacritic halves)
aren't sufficient. For example, it makes it possible to represent a
double breve or a double macron over a group of letters (as seen in some
American dictionary orthographies) or a double (or triple) underline
beneath one (as seen in some transliterations).
--Ken
This archive was generated by hypermail 2.1.2 : Thu Jan 03 2002 - 18:30:14 EST