2013-10-20 2:38, Richard Wordingham wrote:
> Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
> text?
Well, is <h1>hello</h1> plain text? The answer is that any string of 
characters may be considered as plain text and any string of characters 
may be treated as rich text according to some conventions.
> If so, how many dotted circles should appear?
Possibly none. An implementation need not support any particular 
collection of characters. But an implementation that supports U+25CC 
must treat it as a spacing character, and an implementation that 
supports e.g. U+0300 must treat it as a combining mark. So if the 
implementation is capable of visually rendering them, it shall render 
U+25CC U+0300 as a dotted circle with a grave accent above it. In this 
case, exactly one dotted circle should appear.
Implementations often have bugs in dealing with combining marks. This 
may depend on the rendering software, or on the font.
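The character properties behind this distinction can be looked up with Python's standard `unicodedata` module. The following is only a minimal sketch; the helper name `describe` is made up for illustration, and the values reported depend on the Unicode data bundled with the Python version in use.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, int]:
    """Return (name, general category, canonical combining class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# U+25CC is a spacing symbol; U+0300 and U+0E31 are nonspacing marks.
for ch in "\u25CC\u0300\u0E31":
    name, category, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, ccc={ccc}")
```

A general category of `So` marks U+25CC as an ordinary symbol that takes up space of its own, while `Mn` marks the other two as nonspacing marks that attach to the preceding base character.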
> If the sequence is not plain text, what mark-up notations are
> available to control the number of dotted circles produced?  I
> am particularly interested in notation for HTML, e.g. via a style
> sheet. Should the sequence instead be treated as a graphic?
I don’t understand these questions. If the sequence is treated as other 
than plain text, then the results depend on the specific “rich text” or 
other conventions applied.
> This question is prompted by a confused discussion of what the notation
> <U+25CC, U+0E31 THAI CHARACTER MAI HAN-AKAT, U+25CC> on a web page
> meant.
What it means is a different issue. U+25CC is a symbol that can be used 
in a variety of meanings. I don’t think it means anything specific to 
most people, unless a definition is given. U+0E31 is a Thai vowel sign, 
and I don’t think any meaning in general has been assigned to it when 
applied to something other than a Thai letter.
The rendering of the sequence is a different matter. Not surprisingly, 
tests on IE 10 show varying results. Using my test page
http://www.cs.tut.fi/~jkorpela/listfonts1.html
that renders, on IE, a given string in all the fonts available in the 
system, I noticed that on my system, only SunExt-A and Unifont result in 
correct rendering. Using Arial Unicode MS, the rendering is correct 
except for the circles being dashed, and I think this is incorrect for 
U+25CC, as it violates the identity of the character as a dotted circle. 
A few other fonts contain the characters too, but the renderings have 
three similar dotted rings, with the Thai diacritic above the middle one 
or (in FreeSerif and Quivira) between the 2nd and 3rd. – On Chrome, 
Safari, and Firefox, the results are similar, except that Chrome shows 
the string as broken even when Arial Unicode MS is declared.
> The confusion was caused because some of us saw two dashed
> circles and others saw three dashed circles (one for each character)
> when viewing the web page.
The implementations that show three dotted circles are non-conforming. 
Showing three dashed circles would be even more non-conforming.
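To see why two is the expected number, the sequence can be split into units of one base character plus the marks that follow it. The sketch below is a deliberate simplification: it treats every character with a general category starting with `M` as a mark and ignores the full segmentation rules of UAX #29. The function name `simple_clusters` is hypothetical.

```python
import unicodedata

def simple_clusters(text: str) -> list[str]:
    """Group each non-mark character with the marks that follow it.

    Simplified assumption: general categories Mn, Mc and Me count as marks;
    the complete grapheme cluster rules are not applied.
    """
    clusters: list[str] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and clusters:
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new unit with this character
    return clusters

sample = "\u25CC\u0E31\u25CC"
clusters = simple_clusters(sample)
print(len(clusters), "display units,", sample.count("\u25CC"), "dotted circles in the data")
```

For the sequence in question this yields two units, U+25CC with U+0E31 attached and a lone U+25CC, so a renderer that draws a third dotted circle is adding something that is not in the character data.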
If the purpose is to display the combining diacritic the same way as in 
the code charts in the standard, i.e. with a dotted circle generically 
marking the place of a base character, then I’m afraid the approach does 
not work in general. It should work, in the sense that 
conforming implementations would render it the desired way if they 
support the characters in rendering, but web browsers just don’t conform.
What you could do in a web page is to put U+00A0 U+25CC in one element 
and U+0E31 in another and position the elements in the same place, set 
to have the same width and to be horizontally centered. But I’m afraid 
this would be off-topic here and could involve some nasty details.
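For what it is worth, the positioning idea might be sketched roughly as follows, here as a small Python function that builds the markup. Everything in it is an assumption rather than a tested recipe: the function name `overlay_markup`, the choice of `span` elements, and all CSS values are illustrative only, and the result has not been checked across browsers or fonts.

```python
def overlay_markup(base: str = "\u00A0\u25CC", mark: str = "\u0E31",
                   width_em: float = 1.0) -> str:
    """Build HTML that stacks two equally wide, centered boxes on top of each other.

    Assumed layout: a relatively positioned wrapper containing two absolutely
    positioned spans of the same width, one for the base and one for the mark.
    """
    box = f"position:absolute;left:0;top:0;width:{width_em}em;text-align:center"
    return (
        f'<span style="position:relative;display:inline-block;'
        f'width:{width_em}em;height:1.2em">'
        f'<span style="{box}">{base}</span>'
        f'<span style="{box}">{mark}</span>'
        "</span>"
    )

html = overlay_markup()
print(html)
```

How a browser renders a combining mark that stands alone in its own element is one of the nasty details, so the output would need checking in each target browser.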
Yucca
Received on Sun Oct 20 2013 - 03:50:57 CDT