Re: Rendering Raised FULL STOP between Digits

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 11 Mar 2013 05:27:35 +0100

2013/3/10 Richard Wordingham <richard.wordingham_at_ntlworld.com>:
> On Sun, 10 Mar 2013 17:22:05 +0200
> "Jukka K. Korpela" <jkorpela_at_cs.tut.fi> wrote:
>
>> 2013-03-10 4:57, Asmus Freytag wrote:
>>
>> >> 'The Lancet' reportedly insists on the use of the raised decimal
>> >> point
>> […
>> > That's sensible advice, in a way, because B7 is in 8859-1 and
>> > therefore supported in a huge variety of fonts, for practical
>> > purposes, the coverage among non-decorative text fonts is pretty
>> > near universal.
>>
>> This probably implies that most people who wish or need to use a
>> raised dot will keep using B7. And that’s fine for most purposes.
>>
>> A new character would *allow* people to use raised dot, for which
>> fonts could contain suitable renderings that are independent of the
>> demands of B7 (especially due to its intended primary use as middle
>> dot in some languages). This would mostly be relevant to accurate
>> coding of old documents rather than to everyday needs of British
>> writers.
>
> The Greek punctuation mark U+00B7 (an upper dot) is also under some
> stress. If it aligns with the top of the preceding character, as
> apparently it should for Greek letters, that causes some strain for
> rendering "MS-DOS<ano teleia>" or "Windows 7<ano teleia>" in Greek
> text.
>
> The existence of unambiguous leading and trailing decimal points argues
> for the decimal point having a bidi class EN! Does anyone use ano
> teleia for right-to-left text? Perhaps one will just have to protect
> leading and trailing decimal points with directionality controls, in
> which case a bidi class of ES will suffice for decimal points flanked by
> digits.
>
> The line-breaking class of U+00B7 is currently AL (alphabetic); a
> decimal point needs NU (numeric), which is slightly more restrictive.
> Making NU the line breaking class of U+00B7 would not hurt.
>
> The value of Word_Break for the decimal point should be Numeric, like
> its forebear U+066B ARABIC DECIMAL SEPARATOR. U+00B7 has the Word_Break
> value MidLetter. A value of MidNumLet would work for U+00B7, and would
> handle decimal points between digits. This separates leading and
> trailing decimal points from the rest of the number, but is no worse
> than the current situation with FULL STOP. Leading U+00B7 could be
> dealt with by a special rule. For trailing decimal points, arguably a
> defective notation, the only completely robust solution is to rely on
> the lack of a word break being marked manually. I believe we ought to
> add general rules of the form
>
> Any × U+2060
> U+2060 × Any
>
>> According to “A history of mathematical notations” by Florian Cajori,
>> paragraph 286, the vertical position of the dot used as decimal
>> separator varied a lot in the 19th century. It varied from just a
>> little above the baseline up to the x-height and above, even to the
>> top of lining figures! I would expect that 20th century typography
>> had similar variation.
>
> I haven't seen any such variation; the late 20th century seems to have
> stabilised the form. Some of the variation may be due to changes in the
> placement of the digits.
>
>> Of course, Unicode cannot encode all the possible vertical positions
>> (and sizes) of a raised dot. Such things would be normal glyph
>> variation, for stylistic or other reasons. The point is that no such
>> variation is realistic for B7.
>
> If we unify U+00B7's three possible roles of (a) digraph breaker, (b)
> ano teleia and (c) decimal point, we could have the following scheme:
>
> (1) Before digit, use decimal point glyph;
> (2) Else before letter, use digraph breaker glyph;

Note that case 2 includes Catalan, where it is more than just a
digraph breaker (between two l/L): it plays a role similar to a
diacritic on the letter (l/L) before it. This complicates things
a bit when the preceding letter is a capital L, because the dot will
typically be kerned into it (except possibly in cursive decorated
fonts). Your algorithm may in fact be part of the substitution rules
implemented in fonts.

But as a digraph breaker in Catalan, it also plays a role in line
breaking (where the dot remains at the end of the line and is not
followed by a visible hyphen). In that case there is an extra
complication: line breaks may already be part of the encoded text, and
you need another case:

> (2b) Else before end of line, use digraph breaker glyphs.

Can this extra case work with Greek's use as ano teleia?
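
To make the question concrete, here is a rough Python sketch of rules
(1), (2) and (2b) as quoted above. It is only an illustration, not how
any font or shaping engine actually does it: the function name and the
glyph labels are hypothetical, "digit" and "letter" are approximated by
the general categories Nd and L*, and the end of the string is assumed
to count as an end of line. Any context not covered by the rules quoted
in this message is reported as "unspecified" rather than guessed.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def middle_dot_glyph(text, i):
    """Pick a (hypothetical) glyph label for the U+00B7 at text[i]."""
    if text[i] != MIDDLE_DOT:
        raise ValueError("text[i] is not U+00B7")

    # Character following the dot; "" stands for end of text.
    nxt = text[i + 1] if i + 1 < len(text) else ""

    # Rule (1): before a digit, use the decimal point glyph.
    if nxt and unicodedata.category(nxt) == "Nd":
        return "decimal_point"

    # Rule (2): else before a letter, use the digraph breaker glyph.
    if nxt and unicodedata.category(nxt).startswith("L"):
        return "digraph_breaker"

    # Rule (2b): else before end of line, use the digraph breaker glyph.
    # Assumption: end of text is treated like an end of line.
    if nxt in ("", "\n", "\r"):
        return "digraph_breaker"

    # Contexts not covered by the rules quoted in this message.
    return "unspecified"
```

With this sketch, the dot in "col·lega" and the dot in "col·" at the end
of a line both get the digraph breaker glyph, as intended for Catalan.
But the dot in a Greek line ending in "Windows 7·" gets the digraph
breaker glyph as well, which is exactly the conflict with ano teleia
asked about above.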
Received on Sun Mar 10 2013 - 23:33:20 CDT
