Armenian Mijaket (Armenian colon)
Philippe Verdy via Unicode
unicode at unicode.org
Tue Dec 5 15:08:39 CST 2017
In fact I would also remove the misleading (non-normative) note in
NamesList.txt suggesting the use of the ONE DOT LEADER. It is just one of the
possible fallbacks, and it has the wrong properties for encoding plain text.
It is only useful as a rendering fallback, and not even really for that,
because almost no font maps this character: leader dots are preferably
rendered another way, by drawing a dotted line. Only some text renderers may
use the leader dot, when they need to transform a leader space into a
dotted line and need a glyph for that; but note that they'll also
need to control the spacing and margins, and will probably always put it on
the baseline like regular full stops.
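As a rough illustration of the "wrong properties" point, here is a minimal
Python sketch using only the standard unicodedata module. It relies on U+2024
carrying a compatibility decomposition to U+002E in the Unicode Character
Database, so NFKC normalization folds it into an ordinary full stop. The
sample string and the helper name nfkc_fold are made up for this example.

```python
import unicodedata

ONE_DOT_LEADER = "\u2024"
MIDDLE_DOT = "\u00b7"
ARMENIAN_FULL_STOP = "\u0589"


def nfkc_fold(text: str) -> str:
    """Return the NFKC-normalized form of text (hypothetical helper name)."""
    return unicodedata.normalize("NFKC", text)


# Illustrative sample only: two placeholder Armenian letters, with U+2024
# standing in for the mijaket and U+0589 ending the sentence.
sample = "\u0561" + ONE_DOT_LEADER + " \u0562" + ARMENIAN_FULL_STOP

# After NFKC, U+2024 has become U+002E, while U+0589 is left untouched.
print([f"U+{ord(c):04X}" for c in nfkc_fold(sample)])

# The middle dot has no decomposition and survives NFKC unchanged.
print(nfkc_fold(MIDDLE_DOT) == MIDDLE_DOT)
```

Any pipeline that applies NFKC (search indexing, identifier matching and so
on) would therefore lose the distinction between a U+2024 fallback and the
ASCII full stop.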
A better fallback is the middle dot (but with additional thin space around
it). Still, for the semantics, we should not have to use such
rendering fallbacks for composing plain text. Imagine what we want to
enter in a database of texts, or in translation engines that don't know and
should not have to worry about fonts, font styles or metrics, when what we
need is a clear semantic distinction for the mijaket (a colon or semicolon
articulating two phrases in the same sentence, or ending an introductory
sentence followed by one value or a list of Armenian words, itself
terminated by an Armenian full stop, U+0589).
You'll note that on Wikipedia, the ArmSCII table at the top of the page was
composed and rendered (in LaTeX) with the middle dot, which is clearly
distinguished from the ASCII full stop and the Armenian full stop. You will
find no mention there of the ONE DOT LEADER.
This is especially important because today Armenian may be written using
either "modern" (ASCII) punctuation (as in English, with colons,
semicolons, and full stops) or traditional punctuation. And it cannot be
predicted in which context the translated texts will be used (modern/ASCII
or traditional), so we have an ambiguity about how to translate and
represent colons/semicolons and full stops.
The Armenian full stop is clearly encoded. The Armenian [semi]colon is not,
and we only have fallbacks. So we need the mijaket to be encoded; U+0588
(unallocated, and just before the distinctive Armenian full stop at U+0589)
is the best place.
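To make the comparison concrete, a small sketch (again assuming only
Python's standard unicodedata module) can list the name and general category
of the code points discussed here. The CANDIDATES list and the describe
helper are my own naming for the example, and what it reports for U+0588
depends on the Unicode version bundled with the Python build in use.

```python
import unicodedata

# Code points mentioned in this thread: ASCII full stop and colon, middle
# dot, one dot leader, the proposed slot U+0588, and the Armenian full stop.
CANDIDATES = [0x002E, 0x003A, 0x00B7, 0x2024, 0x0588, 0x0589]


def describe(cp: int) -> str:
    """Format a code point with its general category and character name."""
    ch = chr(cp)
    # unicodedata.name() takes a default for code points without a name in
    # the bundled version of the Unicode Character Database.
    name = unicodedata.name(ch, "<no name in this UCD version>")
    return f"U+{cp:04X} {unicodedata.category(ch)} {name}"


for cp in CANDIDATES:
    print(describe(cp))
```

All of the existing dot-like candidates share the same general category
(Po), so the category alone does not separate a leader dot from real
punctuation; the difference lies in their other properties and intended use.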
Even in the Unicode representative chart, you'll note that the characters
are slanted, including the punctuation, and the dots become ovals. Various
Armenian texts use square dots (apparently drawn as a small, nearly vertical
stroke with a pencil or pen).
This will leave renderers free to choose how to render the two Armenian
punctuation marks (either traditional or modern), and will preserve the
semantics of the text without conflicting with other rendering options (for
leaders in TOCs or tabular data, which may possibly use U+2024 with some
rare fonts specific to the rendering engine and its own typographical
engine, if it ever needs a font for the glyphs it needs; but even in that
case these internal fonts will not need to be Unicode encoded: they will just
be a collection of glyphs for the intended rendering effects and styles it
wants to support).
For now the immediate real need is for fully translating interfaces in
applications and allowing them to support either a "modern" style
(English/ASCII punctuations) or "traditional" style. No fallback characters
should be encoded in these texts so that no confusion will arise if ever
one uses both the real Armenian full stop (two dots) and a fallback for the
distinctive missing mijaket (single dot, to distinguish also from leaders
and decimal separators in numbers or abbreviation dots). The new encoded
mijaket may include a note suggesting the use of the MIDDLE DOT as a
2017-12-05 21:35 GMT+01:00 Asmus Freytag via Unicode <unicode at unicode.org>:
> On 12/5/2017 11:28 AM, Philippe Verdy via Unicode wrote:
> U+2024 is not supported in any fonts I have loaded. A websearch of mijaket
> gives nothing.
> U+2024 is used as a "leader dot", and does not match the expected metrics
> (it is certainly not a mijaket, it should be more like U+0589, i.e. as a
> bold parallelogram, and not a thin leader dot).
> Leader dots are NOT used as real punctuation, they are presentational, for
> example in TOC (table of contents), where they are aligned in arbitrarily
> long rows.
> The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not
> normative and in fact it is wrong in my opinion.
> The mijaket (Armenian colon) should be encoded (preferably at U+0588 in
> the Armenian block) as it also has to be distinguished from leader dots in
> Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
> Well, unless someone (you?) writes a proposal to that effect....
> (I don't know the history of this particular "unification" but on the face
> of it would share your concern that unifying something with a very specific
> functionality and metrics, leader dots, with ordinary script-specific
> punctuation is not helpful - unless it can be shown that this unification
> is widely supported in practice. However, if your claim that 2024 is
> unsupported is correct, that would strengthen the case for reconsidering
> this; however the case would have to be made in a formal proposal first).
> 2017-12-05 19:59 GMT+01:00 S. Gilles <sgilles at math.umd.edu>:
>> On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
>> > The Armenian script has its own distinctive punctuation (vertsaket) for
>> > standard full stop at end of sentence (whose glyph looks very much like
>> > Basic Latin/ASCII colon, however slightly more bold and slanted and whose
>> > dots are rectangular). It is encoded at U+0589. And used in traditional
>> > texts instead of the "modern" full stop.
>> > But Armenian also has its own distinctive punctuation (mijaket) for the
>> > introductory colon between two phrases of the same sentence (whose glyph
>> > looks very much like the Basic Latin/ASCII full stop). It is not encoded
>> > and I don't like using the ASCII full stop where it causes confusion.
>> > Where is the Armenian distinctive mijaket? Shouldn't it be encoded at
>> > U+0588?
>> Off-list because I generally don't know what I'm talking about, but
>> grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't
>> what you're looking for, my apologies.
>> S. Gilles