From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 18 2011 - 11:00:43 CDT
Reread the stability rules. There's no chance this will ever happen,
because these accented letters can already be encoded, even if this
requires sequences of two characters.
Fonts containing Cyrillic letters along with accents already exist
(and most of them also include Latin letters). Most of the precomposed
letters were included long ago, in the early stages of the ISO 10646
and Unicode encoding work, before the much-needed stability guarantees
were in place. They were there only to help conversion from the many
legacy encodings.
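One way to see which accented Cyrillic vowels ended up with precomposed code points is to apply NFC using Python's standard `unicodedata` module. The sample sequences below are an illustrative choice: NFC composes е + grave into U+0450 and и + grave into U+045D (ѐ, ѝ, presumably the two exceptions mentioned in the quoted message below), while а or о followed by U+0301 COMBINING ACUTE ACCENT stay as two code points because no precomposed form exists.

```python
import unicodedata

# Cyrillic vowels followed by a combining accent (U+0300 grave, U+0301 acute).
SAMPLES = {
    "ie + grave": "\u0435\u0300",  # NFC composes this to U+0450
    "i + grave": "\u0438\u0300",   # NFC composes this to U+045D
    "a + acute": "\u0430\u0301",   # no precomposed form exists
    "o + acute": "\u043e\u0301",   # no precomposed form exists
}

def describe(seq: str) -> str:
    """Return the code points of the NFC form of seq as 'U+XXXX' strings."""
    nfc = unicodedata.normalize("NFC", seq)
    return " ".join(f"U+{ord(ch):04X}" for ch in nfc)

if __name__ == "__main__":
    for label, seq in SAMPLES.items():
        print(f"{label:12} -> {describe(seq)}")
```

Either way the text is fully representable; NFC only changes how it is spelled in code points.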
Before arguing that these letters should be encoded, you would first
have to demonstrate a need for full round-trip compatibility with an
existing legacy standard that supports them. But since ISO 10646 and
Unicode were stabilized, no country has developed new non-UCS-based
encodings as a standard. All ISO members have agreed on that, and the
industry now only wants the UCS.
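To make the round-trip requirement concrete, here is a minimal check, assuming Windows-1251 (Python codec name `cp1251`) as a representative legacy Cyrillic encoding; the helper name `round_trips` is hypothetical. Plain Cyrillic text survives encoding and decoding unchanged, while the combining acute accent cannot be encoded at all, so this particular encoding offers nothing that a new precomposed letter would need to round-trip with.

```python
def round_trips(text: str, codec: str) -> bool:
    """Return True if text survives encoding to `codec` and decoding back unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        # At least one character has no mapping in the legacy encoding.
        return False

if __name__ == "__main__":
    plain = "\u0432\u043e\u0434\u0430"          # "вода"
    stressed = "\u0432\u043e\u0434\u0430\u0301" # "вода" + U+0301 on the last vowel
    print(round_trips(plain, "cp1251"))
    print(round_trips(stressed, "cp1251"))
```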
The standard normalization forms (NFC/NFD) were created to support
conversion between precomposed and decomposed forms that are
considered equivalent, and normalized strings MUST remain stable. What
you want would create a disunification and break canonical
equivalence, so you would have to justify the disunification by
proving that encoding a separate (and necessarily distinct)
precomposed letter makes a semantic or visual difference. (If the new
letter were instead given a canonical decomposition, the stability
policy would require excluding it from NFC composition, so normalized
text would never use it anyway.)
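The equivalence at stake can be checked directly: two strings are canonically equivalent exactly when their NFD forms are identical. In the sketch below, the existing ѐ (U+0450) is equivalent to е + U+0300. The second comparison is hypothetical: the Private Use Area code point U+E000 merely stands in for an imagined new, atomic "а with acute" that has no canonical decomposition, which is what a disunification would look like to software.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Canonically equivalent strings have identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Existing precomposed letter vs. its decomposed spelling.
print(canonically_equivalent("\u0450", "\u0435\u0300"))

# Hypothetical stand-in (U+E000, Private Use Area) for a new atomic letter
# with no decomposition: software would treat it as a different string.
HYPOTHETICAL_A_ACUTE = "\ue000"
print(canonically_equivalent(HYPOTHETICAL_A_ACUTE, "\u0430\u0301"))
```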
So many applications use and depend on the UCS that breaking a
stability rule would have a considerable impact. And there is no need
to do so: it is much simpler to use the standard as it is, even if for
now few fonts support combinations of Cyrillic letters with accents
(but it's perfectly possible to support them, and this has already
been done extensively, including for the Latin script).
It will be faster and easier to develop new Cyrillic fonts or extend
existing ones (or extend the rendering engines) than to break a
standard that already works well and reliably in editors. If what you
want is to be able to select whole letters in an editor, this is just
a matter of user preferences in your editor (or of choosing an editor
that does this by default), not a problem of the encoding. Such
editors already exist.
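Selecting or deleting whole letters comes down to grouping code points into user-perceived characters. Unicode defines extended grapheme clusters for this in UAX #29, and libraries such as ICU or the third-party Python `regex` module (its `\X` pattern) implement it. The sketch below is a simplified, hypothetical approximation using only the standard library: it attaches combining marks (general categories Mn, Mc, Me) to the preceding character, so deleting a vowel also deletes its accent instead of leaving the accent on the previous letter.

```python
from __future__ import annotations

import unicodedata

def clusters(text: str) -> list[str]:
    """Split text into base characters plus their following combining marks.

    Simplified: full UAX #29 rules (ZWJ sequences, Hangul, emoji, etc.) are ignored.
    """
    result: list[str] = []
    for ch in text:
        if result and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            result[-1] += ch  # keep the mark with its base letter
        else:
            result.append(ch)
    return result

def delete_cluster(text: str, index: int) -> str:
    """Delete the user-perceived character at `index` (counted in clusters)."""
    parts = clusters(text)
    del parts[index]
    return "".join(parts)

if __name__ == "__main__":
    word = "\u0432\u043e\u0434\u0430\u0301"  # "вода" with an acute on the last vowel
    print(len(word), len(clusters(word)))
    print(delete_cluster(word, 3))
```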
In any case, your request has been discussed many times in past
years. Don't view this as a limitation: it is in fact a feature of the
standard that has saved a lot of time. Much standard software
(including free and open-source software and general-purpose Unicode
utility libraries) already understands and handles these letters
perfectly when they are encoded as character sequences.
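As a small illustration with Python's standard library alone: case mapping passes the combining accent through untouched, and a hypothetical `strip_stress` helper gives accent-insensitive matching. It removes only the combining grave and acute after NFD and then recomposes with NFC, so letters such as й (и + breve) and ё (е + diaeresis) keep their own diacritics.

```python
import unicodedata

STRESS_MARKS = {"\u0300", "\u0301"}  # combining grave and combining acute

def strip_stress(text: str) -> str:
    """Remove grave/acute stress marks but keep other diacritics (breve, diaeresis)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    return unicodedata.normalize("NFC", kept)

if __name__ == "__main__":
    word = "\u0432\u043e\u0434\u0430\u0301"  # "вода" with a stress mark on the last vowel
    print(word.upper())                        # the accent stays on the uppercased vowel
    print(strip_stress(word) == "\u0432\u043e\u0434\u0430")
```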
-- Philippe.
2011/5/18 Plamen Tanovski <pgt@tanovski.de>:
> Hi,
>
> while almost every possible accented Latin vowel has its own slot,
> Cyrillic accented vowels are missing from the Unicode tables, except for
> two of them. I think using combining diacritics is not an option, for
> at least two reasons: 1. text editing and processing is very difficult
> and error-prone, since one has to pay attention to two characters; for
> example, if the vowel is deleted, the accent attaches to the preceding
> character, etc.; 2. quality typesetting is almost impossible, because the
> font has to provide all the data (contextual alternatives and mark
> positioning) for correct placement of the accent, and I suppose
> 99.9% of Cyrillic fonts don't provide this information.
>
> So it is very urgent to propose including accented Cyrillic
> vowels in Unicode. I think the combinations with grave and acute
> accents are enough, so we are talking here about 40 slots after all.
>
> best regards
>
> --
> Plamen Tanovski
> Tanovski & Partners Publishing Services
> www.megensatz.de
> Tel. +49 341 3 08 57 60