Re: UCA and Russian letter Ё

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Fri, 21 Dec 2012 13:56:31 +0100

Leo Broukhis, Fri, 21 Dec 2012 01:31:18 -0800:
> In Russian, the difference between Е and Ё is primary at the beginning
> of a word as they are considered distinct letters of the alphabet, yet
> secondary in the middle of a word, as the dieresis over Ё is not
> mandatory.
>
> As an example, ель < ёлка, but тёлка < тель, see
> http://ru.wikisource.org/wiki/%d0%9e%d1%80%d1%84%d0%be%d0%b3%d1%80%d0%b0%d1%84%d0%b8%d1%87%d0%b5%d1%81%d0%ba%d0%b8%d0%b9_%d1%81%d0%bb%d0%be%d0%b2%d0%b0%d1%80%d1%8c_%d1%80%d1%83%d1%81%d1%81%d0%ba%d0%be%d0%b3%d0%be_%d1%8f%d0%b7%d1%8b%d0%ba%d0

You say that the difference is primary at the beginning of a word but
secondary elsewhere. And yes, the orthographic dictionary that you link
to above does look as you describe.

However, in reality, the difference is secondary - if that is the right
word - even for the first letter of a word. Wikipedia has the following
example: едок < ёж < ездит.[1] And, for instance, the word ёлка could
also be written елка.

Hence I would argue that the dictionary you linked to above considers
the difference to *always* be secondary. It is just that the dictionary
applies the sorting algorithm to a collection where the words that
begin with the letter Ё have been separated from the words that begin
with the letter Е.

> A cursory scan of the UCA doesn't reveal if that's implementable, and
> experiments in a fairly fresh Linux Mint yield either
> ель < ёлка < тель < тёлка or ель < тель < тёлка < ёлка depending on
> the LANG setting (en_US works better than ru_RU).

(Both examples treat the difference as primary, but the last example
is wrong in that ёлка comes after тёлка - which is incorrect from every
angle except that of the letters' code point order in Unicode.)
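
To illustrate the code point remark: a plain code point comparison
reproduces the second ordering, since ё (U+0451) lies after я (U+044F).
Below is a minimal Python sketch. The word list is just the four words
from this thread, and it relies on Python's built-in sorted() comparing
strings code point by code point. Whether the locale in question
actually falls back to code point order is an assumption on my part.

```python
# The four words from the thread, in arbitrary order.
words = ["тёлка", "ель", "ёлка", "тель"]

# Python compares str values code point by code point,
# so this is a pure "position inside Unicode" ordering.
by_code_point = sorted(words)

print(by_code_point)
# е is U+0435, т is U+0442, ё is U+0451, hence:
# ['ель', 'тель', 'тёлка', 'ёлка']
```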

> Could someone tell if the UCA in its current form is able to support that?

Is there not a need for 3 kinds of sorting? Namely: a) Е/Ё as always
distinct letters, b) Е/Ё as always non-distinct letters, c) Е/Ё as
non-distinct letters except when used as the first letter. (Note that
the last variant would only yield correct results on collections of
words where a first-letter Ё is guaranteed to be written as Ё. Thus,
if ёлка is written елка, then the result becomes incorrect.)

Linguistic PS: From the angle of the "color" of the sound, Russian Ё is
the "light" version of Russian О. (Its predecessor was also a digraph -
"IO".) But from the angle of stress, when the Ё loses its stress, it
alternates with Russian Е (since Е can occur both with and without
stress, whereas Ё can only occur with stress). The reason why Е/Ё is
often considered a secondary difference is (I think) related to the
stress: except in lexicons and dictionaries, Russian texts typically do
not mark where the stress of a word is. The stress is simply known by
the reader/user.

[1] <http://en.wikipedia.org/wiki/%d0%81#Russian>

-- 
leif halvard silli
Received on Fri Dec 21 2012 - 07:00:22 CST
