From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Aug 03 2005 - 14:13:56 CDT
> According to sources (1) and (2), Latvian used some letters
> with diagonal stroke in its 19th century orthography. These are
> G,g, K,k, L,l, N,n, R,r, S,s, long s.
These correspond to the following letters of the modern orthography:
G-cedilla (0122, 0123), K-cedilla (0136, 0137), L-cedilla (013B, 013C),
N-cedilla (0145, 0146), R-cedilla (0156, 0157), and presumably
S- and Z-hacek.
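If you want to check the code points listed above, here is a quick sketch using Python's standard `unicodedata` module. It prints the formal character name for each modern cedilla pair. The list and the helper name `describe` are illustrative only. The S- and Z-hacek pairs are left out because their correspondence is only presumed.

```python
import unicodedata

# Modern Latvian cedilla letters cited above, as (capital, small) code points.
CEDILLA_PAIRS = [
    (0x0122, 0x0123),  # G / g
    (0x0136, 0x0137),  # K / k
    (0x013B, 0x013C),  # L / l
    (0x0145, 0x0146),  # N / n
    (0x0156, 0x0157),  # R / r
]

def describe(cp: int) -> str:
    """Return 'U+XXXX <char> NAME' for a single code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.name(ch)}"

if __name__ == "__main__":
    for upper, lower in CEDILLA_PAIRS:
        print(describe(upper))
        print(describe(lower))
```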
> See attached scans from (1) p.231 (Faulmann-p231.png) and from (2),
> p.595 (Allen-p595.png).
> Of these, only L,l are encoded in Unicode 4.1 (unless I overlooked
> something; I doubt that G,g with diagonal stroke can be treated as
> font variants of U+01E4, U+01E5).
Yes. It would be unrelated to that, I think.
>
> Is this sufficient evidence for encoding the missing ones?
Sufficient evidence of the existence of the letterforms, sure.
Sufficient reason for encoding as characters, no.
> (As I don't have any special knowledge of Latvian, I don't consider myself
> qualified to write a proposal.)
>
Karl continued:
> Of course. But I somehow remember that combinations of letters with
> things which cross or cover them will be treated as new encodable characters,
> unlike combinations of letters with diacritics which are attached to the
> letter (like ogonek) or do not touch the letter at all (like macron).
As *potentially* encodable characters. The existence of all kinds of
overstruck forms of Latin letters doesn't automatically give them
a free pass into the standard. The *need* for encoding them still
has to be presented.
> Is this assumption correct? If yes, is this documented somewhere?
In UTC decisions. In particular, the three recent additions of
slash-overstruck letters (A-slash, C-slash, T-slash) were justified
on the basis of their use in a contemporary orthography (that of
the Sencoten language). In such cases, font glitches from attempts
to compose glyphs on the fly are more of a problem for users than
they are for scholars working on digital representations of
historic texts in obsolete orthographies. I think there is an
arguable difference in the needs factor here.
For the representation of historic texts in obsolete orthographies,
there is clearly the need to be able to represent the plain text
content, but I think that is already covered, as Chris Jacobs
suggested, by the existence of the overstruck combining marks
(U+0338 is most appropriate for the Latvian case, I think).
Specialized fonts may be needed to render such texts in fine
detail, but that was *already* going to be true, since a
lot of these texts for Latvian (Lettisch) were printed in
Fraktur style, and would need special glyph treatment for the
overstruck forms in *any* case.
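Here is a small sketch, using only Python's standard library, that makes the combining-mark approach concrete. A letter followed by U+0338 stays a two-code-point sequence under normalization. There is no precomposed "g with slash" for NFC to compose it into, so the plain-text content survives as-is, and drawing the overstrike is left to the font. The helper name `slashed` is hypothetical.

```python
import unicodedata

SLASH_OVERLAY = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

def slashed(base: str) -> str:
    """Build an old-orthography slashed letter as <base, U+0338>."""
    return base + SLASH_OVERLAY

if __name__ == "__main__":
    g_slash = slashed("g")
    print(unicodedata.name(SLASH_OVERLAY))
    # Canonical combining class 1 is the "overlay" class.
    print(unicodedata.combining(SLASH_OVERLAY))
    # NFC has nothing to compose this into, so it stays two code points.
    print(len(unicodedata.normalize("NFC", g_slash)))
```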
Redactors of 18th- and 19th-century Lettisch texts have a couple
of options, it seems to me. On the one hand, the texts can simply
be represented using the modern orthography, which would probably
be of most use to their readers. With such an approach, it would
even be possible to develop a special display font that simply
maps the modern characters to glyphs for the struck-through
letterforms, so that the text displays roughly as it was originally
printed. There might be some issues with the long-s forms, which
don't seem to map one-to-one, but I'm sure smart people can figure
out the context rules for that.
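A real display font would handle the first option with glyph substitution (for example, an OpenType lookup) and leave the text unchanged. The mapping such a font would encode can still be sketched as a plain text-level table. The table and function below are hypothetical illustrations. They cover only the one-to-one cases (G, K, L, N, R). The s and long-s forms are deliberately omitted because, as noted, they need context rules.

```python
import unicodedata

OVERLAY = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

# Hypothetical table: modern cedilla letter -> old <base, U+0338> form.
# Only the one-to-one cases are listed; s/long-s would need context rules.
MODERN_TO_OLD = {
    "\u0122": "G" + OVERLAY, "\u0123": "g" + OVERLAY,
    "\u0136": "K" + OVERLAY, "\u0137": "k" + OVERLAY,
    "\u013B": "L" + OVERLAY, "\u013C": "l" + OVERLAY,
    "\u0145": "N" + OVERLAY, "\u0146": "n" + OVERLAY,
    "\u0156": "R" + OVERLAY, "\u0157": "r" + OVERLAY,
}

def to_old_display(text: str) -> str:
    """Simulate the display mapping from modern letters to slashed forms.

    The text is NFC-normalized first, so a decomposed <g, U+0327 cedilla>
    is recognized as the precomposed U+0123.
    """
    text = unicodedata.normalize("NFC", text)
    return "".join(MODERN_TO_OLD.get(ch, ch) for ch in text)

if __name__ == "__main__":
    print(to_old_display("\u0122\u0137"))
```

Letters not in the table, such as s-caron, pass through unchanged. That is where a font's context rules would take over.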
The other choice is to represent the content of the Lettisch texts
in terms of the letters used *directly*, that is, by encoding
<g, combining-slash-overlay> sequences, and so on. This would be
more faithful to the surface form of the text. And again, a specialized
display font could be built that would display the text as it
would be read in the modern orthography.
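The second option runs the same mapping in reverse. Here is a minimal sketch that rewrites <base, U+0338> sequences as the corresponding modern cedilla letters, for example to produce a modern-orthography reading copy. The table, the regular expression, and `to_modern` are all hypothetical and cover only the one-to-one cases. The text is decomposed to NFD first so that canonical reordering places U+0338 (combining class 1) directly after its base. The result is recomposed to NFC at the end.

```python
import re
import unicodedata

OVERLAY = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

# Hypothetical table: old slashed base letter -> modern cedilla letter.
# Slashed s and long s are left untouched (they need context rules).
OLD_BASE_TO_MODERN = {
    "G": "\u0122", "g": "\u0123",
    "K": "\u0136", "k": "\u0137",
    "L": "\u013B", "l": "\u013C",
    "N": "\u0145", "n": "\u0146",
    "R": "\u0156", "r": "\u0157",
}

_SLASHED = re.compile("([GgKkLlNnRr])" + OVERLAY)

def to_modern(text: str) -> str:
    """Rewrite <base, U+0338> sequences as modern cedilla letters."""
    text = unicodedata.normalize("NFD", text)
    text = _SLASHED.sub(lambda m: OLD_BASE_TO_MODERN[m.group(1)], text)
    return unicodedata.normalize("NFC", text)

if __name__ == "__main__":
    print(to_modern("g\u0338 L\u0338a"))
```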
Some of these tasks might be a little easier if single code
points were encoded for each of the old-orthography slashed
characters, but all that would achieve is eliminating a few
instances of two-to-one mappings in the processing. All the rest
of the editorial preparation and presentation issues would
end up being about the same.
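To see how small the two-to-one issue is, consider what it takes to treat a base letter and its overlay as one unit. Here is a minimal sketch of that grouping. It assumes, as a simplification, that a "letter" is a non-mark code point followed by any combining marks (general category M*). This only roughly approximates Unicode's extended grapheme clusters (UAX #29) and is not a full implementation. The function name `letters` is hypothetical.

```python
import unicodedata

def letters(text: str) -> list[str]:
    """Split text into base characters plus any following combining marks.

    A rough stand-in for grapheme clusters: any code point whose general
    category starts with "M" is attached to the preceding unit.
    """
    units: list[str] = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units

if __name__ == "__main__":
    print(letters("g\u0338a"))
    print(letters("\u0123a"))
```

The slashed letter comes out as one unit of two code points, while the precomposed modern letter is one unit of one code point. Beyond that, the processing is identical.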
In short, I don't see a compelling needs case demonstrated
here, yet.
--Ken
This archive was generated by hypermail 2.1.5 : Wed Aug 03 2005 - 14:14:58 CDT