SC2/WG2 N2075R
Proposal to add Lithuanian Accented Letters to ISO/IEC 10646-1
Table of Contents
1. Official:
1.2. Graphic Symbols and Names
2. Rationale:
2.1.2. Extended alphabet (with accented letters)
2.2. 8-bit single-byte coding (National standard code tables)
2.3. Multiple-Octet coding in ISO/IEC 10646-1 (UCS codes)
Lithuanian Standards Board: Kosciuskos 30, LT-2600 Vilnius, Lithuania
Phone: +370-2-70 93 60, fax: +370-2-22 62 52
Author and contact person: Vladas Tumasonis (Vilnius University and Lithuanian Standards Board)
E-mail: vladas.tumasonis@maf.vu.lt (subject: Lithuanian Accented Letters)
Phone: +370-2-36 60 35, fax: +370-2-25 15 85
1. Official:
1.1. Proposal Request Form
Please fill in Sections A, B and C below. Section D will be filled in by SC 2/WG 2.
For instructions and guidance on filling in the form, please see the document "Principles and Procedures for Allocation of New Characters and Scripts" (http://www.dkuug.dk/JTC1/SC2/WG2/prot)
A. Administrative
1. Title: Addition of Lithuanian Accented Letters
2. Requester's name: Lithuanian Standards Board (LST)
3. Requester type (Member body/Liaison/Individual contribution): Correspondent Member
4. Submission date: 1999-08-15
5. Requester's reference (if applicable):
6. This is a complete proposal.
B. Technical - General
1. The proposal is for addition of character(s) to an existing block.
Name of the existing block: LATIN EXTENDED-B
2. Number of characters in proposal: 35
3. Proposed category (see section II, Character Categories): A
4. Proposed Level of Implementation (see clause 15, ISO/IEC 10646-1): 1
Is a rationale provided for the choice? If Yes, reference:
5. Is a repertoire including character names provided?: Yes
a. If YES, are the names in accordance with the 'character naming guidelines' in Annex K of ISO/IEC 10646-1? Yes
b. Are the character shapes attached in a reviewable form? Yes
6. Who will provide the appropriate computerized font (ordered preference: True Type, PostScript or 96x96 bit-mapped format) for publishing the standard? True Type; Fotonija UAB, Vilnius, Lithuania
If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used: Mr. Virginijus Dadurkevicius; dadurka@fotonija.com
7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided? Yes
b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? Yes
8. Special encoding issues:
Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information): No
C. Technical - Justification
1. Has this proposal for addition of character(s) been submitted before? No
If YES explain
2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)?
If YES, with whom?
If YES, available relevant documents?
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? Yes
Reference:
4. The context of use for the proposed characters (type of use; common or rare) Common
Reference:
5. Are the proposed characters in current use by the user community? Yes
If YES, where? Reference: In Lithuania
6. After giving due considerations to the principles in N 2002 must the proposed characters be entirely in the BMP? Yes
If YES, is a rationale provided?
If YES, reference:
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? Can be scattered
8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? Not existing characters, but they are fully composed forms of glyphs that can be represented as a composite sequence
If YES, is a rationale for its inclusion provided? Yes
If YES, reference: Is enclosed
9. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? No
If YES, is a rationale for its inclusion provided?
If YES, reference:
10. Does the proposal include use of combining characters and/or use of composite sequences (see clause 4.11 and 4.13 in ISO/IEC 10646-1)? No
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided? No
If YES, reference:
11. Does the proposal contain characters with any special properties such as control function or similar semantics? No
If YES, describe in detail (include attachment if necessary)
D. SC 2/WG 2 Administrative (to be completed by SC 2/WG 2)
1. Relevant SC 2/WG 2 document numbers:
2. Status (list of meeting number and corresponding action or disposition):
3. Additional contact to user communities, liaison organizations etc:
4. Assigned category and assigned priority/time frame:
1.2. Graphic Symbols and Names
Number | Graphic Symbol | Name | Remarks
1 | | LATIN CAPITAL LETTER A WITH OGONEK AND ACUTE |
2 | | LATIN SMALL LETTER A WITH OGONEK AND ACUTE |
3 | | LATIN CAPITAL LETTER A WITH OGONEK AND TILDE |
4 | | LATIN SMALL LETTER A WITH OGONEK AND TILDE |
5 | | LATIN CAPITAL LETTER E WITH OGONEK AND ACUTE |
6 | | LATIN SMALL LETTER E WITH OGONEK AND ACUTE |
7 | | LATIN CAPITAL LETTER E WITH OGONEK AND TILDE |
8 | | LATIN SMALL LETTER E WITH OGONEK AND TILDE |
9 | | LATIN CAPITAL LETTER E WITH DOT ABOVE AND ACUTE |
10 | | LATIN SMALL LETTER E WITH DOT ABOVE AND ACUTE |
11 | | LATIN CAPITAL LETTER E WITH DOT ABOVE AND TILDE |
12 | | LATIN SMALL LETTER E WITH DOT ABOVE AND TILDE |
13 | | LATIN SMALL LETTER I WITH DOT ABOVE AND GRAVE | Name ?
14 | | LATIN SMALL LETTER I WITH DOT ABOVE AND ACUTE | Name ?
15 | | LATIN SMALL LETTER I WITH DOT ABOVE AND TILDE | Name ?
16 | | LATIN CAPITAL LETTER I WITH OGONEK AND ACUTE |
17 | | LATIN SMALL LETTER I WITH OGONEK AND DOT ABOVE AND ACUTE | Name ?
18 | | LATIN CAPITAL LETTER I WITH OGONEK AND TILDE |
19 | | LATIN SMALL LETTER I WITH OGONEK AND DOT ABOVE AND TILDE | Name ?
20 | | LATIN CAPITAL LETTER J WITH TILDE |
21 | | LATIN SMALL LETTER J WITH TILDE |
22 | | LATIN CAPITAL LETTER L WITH TILDE |
23 | | LATIN SMALL LETTER L WITH TILDE |
24 | | LATIN CAPITAL LETTER M WITH TILDE |
25 | | LATIN SMALL LETTER M WITH TILDE |
26 | | LATIN CAPITAL LETTER R WITH TILDE |
27 | | LATIN SMALL LETTER R WITH TILDE |
28 | | LATIN CAPITAL LETTER U WITH OGONEK AND ACUTE |
29 | | LATIN SMALL LETTER U WITH OGONEK AND ACUTE |
30 | | LATIN CAPITAL LETTER U WITH OGONEK AND TILDE |
31 | | LATIN SMALL LETTER U WITH OGONEK AND TILDE |
32 | | LATIN CAPITAL LETTER U WITH MACRON AND ACUTE |
33 | | LATIN SMALL LETTER U WITH MACRON AND ACUTE |
34 | | LATIN CAPITAL LETTER U WITH MACRON AND TILDE |
35 | | LATIN SMALL LETTER U WITH MACRON AND TILDE |
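The request form acknowledges that each proposed letter can also be represented as a composite sequence. The following is a minimal Python sketch, not part of the proposal, of how such a sequence could be derived from a proposed name with the standard `unicodedata` module. The helper `composite_sequence` and the `MARKS` mapping are hypothetical names introduced for illustration; the mapping assumes that each mark word in a name corresponds to the Unicode combining character listed.

```python
import unicodedata

# Assumed mapping from the mark words used in the proposed names to
# Unicode combining character names (illustration only, not from the proposal).
MARKS = {
    "OGONEK": "COMBINING OGONEK",
    "DOT ABOVE": "COMBINING DOT ABOVE",
    "MACRON": "COMBINING MACRON",
    "GRAVE": "COMBINING GRAVE ACCENT",
    "ACUTE": "COMBINING ACUTE ACCENT",
    "TILDE": "COMBINING TILDE",
}

def composite_sequence(name: str) -> str:
    """Return base letter + combining marks for a name of the form
    'LATIN ... LETTER X WITH MARK [AND MARK ...]'."""
    base_name, _, marks = name.partition(" WITH ")
    sequence = unicodedata.lookup(base_name)
    for mark in marks.split(" AND "):
        sequence += unicodedata.lookup(MARKS[mark])
    return sequence

seq = composite_sequence("LATIN SMALL LETTER A WITH OGONEK AND ACUTE")
print([f"U+{ord(c):04X}" for c in seq])  # ['U+0061', 'U+0328', 'U+0301']
```

The sketch only handles names that contain " WITH ", which is true of all 35 names in the table above.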
2. Rationale:
2.1. Lithuanian Letters
2.1.1. Main alphabet
In its grammatical structure, Lithuanian is one of the most archaic of the living Indo-European languages. It is spoken by approximately 5 million people and is taught at many universities around the world for linguistic studies.
The main Lithuanian alphabet consists of the Latin alphabet (excluding Q, q, W, w, X, x) plus 18 extra letters with diacritics (9 capital and 9 small): Ą ą Č č Ę ę Ė ė Į į Š š Ų ų Ū ū Ž ž
These letters are included in 8-bit single-byte coded character sets (ISO/IEC 8859-13, MS CP 1257, IBM CP 775, etc.), so they can be used without difficulty.
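As a rough check of this claim, the sketch below encodes the 18 extra letters with Python's built-in codecs. It is an illustration under assumptions: the codec names `iso8859_13`, `cp1257` and `cp775` are taken to correspond to ISO/IEC 8859-13, MS CP 1257 and IBM CP 775, the helper `single_byte_codes` is a hypothetical name, and the letter list is written out here from the standard Lithuanian alphabet.

```python
# The 18 extra letters of the main Lithuanian alphabet (9 capital, 9 small).
LETTERS = "ĄČĘĖĮŠŲŪŽąčęėįšųūž"

# Python codec names assumed to match the three code tables named in the text.
CODECS = ["iso8859_13", "cp1257", "cp775"]

def single_byte_codes(letters: str, codec: str) -> dict:
    """Map each letter to its one-byte code in the given codec.

    Raises UnicodeEncodeError if a letter is missing from the codec."""
    codes = {}
    for ch in letters:
        encoded = ch.encode(codec)
        if len(encoded) != 1:
            raise ValueError(f"U+{ord(ch):04X} is not a single byte in {codec}")
        codes[ch] = encoded[0]
    return codes

for codec in CODECS:
    codes = single_byte_codes(LETTERS, codec)
    # Print code points rather than glyphs so the output is plain ASCII.
    print(codec, " ".join(f"U+{ord(ch):04X}={b:02X}" for ch, b in codes.items()))
```

If every letter encodes to one byte in all three codecs, the loop completes without raising an exception.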
2.1.2. Extended alphabet (with accented letters)
Lithuanian has free word stress: the stress may fall on any syllable of a word. It performs at least two functions. Its constitutive function manifests itself in distinguishing a word from a combination of words, cf.:
The second function of word stress is the distinctive function, which distinguishes otherwise identical words by the place where the stress falls, e.g.:
Word stress (accent) is indicated with three accent marks (diacritical marks in ISO terms): grave accent, acute accent and tilde. The position of the stress depends on the stress pattern (accentual paradigm) of the word and on its morphological structure (see the examples above).
Word stress is expressed by means of accented letters.
There are 68 accented letters in the Lithuanian language:
The accented letters together with the main letters comprise the extended alphabet.
The use of accented letters goes back to the first Lithuanian writings. The first Lithuanian books were accented, e.g. "Kathechismas" (1595) and "Postilla catholicka" (1599). In present publishing practice, all dictionaries, specialized vocabularies and encyclopaedias are accented. Accented letters are also used in school textbooks, reference books, linguistic texts, and in the publication of laws.
In the general press (newspapers, fiction, etc.) the letters of the main Lithuanian alphabet are normally used; accented letters appear only in words where the accent has a distinctive function.
2.2. 8-bit single-byte coding (National standard code tables)
There are three national code tables in Lithuania for encoding the extended alphabet (usually we say "for encoding accented letters"). The basic Lithuanian code table is for the UNIX environment (the second half of this table is shown in fig. 1). It defines the basic character repertoire, including the accented letters. This code table is conformant with ISO/IEC 8859-13, i.e. the codes of all Lithuanian main letters are the same in both tables. Commonly used and important graphic characters are retained. The repertoire of this table is optimal for linguistic text processing.
The code table for Windows contains the basic repertoire plus extra phonetic symbols in columns 8 and 9. This code table is also conformant with ISO/IEC 8859-13.
The code table for DOS contains the basic repertoire plus box drawing symbols and is conformant with IBM CP 775 for the Baltic States. The DOS environment is still popular in publishing houses.
Fig. 1. UNIX code table for Lithuanian accented letters (second half)
2.3. Multiple-Octet coding in ISO/IEC 10646-1 (UCS codes)
All letters of the main Lithuanian alphabet have UCS codes (codes in ISO/IEC 10646-1), or equivalently Unicode codes. The situation with Lithuanian accented letters is more complicated. As mentioned above, Lithuanian accented letters are Latin script letters with a grave accent, acute accent or tilde, so some of them are also ordinary letters in other languages. For example, LATIN LETTER A WITH ACUTE is also used in Irish, Icelandic, Portuguese, Slovak and other languages, and LATIN LETTER N WITH TILDE is also used in Basque, Breton and Spanish. Such letters therefore already have separate UCS codes.
Altogether, 33 of the 68 Lithuanian accented letters have separate UCS codes and the remaining 35 do not.
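One way to see the difference between the two groups is to normalize a composite sequence to NFC and check whether it collapses to a single code point. The sketch below is an illustration only: `has_single_code` is a hypothetical helper, and the result depends on the Unicode character database bundled with the Python interpreter, which reflects the current Unicode Standard rather than the 1999 edition of ISO/IEC 10646-1.

```python
import unicodedata

def has_single_code(sequence: str) -> bool:
    """True if the composite sequence normalizes (NFC) to one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1

examples = {
    "a + acute": "a\u0301",                 # LATIN SMALL LETTER A WITH ACUTE
    "n + tilde": "n\u0303",                 # LATIN SMALL LETTER N WITH TILDE
    "a + ogonek + acute": "a\u0328\u0301",  # entry 2 of the table in 1.2
    "m + tilde": "m\u0303",                 # entry 25
    "u + macron + tilde": "u\u0304\u0303",  # entry 35
}

for label, sequence in examples.items():
    print(f"{label}: {has_single_code(sequence)}")
```

The first two sequences collapse to one code point; the three taken from the table in 1.2 do not, since NFC can at most compose the base letter with its first mark.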
Unshaded letters have UCS codes; shaded letters do not.
There is a further problem with the small letter "i" (and "i with ogonek"). The Lithuanian letter "i" is written with a dot above, and all accented forms of "i" should also retain the dot (see the samples in 2.4). In ISO/IEC 10646-1 all such forms are dotless. For example, LATIN SMALL LETTER I WITH ACUTE in fact specifies "Latin small letter dotless i with acute". Because the dot must be retained in these cases, the letters should be defined as LATIN SMALL LETTER I WITH DOT ABOVE AND ACUTE (or perhaps LATIN SMALL LETTER DOTLESS I WITH DOT ABOVE AND ACUTE).
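The point about the dot can be illustrated with composite sequences. In the Unicode character database shipped with Python, LATIN SMALL LETTER I WITH ACUTE decomposes into a plain "i" followed by COMBINING ACUTE ACCENT, with no explicit dot; a dotted form has to carry COMBINING DOT ABOVE explicitly. The sketch below is a hedged illustration, not the encoding proposed here: `dotted_i` is a hypothetical helper, and its mark order (ogonek, dot above, accent) is an assumption.

```python
import unicodedata

# Decompose the existing character to see which marks it carries.
decomposed = unicodedata.normalize("NFD", "\u00ED")
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER I', 'COMBINING ACUTE ACCENT']

ACCENTS = {"grave": "\u0300", "acute": "\u0301", "tilde": "\u0303"}

def dotted_i(accent: str, ogonek: bool = False) -> str:
    """Build a small i that keeps its dot under an accent:
    i [+ ogonek] + dot above + accent (assumed mark order)."""
    sequence = "i"
    if ogonek:
        sequence += "\u0328"  # COMBINING OGONEK
    sequence += "\u0307"      # COMBINING DOT ABOVE
    return sequence + ACCENTS[accent]

for accent in ACCENTS:
    sequence = dotted_i(accent)
    print(accent, [f"U+{ord(c):04X}" for c in sequence])
```

None of these sequences collapses to a single code point under NFC normalization, which is consistent with entries 13 to 15, 17 and 19 of the table in 1.2 being listed as new characters.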
2.4. Samples
In [3, p.350]:
In [12, p.75]:
In [4, p.38]. Note the accented "i":
2.5. References
1. M. Daukša, Kathechismas (1595) and Postilla catholicka (1599).
2. Lietuvių kalbos žodynas, I–XVIII t. [Dictionary of Lithuanian Language, I–XVIII volumes], Vilnius, 1956–1997.
3. Dabartinės lietuvių kalbos žodynas, vyr. red. St. Keinys [Dictionary of Modern Lithuanian Language, ed. by St. Keinys], Vilnius, Mokslo ir enciklopedijų leidykla, 1993.
4. Adelė Laigonaitė, Zigmas Zinkevičius, Lietuvių kalba. Mokomoji knyga X klasei [Lithuanian Language. Textbook for X form], Kaunas, Šviesa, 1997.
5. S. Matulaitienė, Skaitiniai. Vadovėlis VI klasei [Lithuanian Texts. Textbook for VI form], Kaunas, Šviesa, 1990.
6. Lithuanian Grammar, ed. by V. Ambrazas, Vilnius, Baltos lankos, 1997.
7. T. Mathiassen, A Short Grammar of Lithuanian, Slavica Publishers, Columbus, Ohio, 1996.
8. M. Ramonienė, I. Press, Colloquial Lithuanian, London and New York, Routledge, 1996.
9. B. Svecevičius, B. Piesarskas, Lietuvių - anglų kalbų žodynas [Lithuanian - English Dictionary], Vilnius, Mokslas, 1979.
10. Vokiečių - lietuvių kalbų žodynas [German - Lithuanian Dictionary], Vilnius, Mokslas, 1989.
11. A. Parenti, Italiano - Lituano, Lituano - Italiano, Garzanti Editore, 1994.
12. Romos Misiolas. Gedulinis Misiolas [Missalis Romani. Missale Parvum], Kaunas - Vilnius, 1982.
13. Tarptautinių žodžių žodynas, ats. red. V. Kvietkauskas [Dictionary of International Words, ed. by V. Kvietkauskas], Vilnius, Vyriausioji enciklopedijų redakcija, 1985.