Re: Compiling a list of Semitic transliteration characters

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Fri, 7 Sep 2012 03:32:13 +0100

On Tue, 28 Aug 2012 19:03:14 -0400
CE Whitehead <cewcathar_at_hotmail.com> wrote:

> For Romanization (conversion to Latin characters) of Arabic, see:
> http://en.wikipedia.org/wiki/Romanization_of_Arabic

Likewise, a reasonable list for Hebrew can be picked up from
http://en.wikipedia.org/wiki/Romanization_of_Hebrew .

The list should probably include <p, U+0331 COMBINING MACRON BELOW> and
<g, U+0331> as variants of <p, U+0304 COMBINING MACRON> and <U+1E21
LATIN SMALL LETTER G WITH MACRON>; I'm not sure whether the former may
be rendered the same as the latter.
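
Whether the two forms render alike is a font matter, but which of these
sequences have a precomposed code point can be checked mechanically. A
minimal sketch, assuming Python's standard unicodedata module; the helper
name nfc_form is only illustrative:

```python
import unicodedata

MACRON = "\u0304"        # COMBINING MACRON
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW

def nfc_form(base, mark):
    """Return the NFC form of base + mark as a list of 'U+XXXX' strings."""
    composed = unicodedata.normalize("NFC", base + mark)
    return ["U+%04X" % ord(ch) for ch in composed]

if __name__ == "__main__":
    for base in "pg":
        for mark in (MACRON, MACRON_BELOW):
            print(base, unicodedata.name(mark), nfc_form(base, mark))
```

Of the four combinations, only <g, U+0304> composes to a single code
point (U+1E21); the other three remain two-code-point sequences, so a
character list has to carry them as sequences.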

The Hebrew list omits a few characters I'm used to from old Hebrew
grammars:

Hatephs (and shewa):
U+1D43 MODIFIER LETTER SMALL A
U+1D49 MODIFIER LETTER SMALL E
U+1D52 MODIFIER LETTER SMALL O

'Pure' long vowels:
U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX
U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX
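
Code point and name pairs like those above are easy to mistype, so it
may help to check them against the Unicode character database. A minimal
sketch, assuming Python's standard unicodedata module (whose data depends
on the Python version); the line format parsed and the name check_entry
are illustrative only:

```python
import re
import unicodedata

# Matches "U+XXXX NAME", ignoring trailing notes such as "(TBC)".
ENTRY = re.compile(r"U\+([0-9A-F]{4,6})\s+([A-Z][A-Z0-9 -]*[A-Z0-9])")

def check_entry(line):
    """Return (code point, listed name, database name, names agree?).

    Returns None if the line is not of the form 'U+XXXX NAME'.
    """
    m = ENTRY.match(line.strip())
    if not m:
        return None
    cp = int(m.group(1), 16)
    if cp > 0x10FFFF:
        return None  # not a valid code point
    listed = m.group(2)
    actual = unicodedata.name(chr(cp), "<unnamed>")
    return cp, listed, actual, listed == actual

if __name__ == "__main__":
    for line in ["U+1D43 MODIFIER LETTER SMALL A",
                 "U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX"]:
        print(check_entry(line))
```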

From other places I pick up the following groups - duplicates are mostly
pruned.

General Semitic
<U+1E6F LATIN SMALL LETTER T WITH LINE BELOW, U+0323 COMBINING DOT
BELOW>
U+026C LATIN SMALL LETTER L WITH BELT
U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (Ethiopian)

Affricates may be written with superscript second element, so one
should include:
U+02E2 MODIFIER LETTER SMALL S
U+1DB4 MODIFIER LETTER SMALL ESH (TBC)
U+1DBE MODIFIER LETTER SMALL EZH
MODIFIER LETTER SMALL L WITH BELT (Missing! But I'm sure I've
seen it.)

Arabic transliteration - aleph and ain:
U+02BC MODIFIER LETTER APOSTROPHE
U+02BD MODIFIER LETTER REVERSED COMMA
U+02BE MODIFIER LETTER RIGHT HALF RING
U+02BF MODIFIER LETTER LEFT HALF RING
U+02C0 MODIFIER LETTER GLOTTAL STOP
U+02C1 MODIFIER LETTER REVERSED GLOTTAL STOP

More emphatics:
U+1E05 LATIN SMALL LETTER B WITH DOT BELOW
U+1E37 LATIN SMALL LETTER L WITH DOT BELOW
<p, U+0323 COMBINING DOT BELOW>

For Cairene Arabic, we could add, though I've only seen the IPA forms:
U+1E43 LATIN SMALL LETTER M WITH DOT BELOW (TBC)
U+1E5B LATIN SMALL LETTER R WITH DOT BELOW (TBC)

Emphatics transcribed as velarised:
U+1D6D LATIN SMALL LETTER D WITH MIDDLE TILDE
U+1D74 LATIN SMALL LETTER S WITH MIDDLE TILDE
U+1D75 LATIN SMALL LETTER T WITH MIDDLE TILDE
U+1D76 LATIN SMALL LETTER Z WITH MIDDLE TILDE

I therefore suspect, but do not recall seeing:
U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE
U+1D6C LATIN SMALL LETTER B WITH MIDDLE TILDE

Non-emphatic v. emphatic is a word feature rather than a segmental
feature in some Arabic dialects, so it may be as well to have all the
characters named as '... WITH MIDDLE TILDE'.
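
To see which such characters are actually encoded, one can scan the
character names. A minimal sketch, assuming Python's standard unicodedata
module; the result depends on the Unicode version bundled with the
Python in use, and the function name is illustrative:

```python
import unicodedata

def small_letters_with_middle_tilde():
    """Scan the BMP for names of the form
    'LATIN SMALL LETTER ... WITH MIDDLE TILDE'."""
    found = []
    for cp in range(0x10000):
        # Unnamed code points (including surrogates) yield "".
        name = unicodedata.name(chr(cp), "")
        if (name.startswith("LATIN SMALL LETTER ")
                and name.endswith(" WITH MIDDLE TILDE")):
            found.append((cp, name))
    return found

if __name__ == "__main__":
    for cp, name in small_letters_with_middle_tilde():
        print("U+%04X %s" % (cp, name))
```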

There are several other characters that phonetic descriptions will
need; see for example http://en.wikipedia.org/wiki/Arabic_phonology .

Emphatics can be treated as glottalised, so for transcriptions we may
also have:
U+02B9 MODIFIER LETTER PRIME
U+02C0 MODIFIER LETTER GLOTTAL STOP

For some of the fancier Hebrew transliterations we need:
U+0254 LATIN SMALL LETTER OPEN O
U+02B0 MODIFIER LETTER SMALL H
U+02B7 MODIFIER LETTER SMALL W (Also needed for Ethiopian Semitic)
U+02B8 MODIFIER LETTER SMALL Y
U+02B2 MODIFIER LETTER SMALL J (TBC)

For Akkadian we need:
U+00E0 LATIN SMALL LETTER A WITH GRAVE
U+00E1 LATIN SMALL LETTER A WITH ACUTE
U+00E8 LATIN SMALL LETTER E WITH GRAVE
U+00E9 LATIN SMALL LETTER E WITH ACUTE
U+00EC LATIN SMALL LETTER I WITH GRAVE
U+00ED LATIN SMALL LETTER I WITH ACUTE
U+00F9 LATIN SMALL LETTER U WITH GRAVE
U+00FA LATIN SMALL LETTER U WITH ACUTE
U+2080 SUBSCRIPT ZERO
U+2081 SUBSCRIPT ONE
U+2082 SUBSCRIPT TWO
U+2083 SUBSCRIPT THREE
U+2084 SUBSCRIPT FOUR
U+2085 SUBSCRIPT FIVE
U+2086 SUBSCRIPT SIX
U+2087 SUBSCRIPT SEVEN
U+2088 SUBSCRIPT EIGHT
U+2089 SUBSCRIPT NINE
U+00D7 MULTIPLICATION SIGN (Possibly just for Sumerian)
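
The accented vowels and subscript digits serve as sign indices. A minimal
sketch, assuming the common convention (not stated in this message) that
index 2 is written with an acute, index 3 with a grave, and other indices
with subscript digits. The ASCII input style ("u2", "sa10"), the choice
of accenting the first vowel, and the name index_sign are all
assumptions made for illustration:

```python
import re
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"
SUBSCRIPTS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")
VOWELS = "aeiu"

def index_sign(value):
    """Turn an ASCII sign value such as 'u2' or 'sa10' into indexed form."""
    m = re.fullmatch(r"([^\d]+)(\d+)", value)
    if not m:
        return value  # no numeric index: leave unchanged
    letters, digits = m.group(1), m.group(2)
    index = int(digits)
    if index in (2, 3):
        mark = ACUTE if index == 2 else GRAVE
        # Assumption: the accent goes on the first vowel of the value.
        for i, ch in enumerate(letters):
            if ch in VOWELS:
                accented = letters[:i + 1] + mark + letters[i + 1:]
                # NFC yields the precomposed letters listed above.
                return unicodedata.normalize("NFC", accented)
    # Other indices (or no vowel found): use subscript digits.
    return letters + digits.translate(SUBSCRIPTS)

if __name__ == "__main__":
    for v in ["u", "u2", "u3", "u4", "sa10"]:
        print(v, "->", index_sign(v))
```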

Cuneiform determinatives should arguably be transliterated using
mark-up. If you'd rather have them as plain text, I can pick out the
following list from the examples in 'the World's Writing Systems':

U+1D4F MODIFIER LETTER SMALL K
U+1D35 MODIFIER LETTER CAPITAL I
U+1D48 MODIFIER LETTER SMALL D
U+1DA0 MODIFIER LETTER SMALL F
U+1D4D MODIFIER LETTER SMALL G
U+1D50 MODIFIER LETTER SMALL M
U+02B3 MODIFIER LETTER SMALL R
U+1D58 MODIFIER LETTER SMALL U

U+2071 SUPERSCRIPT LATIN SMALL LETTER I
U+1D3F MODIFIER LETTER CAPITAL R
U+1D41 MODIFIER LETTER CAPITAL U

and one missing character 'MODIFIER LETTER SMALL S WITH CARON'. I
suppose one could substitute U+1DB4 MODIFIER LETTER SMALL ESH or use
<U+02E2 MODIFIER LETTER SMALL S, U+02B0>. It would be understood, but
it wouldn't look right.

I'm taking ASCII punctuation for granted.

In many cases, capital forms should also be added, depending on whether
the writing system transliterated or transcribed to is unicameral. IPA
and Akkadian transliteration are unicameral, but Akkadian transcription
need not be unicameral. Note that Akkadian transliteration adds
information not directly in the text - it is back-transliteration that
is lossy!
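
Whether a listed small letter has a capital counterpart at all can be
read off the case mappings. A minimal sketch, assuming Python's built-in
str.upper(); the sample characters and the name capital_of are
illustrative only:

```python
import unicodedata

def capital_of(ch):
    """Return the uppercase counterpart of a small letter, or None if
    upper-casing leaves it unchanged."""
    up = ch.upper()
    return up if up != ch else None

if __name__ == "__main__":
    # g with macron, t with line below, open o, modifier small h,
    # subscript two
    for ch in ["\u1e21", "\u1e6f", "\u0254", "\u02b0", "\u2082"]:
        cap = capital_of(ch)
        print("U+%04X %s -> %s" % (
            ord(ch), unicodedata.name(ch),
            "none" if cap is None else "U+%04X" % ord(cap)))
```

The modifier letter and the subscript digit come back unchanged, so only
the base letters would need capital entries.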

Richard.
Received on Thu Sep 06 2012 - 21:32:13 CDT