Teletext mappings

From: Rob Hardy (rob@sneezes.freeserve.co.uk)
Date: Wed Jan 17 2001 - 20:02:33 EST


Hi everyone,

I'm preparing some mappings of teletext character sets to Unicode. You can
see my results so far at
http://www.sneezes.freeserve.co.uk/teletext/tech/charenc/teletextcharencs.html
[hope that URL doesn't get split..] This is a LARGE page, btw (150k). In
IE5+, hover over the character to get its name.

As you can see, I have some ambiguous characters and unknowns, and am
wondering whether anyone would like to answer these questions :)

1) I'm not sure about the forms in G0_ARABIC. I've had some excellent help
from an Arabic-speaker, but am wondering whether it could be further
refined. I've uploaded the tables in the teletext spec to
http://www.sneezes.freeserve.co.uk/teletext/tech/charenc/teletextarabic.gif
so you can make a comparison. I haven't finished G2_ARABIC yet, so there are
a few gaps.

2) Hyphens or dashes - what's the difference?

3) Which to use: 0x2016 DOUBLE VERTICAL LINE, or 0x2225 PARALLEL TO, or
0x2551 BOX DRAWINGS DOUBLE VERTICAL, or 0x01C1 LATIN LETTER LATERAL CLICK?
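
A quick way to cross-check the candidates in question 3 is to resolve each character name to its code point. The sketch below uses Python's standard unicodedata module and assumes the names as they appear in the Unicode character database; the helper name code_point_of is made up for illustration.

```python
import unicodedata

# Candidate characters for the teletext "double vertical bar" glyph,
# listed by their Unicode character names.
CANDIDATES = [
    "DOUBLE VERTICAL LINE",
    "PARALLEL TO",
    "BOX DRAWINGS DOUBLE VERTICAL",
    "LATIN LETTER LATERAL CLICK",
]


def code_point_of(name: str) -> str:
    """Return the code point for a Unicode character name as 'U+XXXX'."""
    return f"U+{ord(unicodedata.lookup(name)):04X}"


for name in CANDIDATES:
    print(code_point_of(name), name)
```

Looking names up this way, rather than typing code points by hand, guards against transposed digits in the mapping tables.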

4) Turkish Lira - the teletext spec represents this with a combined ligature
'TL', which I can't find a Unicode character for. I've put in 20A4 LIRA
SIGN, but I don't think this is what the teletext designers had in mind. Is
this a case for a new Unicode character?

5) G0_LATIN_LETTISH_LITHUIAN looks to have a LATIN SMALL LETTER I WITH
CEDILLA, which I can't find in Unicode (so I've stuck in i with ogonek
instead). Is this missing?
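
For question 5, one thing that can be checked mechanically is whether a precomposed character of that name exists, and what happens to a base letter plus combining cedilla under normalization. This is only a sketch against whatever Unicode data ships with your Python; has_precomposed is a hypothetical helper, and using i + U+0327 COMBINING CEDILLA is one possible representation, not necessarily what the teletext designers intended.

```python
import unicodedata


def has_precomposed(name: str) -> bool:
    """True if the Unicode data known to Python has a character with this name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


# Base letter followed by U+0327 COMBINING CEDILLA.
i_cedilla = "i" + "\u0327"

# NFC composes a sequence only when a precomposed character exists for it.
composed = unicodedata.normalize("NFC", i_cedilla)

print(has_precomposed("LATIN SMALL LETTER I WITH CEDILLA"))
print(has_precomposed("LATIN SMALL LETTER I WITH OGONEK"))
print([f"U+{ord(ch):04X}" for ch in composed])
```

If the sequence stays two code points long after NFC, the mapping table would need to allow one teletext code to map to a string rather than a single code point.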

6) Is there a 041F CYRILLIC CAPITAL LETTER PE with a curved top, like 0x22C2
N-ARY INTERSECTION, in both uppercase and lowercase forms? Perhaps this is a
particular glyph of the PE character, represented as a separate entry in the
teletext table.

7) Misc. other characters: Couldn't decide between
    a) 2126: OHM SIGN or GREEK CAPITAL LETTER OMEGA, 03A9
    b) 0110: LATIN CAPITAL LETTER D WITH STROKE, or LATIN CAPITAL LETTER
ETH, 00D0
    c) 00DF: LATIN SMALL LETTER SHARP S, or GREEK SMALL LETTER BETA, 03B2
    d) 0251: LATIN SMALL LETTER ALPHA, or GREEK SMALL LETTER ALPHA, 03B1
    e) 00B0: DEGREE SIGN, or MASCULINE ORDINAL INDICATOR, 00BA
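
One way to narrow down the pairs in question 7 is to test whether Unicode treats the two candidates as canonically equivalent, in which case the choice matters less because normalization folds one into the other. The sketch below compares the NFC forms of each pair; canonically_equivalent is a hypothetical helper name, and the check says nothing about which character the teletext designers meant.

```python
import unicodedata

# The candidate pairs from question 7, as (first choice, alternative).
PAIRS = [
    ("\u2126", "\u03A9"),  # OHM SIGN / GREEK CAPITAL LETTER OMEGA
    ("\u0110", "\u00D0"),  # D WITH STROKE / ETH
    ("\u00DF", "\u03B2"),  # SHARP S / GREEK SMALL LETTER BETA
    ("\u0251", "\u03B1"),  # LATIN SMALL LETTER ALPHA / GREEK SMALL LETTER ALPHA
    ("\u00B0", "\u00BA"),  # DEGREE SIGN / MASCULINE ORDINAL INDICATOR
]


def canonically_equivalent(a: str, b: str) -> bool:
    """True if both strings have the same NFC normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for a, b in PAIRS:
    print(unicodedata.name(a), "/", unicodedata.name(b),
          "->", canonically_equivalent(a, b))
```

Pairs that come out as not equivalent are distinct characters as far as Unicode is concerned, so the decision has to rest on what the teletext set is used for (Latin text, Greek text, or technical symbols).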

8) And some others I'm not sure of:
    a) Character 0x28 of G2_GREEK, looks like a colon
    b) Character 0x6e of G2_LATIN, looks like a tall Greek eta
    c) Character 0x7e of G2_LATIN, looks like an eta
    d) Character 0x52 of G0_GREEK, I've put it in as 0374 GREEK NUMERAL SIGN
but can't be sure

Perhaps there are some 7-bit sets knocking about which the teletext ones were
based on, which would help. The full teletext spec is available from
http://www.etsi.org, named ETSI 300 706 (you'll have to register to
download, but it's free). I suspect the designers of the spec would use a
single glyph to represent two characters in some cases, e.g. D with a stroke
would mean both 0110 and 00D0, seeing as both lowercase forms are further up
in the same set.
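
If one teletext glyph really does stand for two characters, the mapping table could hold a list of candidates per code instead of a single value. The following is a minimal sketch of that idea, not how any existing table is laid out: the set label EXAMPLE_SET and the code 0x00 are placeholders (not taken from the spec), and decode and encode_table are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Key: (character set label, 7-bit code).
# Value: candidate Unicode code points, preferred candidate first.
Mapping = Dict[Tuple[str, int], List[int]]

TABLE: Mapping = {
    # Placeholder set label and code, for illustration only.
    ("EXAMPLE_SET", 0x00): [0x0110, 0x00D0],  # D WITH STROKE, then ETH
}


def decode(table: Mapping, charset: str, code: int) -> str:
    """Teletext -> Unicode: return the preferred candidate."""
    return chr(table[(charset, code)][0])


def encode_table(table: Mapping, charset: str) -> Dict[str, int]:
    """Unicode -> teletext: every candidate maps back to the same code."""
    reverse: Dict[str, int] = {}
    for (cs, code), candidates in table.items():
        if cs == charset:
            for cp in candidates:
                reverse[chr(cp)] = code
    return reverse


print(decode(TABLE, "EXAMPLE_SET", 0x00))
print(encode_table(TABLE, "EXAMPLE_SET"))
```

Decoding stays deterministic because the first candidate wins, while encoding accepts either character, so text containing ETH or D with stroke would both reach the same teletext glyph.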

Hope I haven't asked too much in my first posting to this list :)

Regards,
Rob.


