UTF-Morse (was RE: Morse coded Unicode(was: Morse code))

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu Nov 21 2002 - 04:22:03 EST

    Carl W. Brown wrote:
    > I think that the bigger issue might be how do you extend Morse code to
    > incorporate the Unicode character set.
    > [...]

    Carl, this is unfair!! You spoiled my April 1st joke in mid-November!

    Ciao.
    Marco :-)

    ----------------------------------------------------------------------
    UTF-Morse - "Bringing Unicode into the telegraph age!"

    1. Unicode characters U+0020..U+007E are encoded according to the
    following table:

    Code:   UTF-Morse: Character name:
    ------  ---------- --------------------------
    U+0020  /          SPACE
    U+0021  -----.     EXCLAMATION MARK [1]
    U+0022  .-..-.     QUOTATION MARK
    U+0023  .-.-..     NUMBER SIGN [1]
    U+0024  ..-...     DOLLAR SIGN [1]
    U+0025  ..-..-     PERCENT SIGN [1]
    U+0026  ..-.-.     AMPERSAND [1]
    U+0027  .----.     APOSTROPHE
    U+0028  -.--.-     LEFT PARENTHESIS
    U+0029  -.---.     RIGHT PARENTHESIS [1]
    U+002A  -.----     ASTERISK [1]
    U+002B  --....     PLUS SIGN [1]
    U+002C  --..--     COMMA
    U+002D  -....-     HYPHEN-MINUS
    U+002E  .-.-.-     FULL STOP
    U+002F  -..-.      SOLIDUS [1]
    U+0030  -----      DIGIT ZERO
    U+0031  .----      DIGIT ONE
    U+0032  ..---      DIGIT TWO
    U+0033  ...--      DIGIT THREE
    U+0034  ....-      DIGIT FOUR
    U+0035  .....      DIGIT FIVE
    U+0036  -....      DIGIT SIX
    U+0037  --...      DIGIT SEVEN
    U+0038  ---..      DIGIT EIGHT
    U+0039  ----.      DIGIT NINE
    U+003A  ---...     COLON
    U+003B  ---..-     SEMICOLON [1]
    U+003C  ---.-.     LESS-THAN SIGN [1]
    U+003D  ----..     EQUALS SIGN [1]
    U+003E  ---.--     GREATER-THAN SIGN [1]
    U+003F  ..--..     QUESTION MARK
    U+0040  -.-.-.     COMMERCIAL AT [1]
    U+0041  ..-- .-    LATIN CAPITAL LETTER A [2]
    U+0042  ..-- -...  LATIN CAPITAL LETTER B [2]
    U+0043  ..-- -.-.  LATIN CAPITAL LETTER C [2]
    U+0044  ..-- -..   LATIN CAPITAL LETTER D [2]
    U+0045  ..-- .     LATIN CAPITAL LETTER E [2]
    U+0046  ..-- ..-.  LATIN CAPITAL LETTER F [2]
    U+0047  ..-- --.   LATIN CAPITAL LETTER G [2]
    U+0048  ..-- ....  LATIN CAPITAL LETTER H [2]
    U+0049  ..-- ..    LATIN CAPITAL LETTER I [2]
    U+004A  ..-- .---  LATIN CAPITAL LETTER J [2]
    U+004B  ..-- -.-   LATIN CAPITAL LETTER K [2]
    U+004C  ..-- .-..  LATIN CAPITAL LETTER L [2]
    U+004D  ..-- --    LATIN CAPITAL LETTER M [2]
    U+004E  ..-- -.    LATIN CAPITAL LETTER N [2]
    U+004F  ..-- ---   LATIN CAPITAL LETTER O [2]
    U+0050  ..-- .--.  LATIN CAPITAL LETTER P [2]
    U+0051  ..-- --.-  LATIN CAPITAL LETTER Q [2]
    U+0052  ..-- .-.   LATIN CAPITAL LETTER R [2]
    U+0053  ..-- ...   LATIN CAPITAL LETTER S [2]
    U+0054  ..-- -     LATIN CAPITAL LETTER T [2]
    U+0055  ..-- ..-   LATIN CAPITAL LETTER U [2]
    U+0056  ..-- ...-  LATIN CAPITAL LETTER V [2]
    U+0057  ..-- .--   LATIN CAPITAL LETTER W [2]
    U+0058  ..-- -..-  LATIN CAPITAL LETTER X [2]
    U+0059  ..-- -.--  LATIN CAPITAL LETTER Y [2]
    U+005A  ..-- --..  LATIN CAPITAL LETTER Z [2]
    U+005B  ..---.     LEFT SQUARE BRACKET [1]
    U+005C  .-....     REVERSE SOLIDUS [1]
    U+005D  ..----     RIGHT SQUARE BRACKET [1]
    U+005E  .-...-     CIRCUMFLEX ACCENT [1]
    U+005F  ------     LOW LINE [1]
    U+0060  ...---     GRAVE ACCENT [1]
    U+0061  .-         LATIN SMALL LETTER A
    U+0062  -...       LATIN SMALL LETTER B
    U+0063  -.-.       LATIN SMALL LETTER C
    U+0064  -..        LATIN SMALL LETTER D
    U+0065  .          LATIN SMALL LETTER E
    U+0066  ..-.       LATIN SMALL LETTER F
    U+0067  --.        LATIN SMALL LETTER G
    U+0068  ....       LATIN SMALL LETTER H
    U+0069  ..         LATIN SMALL LETTER I
    U+006A  .---       LATIN SMALL LETTER J
    U+006B  -.-        LATIN SMALL LETTER K
    U+006C  .-..       LATIN SMALL LETTER L
    U+006D  --         LATIN SMALL LETTER M
    U+006E  -.         LATIN SMALL LETTER N
    U+006F  ---        LATIN SMALL LETTER O
    U+0070  .--.       LATIN SMALL LETTER P
    U+0071  --.-       LATIN SMALL LETTER Q
    U+0072  .-.        LATIN SMALL LETTER R
    U+0073  ...        LATIN SMALL LETTER S
    U+0074  -          LATIN SMALL LETTER T
    U+0075  ..-        LATIN SMALL LETTER U
    U+0076  ...-       LATIN SMALL LETTER V
    U+0077  .--        LATIN SMALL LETTER W
    U+0078  -..-       LATIN SMALL LETTER X
    U+0079  -.--       LATIN SMALL LETTER Y
    U+007A  --..       LATIN SMALL LETTER Z
    U+007B  --.-..     LEFT CURLY BRACKET [1]
    U+007C  --.--.     VERTICAL LINE [1]
    U+007D  --.-.-     RIGHT CURLY BRACKET [1]
    U+007E  --.---     TILDE [1]

    2. All other Unicode characters are encoded with one of seven
    multi-Morse schemes:

    Code range:        Scheme
    -----------------  ------
    U+0000..U+0007     1
    U+0008..U+001F     2
    U+007F..U+01FF     3
    U+0200..U+0FFF     4
    U+1000..U+7FFF     5
    U+8000..U+3FFFF    6
    U+40000..U+10FFFF  7
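
    The range table above amounts to a simple lookup. A minimal Python
    sketch follows; the names SCHEME_RANGES and scheme_for are
    hypothetical and not part of the proposal, and returning None for
    U+0020..U+007E is just one way to signal "use the table in section 1".

```python
# Hypothetical helper, not part of the original post.
# (low, high, scheme) rows copied from the code-range table.
SCHEME_RANGES = [
    (0x0000, 0x0007, 1),
    (0x0008, 0x001F, 2),
    (0x007F, 0x01FF, 3),
    (0x0200, 0x0FFF, 4),
    (0x1000, 0x7FFF, 5),
    (0x8000, 0x3FFFF, 6),
    (0x40000, 0x10FFFF, 7),
]


def scheme_for(code_point):
    """Return the multi-Morse scheme number for a code point.

    Returns None for U+0020..U+007E, which section 1 encodes by table.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    for low, high, scheme in SCHEME_RANGES:
        if low <= code_point <= high:
            return scheme
    return None  # falls in the gap covered by the section 1 table
```

    For example, scheme_for(0x20AC) (EURO SIGN) gives 5, and
    scheme_for(0x41) gives None.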

    Each scheme ends with a Morse sequence of the form ".-.yyy", possibly
    preceded by one or more Morse sequences of the form "-..yyy"; each
    "yyy" carries three bits of the code point:

    Scheme  Bits (x: 0 or 1):      UTF-Morse (y: "." if x is 0, "-" if x is 1):
    ------  ---------------------  ------------------------------------------------
    1       000000000000000000xxx  .-.yyy
    2       000000000000000xxxxxx  -..yyy .-.yyy
    3       000000000000xxxxxxxxx  -..yyy -..yyy .-.yyy
    4       000000000xxxxxxxxxxxx  -..yyy -..yyy -..yyy .-.yyy
    5       000000xxxxxxxxxxxxxxx  -..yyy -..yyy -..yyy -..yyy .-.yyy
    6       000xxxxxxxxxxxxxxxxxx  -..yyy -..yyy -..yyy -..yyy -..yyy .-.yyy
    7       xxxxxxxxxxxxxxxxxxxxx  -..yyy -..yyy -..yyy -..yyy -..yyy -..yyy .-.yyy
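
    The scheme table can be read as: split the code point into 3-bit
    groups, send every group but the last as "-..yyy" and the last one
    as ".-.yyy". A minimal Python sketch under two assumptions the post
    does not spell out: groups are sent most significant first, and
    sequences are separated by a single space. The names UPPER_BOUNDS
    and encode_multi_morse are hypothetical.

```python
# Hypothetical sketch, not part of the original post.
# Upper bound of each scheme's code range, schemes 1..7 in order.
UPPER_BOUNDS = [0x0007, 0x001F, 0x01FF, 0x0FFF, 0x7FFF, 0x3FFFF, 0x10FFFF]


def encode_multi_morse(code_point):
    """Encode a code point outside U+0020..U+007E as multi-Morse."""
    if 0x20 <= code_point <= 0x7E:
        raise ValueError("U+0020..U+007E use the table in section 1")
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")

    # Scheme n is the first one whose range reaches the code point.
    scheme = next(n for n, top in enumerate(UPPER_BOUNDS, start=1)
                  if code_point <= top)

    # Scheme n carries 3*n bits (assumed most significant bit first).
    bits = format(code_point, "0{}b".format(3 * scheme))
    groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]

    sequences = []
    for index, group in enumerate(groups):
        is_last = index == len(groups) - 1
        prefix = ".-." if is_last else "-.."
        # y is "." for a 0 bit and "-" for a 1 bit.
        sequences.append(prefix + group.replace("0", ".").replace("1", "-"))
    return " ".join(sequences)


print(encode_multi_morse(0xE9))  # LATIN SMALL LETTER E WITH ACUTE, scheme 3
```

    Decoding is deliberately left out of this sketch: U+0005, for
    instance, comes out as ".-.-.-", the same sequence the table in
    section 1 assigns to FULL STOP.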

    3. Notes

    [1]: Some sequences are unique to UTF-Morse, and are unknown in
         traditional Morse code.

    [2]: Capital letters use the same code as the corresponding small
         letter, preceded by the sequence "..--" (which is unique to
         UTF-Morse).
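
    Note [2] can be sketched in Python as follows. The small-letter
    codes are copied from the table in section 1; the names LOWERCASE,
    CAPITAL_PREFIX, encode_letter and encode_word are hypothetical, and
    separating sequences with a single space is an assumption.

```python
# Hypothetical sketch, not part of the original post.
# Small-letter codes copied from the table in section 1.
LOWERCASE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".",
    "f": "..-.", "g": "--.", "h": "....", "i": "..", "j": ".---",
    "k": "-.-", "l": ".-..", "m": "--", "n": "-.", "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.", "s": "...", "t": "-",
    "u": "..-", "v": "...-", "w": ".--", "x": "-..-", "y": "-.--",
    "z": "--..",
}

CAPITAL_PREFIX = "..--"  # per note [2]


def encode_letter(ch):
    """Encode one basic Latin letter, a-z or A-Z."""
    if ch in LOWERCASE:
        return LOWERCASE[ch]
    if len(ch) == 1 and "A" <= ch <= "Z":
        # Capital = prefix sequence, then the small letter's code.
        return CAPITAL_PREFIX + " " + LOWERCASE[ch.lower()]
    raise ValueError("only a-z and A-Z are handled in this sketch")


def encode_word(word):
    """Encode a run of letters, one space between Morse sequences."""
    return " ".join(encode_letter(ch) for ch in word)


print(encode_word("Ciao"))
```

    With these definitions, encode_letter("a") gives ".-" and
    encode_letter("A") gives "..-- .-", matching the rows for U+0061
    and U+0041.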

    ----------------------------------------------------------------------


