XTF-Morse (was RE: UTF-Morse)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Fri Nov 22 2002 - 07:22:37 EST

    Doug Ewell wrote:
    > Yes, it's true. Marco had sent me his UTF-Morse proposal just
    > yesterday, along with a suggestion that I put together an
    > implementation for April Fool's Day. And darned if I wasn't
    > really going to do it. As a JOKE.
    >
    > But Marco, you need to check your invented sequences again.
    > The leading and trailing Morse code units for the
    > (non-ASCII) multi-Morse characters conflict with some of the
    > single-unit characters. For example, U+002D -....- looks like
    > a leading unit, and U+0023 .-.-.. looks like a trailing unit.

    --- --- --- --- --- ..--.. ...

    Sorry! Not only do I use everybody's bandwidth for April fools in advance,
    I also get all the details wrong!

    I attempted to simplify the wording while translating into English, and I
    messed everything up. So now I have to use more bandwidth to send a
    corrected version.

    > (It's only a JOKE, guys. Take a breath.)

    BTW, I recalled that, some time ago, the aficionados of fictional UTFs on
    this list decided to call their creations "XTFs", in order to minimize the
    possibility of confusion with real UTFs.

    So, everybody reading this message now or in the coming years, please take
    notice that XTF-Morse is *not* a UTF: just an aborted April fool! So please
    don't knock at the Unicode Consortium's door asking for the latest version
    of the specs for sending Unicode in Morse!

    _ Marco

    ======================================================================
    XTF-Morse [*] - "Bringing Unicode into the telegraph age!"

    ----------------------------------------------------------------------
    0. Terminology

    In this document, the following special terms are used:

    - "Morse Dot": a short Morse signal; represented with "." in this
      document.

    - "Morse Dash": a long Morse signal; represented with "-" in this
      document.

    - "Morse Symbol": a sequence of one or more Morse Dots and/or Morse
      Dashes, constituting a Morse character such as a letter or a
      punctuation mark.

    - "Morse Pause": a short pause which separates adjacent
      Morse symbols; represented with " " (a space) in this document.

    - "Morse Space": a long pause which separates words; represented
      with "/" in this document.

    - "Morse Oct": a special Morse Symbol representing three bits of
      a Unicode code point.

    ----------------------------------------------------------------------
    1. Encoding characters in the "ASCII printable" range.

    Each Unicode character in the range U+0020..U+007E is encoded as a
    Morse Space, as a single Morse Symbol, or as a sequence of two Morse
    Symbols, as specified in the following table:

    Code:  XTF-Morse:  Character name:
    ------ ----------- --------------------------
    U+0020 /           SPACE (Morse Space)
    U+0021 -----.      EXCLAMATION MARK [1]
    U+0022 .-..-.      QUOTATION MARK
    U+0023 .-.-..      NUMBER SIGN [1]
    U+0024 ..-...      DOLLAR SIGN [1]
    U+0025 ..-..-      PERCENT SIGN [1]
    U+0026 ..-.-.      AMPERSAND [1]
    U+0027 .----.      APOSTROPHE
    U+0028 -.--.-      LEFT PARENTHESIS
    U+0029 -.---.      RIGHT PARENTHESIS [1]
    U+002A -.----      ASTERISK [1]
    U+002B --....      PLUS SIGN [1]
    U+002C --..--      COMMA
    U+002D -....-      HYPHEN-MINUS
    U+002E .-.-.-      FULL STOP
    U+002F -..-.       SOLIDUS [1]
    U+0030 -----       DIGIT ZERO
    U+0031 .----       DIGIT ONE
    U+0032 ..---       DIGIT TWO
    U+0033 ...--       DIGIT THREE
    U+0034 ....-       DIGIT FOUR
    U+0035 .....       DIGIT FIVE
    U+0036 -....       DIGIT SIX
    U+0037 --...       DIGIT SEVEN
    U+0038 ---..       DIGIT EIGHT
    U+0039 ----.       DIGIT NINE
    U+003A ---...      COLON
    U+003B ---..-      SEMICOLON [1]
    U+003C ---.-.      LESS-THAN SIGN [1]
    U+003D ----..      EQUALS SIGN [1]
    U+003E ---.--      GREATER-THAN SIGN [1]
    U+003F ..--..      QUESTION MARK
    U+0040 -.-.-.      COMMERCIAL AT [1]
    U+0041 ..-- .-     LATIN CAPITAL LETTER A [2]
    U+0042 ..-- -...   LATIN CAPITAL LETTER B [2]
    U+0043 ..-- -.-.   LATIN CAPITAL LETTER C [2]
    U+0044 ..-- -..    LATIN CAPITAL LETTER D [2]
    U+0045 ..-- .      LATIN CAPITAL LETTER E [2]
    U+0046 ..-- ..-.   LATIN CAPITAL LETTER F [2]
    U+0047 ..-- --.    LATIN CAPITAL LETTER G [2]
    U+0048 ..-- ....   LATIN CAPITAL LETTER H [2]
    U+0049 ..-- ..     LATIN CAPITAL LETTER I [2]
    U+004A ..-- .---   LATIN CAPITAL LETTER J [2]
    U+004B ..-- -.-    LATIN CAPITAL LETTER K [2]
    U+004C ..-- .-..   LATIN CAPITAL LETTER L [2]
    U+004D ..-- --     LATIN CAPITAL LETTER M [2]
    U+004E ..-- -.     LATIN CAPITAL LETTER N [2]
    U+004F ..-- ---    LATIN CAPITAL LETTER O [2]
    U+0050 ..-- .--.   LATIN CAPITAL LETTER P [2]
    U+0051 ..-- --.-   LATIN CAPITAL LETTER Q [2]
    U+0052 ..-- .-.    LATIN CAPITAL LETTER R [2]
    U+0053 ..-- ...    LATIN CAPITAL LETTER S [2]
    U+0054 ..-- -      LATIN CAPITAL LETTER T [2]
    U+0055 ..-- ..-    LATIN CAPITAL LETTER U [2]
    U+0056 ..-- ...-   LATIN CAPITAL LETTER V [2]
    U+0057 ..-- .--    LATIN CAPITAL LETTER W [2]
    U+0058 ..-- -..-   LATIN CAPITAL LETTER X [2]
    U+0059 ..-- -.--   LATIN CAPITAL LETTER Y [2]
    U+005A ..-- --..   LATIN CAPITAL LETTER Z [2]
    U+005B ..---.      LEFT SQUARE BRACKET [1]
    U+005C .-....      REVERSE SOLIDUS [1]
    U+005D ..----      RIGHT SQUARE BRACKET [1]
    U+005E .-...-      CIRCUMFLEX ACCENT [1]
    U+005F ------      LOW LINE [1]
    U+0060 ...---      GRAVE ACCENT [1]
    U+0061 .-          LATIN SMALL LETTER A
    U+0062 -...        LATIN SMALL LETTER B
    U+0063 -.-.        LATIN SMALL LETTER C
    U+0064 -..         LATIN SMALL LETTER D
    U+0065 .           LATIN SMALL LETTER E
    U+0066 ..-.        LATIN SMALL LETTER F
    U+0067 --.         LATIN SMALL LETTER G
    U+0068 ....        LATIN SMALL LETTER H
    U+0069 ..          LATIN SMALL LETTER I
    U+006A .---        LATIN SMALL LETTER J
    U+006B -.-         LATIN SMALL LETTER K
    U+006C .-..        LATIN SMALL LETTER L
    U+006D --          LATIN SMALL LETTER M
    U+006E -.          LATIN SMALL LETTER N
    U+006F ---         LATIN SMALL LETTER O
    U+0070 .--.        LATIN SMALL LETTER P
    U+0071 --.-        LATIN SMALL LETTER Q
    U+0072 .-.         LATIN SMALL LETTER R
    U+0073 ...         LATIN SMALL LETTER S
    U+0074 -           LATIN SMALL LETTER T
    U+0075 ..-         LATIN SMALL LETTER U
    U+0076 ...-        LATIN SMALL LETTER V
    U+0077 .--         LATIN SMALL LETTER W
    U+0078 -..-        LATIN SMALL LETTER X
    U+0079 -.--        LATIN SMALL LETTER Y
    U+007A --..        LATIN SMALL LETTER Z
    U+007B --.-..      LEFT CURLY BRACKET [1]
    U+007C --.--.      VERTICAL LINE [1]
    U+007D --.-.-      RIGHT CURLY BRACKET [1]
    U+007E --.---      TILDE [1]
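
    As a partial illustration of the table, here is a minimal Python sketch
    that handles only SPACE and the Latin letters. The names SMALL,
    CAPITAL_PREFIX and encode_letters are hypothetical, and joining every
    item (including "/") with a single Morse Pause is an assumption about
    how to write the output down, not something the table specifies.

```python
# Small-letter Morse Symbols, copied from the U+0061..U+007A rows of the table.
SMALL = {
    "a": ".-",   "b": "-...", "c": "-.-.", "d": "-..",  "e": ".",
    "f": "..-.", "g": "--.",  "h": "....", "i": "..",   "j": ".---",
    "k": "-.-",  "l": ".-..", "m": "--",   "n": "-.",   "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.",  "s": "...",  "t": "-",
    "u": "..-",  "v": "...-", "w": ".--",  "x": "-..-", "y": "-.--",
    "z": "--..",
}

# Morse Symbol that precedes a small letter to make it a capital letter.
CAPITAL_PREFIX = "..--"


def encode_letters(text: str) -> str:
    """Encode a string made only of ASCII letters and spaces (sketch)."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("/")                      # U+0020 is a Morse Space
        elif ch in SMALL:
            out.append(SMALL[ch])                # small letter: one symbol
        elif "A" <= ch <= "Z":
            out.append(CAPITAL_PREFIX)           # capital letter: two symbols
            out.append(SMALL[ch.lower()])
        else:
            raise ValueError(f"not covered by this sketch: {ch!r}")
    # Assumption: a Morse Pause (" ") is written between all items.
    return " ".join(out)


print(encode_letters("Hi there"))
```

    Digits and punctuation would need the remaining rows of the table added
    to the lookup in the same way.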

    ----------------------------------------------------------------------
    2. Encoding other Unicode characters
     
    All other Unicode characters are encoded with sequences of 1 to 7
    Morse Symbols called "Morse Octs".

    Each Morse Oct represents three bits in the Unicode code value; in
    other terms, Morse Octs are Morse-encoded octal digits.

    There are two sets of Morse Octs: Morse Octs T0..T7 represent the last
    octal digit in a sequence, whereas Morse Octs L0..L7 represent the
    other octal digits.

    Octal Digit: Morse Oct:
    ------------ ----------
    L0           .-.--.
    L1           .-.---
    L2           .--...
    L3           .--..-
    L4           .--.-.
    L5           .--.--
    L6           .---..
    L7           .---.-

    Octal Digit: Morse Oct:
    ------------ ----------
    T0           -...-.
    T1           -...--
    T2           -..-..
    T3           -..-.-
    T4           -..--.
    T5           -..---
    T6           -.-...
    T7           -.-..-

    The encoding of a Unicode code point proceeds with these steps:

    - The Unicode code point is converted to an octal number.

    - Leading zeros are stripped, if present.

    - For each resulting octal digit apart from the last one, the
      corresponding Morse Oct L0..L7 is emitted.
       
    - The Morse Oct T0..T7 corresponding to the last octal digit is emitted.
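
    The steps above can be sketched in Python as follows. This is only an
    illustration: the names LEADING, TRAILING and encode_octs are
    hypothetical, and rejecting the U+0020..U+007E range is an assumption
    based on section 1 covering those characters.

```python
# Morse Octs copied from the two tables in section 2, indexed by octal digit.
LEADING = [".-.--.", ".-.---", ".--...", ".--..-",
           ".--.-.", ".--.--", ".---..", ".---.-"]    # L0..L7
TRAILING = ["-...-.", "-...--", "-..-..", "-..-.-",
            "-..--.", "-..---", "-.-...", "-.-..-"]   # T0..T7


def encode_octs(code_point: int) -> str:
    """Encode one code point outside U+0020..U+007E as Morse Octs."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0x20 <= code_point <= 0x7E:
        # Assumption: this range is left to the section 1 table.
        raise ValueError("use the ASCII printable table instead")
    # format(..., "o") yields octal digits without leading zeros ("0" for 0).
    digits = [int(d) for d in format(code_point, "o")]
    symbols = [LEADING[d] for d in digits[:-1]]       # all but the last digit
    symbols.append(TRAILING[digits[-1]])              # the last digit
    return " ".join(symbols)                          # Morse Pause between Octs


# U+00E9 is octal 351, so it becomes L3 L5 T1.
print(encode_octs(0x00E9))
```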

    The following table summarizes the number and kind of Octs generated
    for each Unicode code point:

    Code range:       Octal:  Generated Morse Octs:
    ----------------- ------- ---------------------
    U+0000..U+0007    000000z Tz
    U+0008..U+001F    00000yz Ly Tz
    U+007F..U+01FF    0000xyz Lx Ly Tz
    U+0200..U+0FFF    000wxyz Lw Lx Ly Tz
    U+1000..U+7FFF    00vwxyz Lv Lw Lx Ly Tz
    U+8000..U+3FFFF   0uvwxyz Lu Lv Lw Lx Ly Tz
    U+40000..U+10FFFF tuvwxyz Lt Lu Lv Lw Lx Ly Tz
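
    Since every sequence ends with exactly one Oct from the T0..T7 set, a
    receiver can find the end of a code point without counting Octs. The
    following Python sketch shows one way a decoder might exploit this; the
    names LEAD_VALUE, TRAIL_VALUE and decode_octs are hypothetical, and the
    sketch does not check for leading zeros or values above U+10FFFF.

```python
# Morse Octs copied from the two tables in section 2, indexed by octal digit.
LEADING = [".-.--.", ".-.---", ".--...", ".--..-",
           ".--.-.", ".--.--", ".---..", ".---.-"]    # L0..L7
TRAILING = ["-...-.", "-...--", "-..-..", "-..-.-",
            "-..--.", "-..---", "-.-...", "-.-..-"]   # T0..T7

LEAD_VALUE = {symbol: digit for digit, symbol in enumerate(LEADING)}
TRAIL_VALUE = {symbol: digit for digit, symbol in enumerate(TRAILING)}


def decode_octs(symbols):
    """Turn a list of Morse Octs into a list of code points (sketch)."""
    code_points = []
    value = 0
    pending = False                 # True while inside an unfinished sequence
    for symbol in symbols:
        if symbol in LEAD_VALUE:
            value = value * 8 + LEAD_VALUE[symbol]
            pending = True
        elif symbol in TRAIL_VALUE:
            # A trailing Oct closes the current code point.
            code_points.append(value * 8 + TRAIL_VALUE[symbol])
            value = 0
            pending = False
        else:
            raise ValueError(f"not a Morse Oct: {symbol!r}")
    if pending:
        raise ValueError("sequence ends without a trailing Oct")
    return code_points


# L3 L5 T1 is octal 351, i.e. U+00E9.
print([f"U+{cp:04X}" for cp in decode_octs(".--..- .--.-- -...--".split())])
```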

    ----------------------------------------------------------------------
    3. Notes

    [1]: Some Morse Symbols are unique to XTF-Morse, and are unknown in
         traditional Morse.

    [2]: Capital letters use the same Morse Symbol as the corresponding
         small letter, preceded by the Morse Symbol "..--" (which is
         unique to XTF-Morse).

    [*]: In a previous version of this document, XTF-Morse was called
         "UTF-Morse". The name has been changed in order to emphasize
         that this is not a real UTF (Unicode Transformation Format),
         but just a parody of a UTF. Yes, a parody, a joke! Sorry, did
         you really read all of it seriously up to this point? :-)

    ======================================================================



    This archive was generated by hypermail 2.1.5 : Fri Nov 22 2002 - 08:10:05 EST