Re: A basic question on encoding Latin characters

From: Paul Keinanen (keinanen@sci.fi)
Date: Wed Sep 29 1999 - 15:27:11 EDT


On Wed, 29 Sep 1999 07:31:40 -0700 (PDT), "Mark E. Davis"
<markdavis@ispchannel.com> wrote:

>Remember, with dead keys the <combining-ring-above> is sent from the terminal *before*
>the <a>; this is different than the order that the characters are stored in memory.
>
>Mark

I have never seen a real terminal behave in that way (in fact I have
never seen a real Unicode terminal).

Looking at 7/8-bit character terminals, the VT-220, for instance, has
two ways of generating characters not available on a particular
national keyboard version. One is the compose character key,
e.g. <compose character> <a> <degree sign>. The other is a two-key
sequence, e.g. <degree sign><a>, in which the <degree sign> is a
dead key. The terminal definitely does _not_ send 0xB0 (degree sign)
and 0x61 (a). Instead, the dead key is stored internally until the
base character is typed, and then a single code is sent: 0xE5 <a-ring>
in multinational mode (DEC-MCS, which is very close to 8859-1), or
0x7D <a-ring> if the Finnish or Swedish 7-bit character set was
selected. So the terminal does compose the character internally from
the keys pressed, and only then selects the code to transmit from the
current code page.
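A minimal Python sketch of the dead-key behaviour described above, assuming a hypothetical DeadKeyTerminal class: the dead key is held inside the terminal, combined with the base character, and only the result is mapped to the selected character set. Only the degree-sign dead key and the a-ring positions of the 7-bit Finnish/Swedish set are modelled; this illustrates the idea and is not DEC's firmware logic.

```python
import unicodedata

# Hypothetical dead-key table: dead key -> the combining mark it stands for.
DEAD_KEYS = {"\u00b0": "\u030a"}  # DEGREE SIGN -> COMBINING RING ABOVE

# 7-bit Finnish/Swedish national set: only the a-ring positions are modelled.
SWEDISH_7BIT = {"\u00c5": 0x5D, "\u00e5": 0x7D}


class DeadKeyTerminal:
    """Composes dead-key sequences inside the terminal before sending."""

    def __init__(self, charset: str) -> None:
        self.charset = charset   # "latin-1" (close to DEC-MCS) or "swedish-7bit"
        self.pending = None      # combining mark waiting for a base character

    def key(self, ch: str) -> bytes:
        """Handle one keystroke and return the bytes sent to the host."""
        if self.pending is None and ch in DEAD_KEYS:
            self.pending = DEAD_KEYS[ch]  # store the dead key, send nothing
            return b""
        if self.pending is not None:
            composed = unicodedata.normalize("NFC", ch + self.pending)
            self.pending = None
            if len(composed) != 1:
                return b""  # no precomposed character: this sketch sends nothing
            ch = composed
        return self.encode(ch)

    def encode(self, ch: str) -> bytes:
        """Map one character to the currently selected character set."""
        if self.charset == "swedish-7bit":
            if ch in SWEDISH_7BIT:
                return bytes([SWEDISH_7BIT[ch]])
            return ch.encode("ascii")
        return ch.encode("latin-1")
```

In this sketch <degree sign><a> with the Latin-1 set produces the single byte 0xE5 (lowercase a-ring; 0xC5 is the uppercase form), and with the 7-bit Swedish set it produces 0x7D (0x5D for uppercase). The dead key itself never reaches the host.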

I would expect a Unicode terminal to behave the same way, composing
the character before transmitting anything.
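One reason composing in the terminal helps can be sketched in Python, assuming the host decodes its input with the standard library's incremental UTF-8 decoder (the helper split_delivery is hypothetical). If a packet boundary falls inside the two-byte UTF-8 form of U+00E5, the decoder knows from the lead byte that more bytes must follow and simply waits. If the decomposed sequence is split after the <a>, the host already holds a complete character, and nothing tells it that a combining mark may still arrive.

```python
import codecs
import unicodedata
from typing import Tuple


def split_delivery(text: str, cut: int) -> Tuple[str, str]:
    """Encode text as UTF-8, split the bytes at `cut` (a packet boundary),
    and return what the host can decode from each packet."""
    data = text.encode("utf-8")
    decoder = codecs.getincrementaldecoder("utf-8")()
    first = decoder.decode(data[:cut])               # incomplete bytes are held back
    second = decoder.decode(data[cut:], final=True)
    return first, second


composed = unicodedata.normalize("NFC", "a\u030a")   # U+00E5, sent by a composing terminal
decomposed = unicodedata.normalize("NFD", "\u00e5")  # <a><combining-ring-above>

print(split_delivery(composed, 1))    # first packet decodes to nothing yet
print(split_delivery(decomposed, 1))  # first packet already yields a bare "a"
```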
 
Of course, if a fifth normalization form were created, with the
combining mark(s) placed before the base character, it would solve
many terminal-related problems.
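To illustrate why marks-first ordering would help, here is a Python sketch of a receiver for such a hypothetical form (the function prefix_clusters is illustrative; Unicode defines no such normalization form). Because the base character arrives last, each character can be completed and converted to NFC the moment its base arrives, with no timeout. The sketch assumes a character is a combining mark exactly when its canonical combining class is non-zero, which holds for U+030A but not for every mark in Unicode.

```python
import unicodedata
from typing import Iterator


def prefix_clusters(stream: str) -> Iterator[str]:
    """Yield NFC characters from a stream where marks precede their base."""
    pending = []  # marks received so far, still waiting for their base
    for ch in stream:
        if unicodedata.combining(ch):
            pending.append(ch)  # a mark alone is never complete
        else:
            # The base character closes the cluster, so emit it immediately,
            # reordered to Unicode's base-first order and composed.
            yield unicodedata.normalize("NFC", ch + "".join(pending))
            pending.clear()
    if pending:
        yield "".join(pending)  # marks left at end of input, with no base


print(list(prefix_clusters("\u030aab")))  # ['å', 'b']
```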

Paul
 
>
>Paul Keinanen wrote:
>
>> On Tue, 28 Sep 1999 21:06:39 -0700 (PDT), "Mark E. Davis"
>> <markdavis@ispchannel.com> wrote:
>>
>> >You misunderstand deadkeys: they reverse the order of typed combining marks. Let
>> >me spell it out.
>> >
>> >User types <combining-ring-above>. Host echoes nothing
>> >User types <a>.
>>
>> The terminal sends <a> and <combining-ring-above>, which the host
>> might receive in one packet or with a long delay between them.
>>
>> >Host stores <a-ring>
>>
>> The host can only store this after a substantial delay, or once the
>> next base character has been received.
>>
>> >and echoes the appearance of <a-ring>.
>>
>> This can happen only after a long delay or after something else is
>> typed.
>>
>> >The host could also use form D, and store the sequence <a><combining-ring-above>,
>> >and also echo something with the right appearance (utilizing overlays).
>>
>> This would in fact be better, since the host does not need to wait.
>> The host would echo <a> and, if <combining-ring-above> is then
>> received, erase the a and replace it with <a-ring>.
>> Unfortunately, this does not work for hardcopy terminals.
>>
>> Paul
>>
>> >This is
>> >what would have to be done for complex scripts, or things like
>> ><g><combining-ring-above>, if they are supported by the host.
>> >
>> >Mark
>> >
>> >Robert Brady wrote:
>> >
>> >> On Tue, 28 Sep 1999, Murray Sargent wrote:
>> >>
>> >> > (The following may well have been mentioned earlier; I haven't followed the
>> >> > whole thread). If you enter combining mark sequences using deadkeys, there
>> >> > shouldn't be a problem. With deadkeys, nothing is displayed on the terminal
>> >> > until the base character is typed and nothing is sent to the pattern-match
>> >> > code. When the base character is typed, the corresponding fully composed or
>> >> > partially composed character sequence is sent to the terminal and to the
>> >> > pattern-match code. Deadkey input methods are usually part of the
>> >> > underlying OS, but apps can also implement them fairly easily.
>> >>
>> >> That doesn't work. Consider a telnet connection. At the end of one TCP
>> >> packet, the <a> is placed, but the <combining-ring-above> will not fit, so
>> >> it has to go in the next packet.
>> >>
>> >> Maybe the second packet is delayed for a few seconds, due to network
>> >> problems (why is not relevant).
>> >>
>> >> The app gets the <a> and then a few seconds later gets the
>> >> <combining-ring-above>.
>> >>
>> >> If you can see a way round this (other than abandoning the terminal
>> >> metaphor), the linux/utf-8 project would no doubt be happy to hear it. :)
>> >>
>> >> --
>> >> Robert
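Mark's form D suggestion and the erase-and-replace echo discussed above can be sketched together in Python; the class FormDEcho is hypothetical and is not code from the linux/utf-8 project. Input is processed per packet, so an <a> at the end of one TCP packet is echoed at once, and a <combining-ring-above> arriving seconds later triggers a backspace and a redraw of the composed glyph. The sketch assumes the display treats backspace (0x08) as "move left and let the next character replace the old one"; on paper the old glyph stays visible, which is the hardcopy limitation mentioned in the thread.

```python
import unicodedata


class FormDEcho:
    """Host-side sketch: store input in form D, echo without waiting."""

    def __init__(self) -> None:
        self.stored = ""   # everything received so far, kept in NFD
        self.cluster = ""  # last base character plus any marks attached to it

    def feed(self, packet: str) -> str:
        """Accept the text of one packet; return what to echo immediately."""
        echo = []
        for ch in packet:
            if unicodedata.combining(ch) and self.cluster:
                self.cluster += ch
                # Assumption: backspace lets the redrawn glyph replace the old one.
                echo.append("\b" + unicodedata.normalize("NFC", self.cluster))
            else:
                self.cluster = ch
                echo.append(ch)  # a base character is echoed at once
        self.stored = unicodedata.normalize("NFD", self.stored + packet)
        return "".join(echo)


host = FormDEcho()
print(repr(host.feed("xa")))      # end of the first TCP packet
print(repr(host.feed("\u030a")))  # the delayed second packet
print(repr(host.stored))
```

Because the stored text stays in form D, a pattern matcher on the host sees <a><combining-ring-above> however the packets were split.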



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT