Re: ASCII control codes in sequences of multibyte character sets

From: Steffen <sdaoden_at_gmail.com>
Date: Sat, 31 Aug 2013 16:36:35 +0200

Thank you all very much for your kind answers!
My goodness, i should have referenced the thread on the POSIX
mailing list myself, yet i guess it marks the expert that he
knows about evil character sets without such hints…

Reading your messages it seems safe to request a clarification of
a POSIX wording (Base Definitions, 6.2 Character Encoding; [1]),
from

  Likewise, the byte values used to encode <period> and <slash>
  shall not occur as part of any other character in any locale.

to

  Likewise, the byte values used to encode <period>, <slash>,
  <newline> and <carriage-return> shall not occur as part of any
  other character in any locale.

  [1] <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06>

Of course the ISO C and POSIX facilities are insufficient to deal
with text portably. (But this theoretical change would turn many
decade-old POSIX programs which test characters against '\n' and
'\r' into functioning software again. By definition, that is.)
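
To illustrate why such a guarantee matters, here is a minimal sketch that is not taken from the thread. It uses UTF-16BE purely as a stand-in for an encoding in which the byte 0x0A occurs inside another character; UTF-16 is not usable as a POSIX locale codeset, so that choice is an assumption made for demonstration only, and the helper name naive_split_lines is hypothetical.

```python
def naive_split_lines(data):
    # Split on the byte 0x0A, the way a program that tests each
    # byte against '\n' would find line boundaries.
    return data.split(b"\n")


# U+010A is LATIN CAPITAL LETTER C WITH DOT ABOVE; the text holds no newline.
text = "a\u010Ab"

utf8 = text.encode("utf-8")        # 61 C4 8A 62
utf16 = text.encode("utf-16-be")   # 00 61 01 0A 00 62

utf8_parts = naive_split_lines(utf8)
utf16_parts = naive_split_lines(utf16)

print(len(utf8_parts))    # one piece: no byte 0x0A in the UTF-8 form
print(len(utf16_parts))   # two pieces: 0x0A is the low byte of U+010A
```

With the proposed wording in place, a conforming locale could not use an encoding that behaves like the second case, so the byte-wise test would be correct by definition.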

P.S.: Wow! I now have an email account near the wild Rocky
Mountains! I reckon that's a good place to live. Yay!

--steffen

attached mail follows:


Hello character plus experts,
i'm wondering whether there are any multibyte character sets known
which use the numerical values of ASCII control characters that
are vital to Unix/POSIX (plus) as part of multibyte sequences?
In particular U+000A and U+000D?
Thank you very much in advance (and don't forget to have a nice
weekend, will ya?)
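
One rough way to probe this question empirically is to encode every code point of the Basic Multilingual Plane with a given codec and look for the bytes 0x0A and 0x0D inside multibyte sequences. The sketch below is an illustration only: the function name chars_containing and its parameters are hypothetical, it covers only the codecs that Python ships, and encoding one character at a time may not reflect how stateful encodings behave in running text.

```python
def chars_containing(encoding, targets=(0x0A, 0x0D), limit=0x10000):
    """Return code points, other than the targets themselves, whose
    multibyte encoding contains one of the target byte values."""
    hits = []
    for cp in range(limit):
        if cp in targets:
            continue  # the question is about *other* characters
        try:
            encoded = chr(cp).encode(encoding)
        except UnicodeError:
            continue  # not representable in this encoding (or a surrogate)
        if len(encoded) > 1 and any(b in targets for b in encoded):
            hits.append(cp)
    return hits


utf8_hits = chars_containing("utf-8")
utf16_hits = chars_containing("utf-16-be")

print(len(utf8_hits))
print([hex(cp) for cp in utf16_hits[:5]])
```

The same function can be called with other Python codec names, for example "shift_jis" or "euc_jp", to see whether any of their multibyte sequences reuse these two byte values.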

--steffen
Received on Sat Aug 31 2013 - 09:39:56 CDT
