On Wed, Jun 23, 1999 at 12:36:48PM -0700, Markus Kuhn wrote:
> Keld Jørn Simonsen wrote on 1999-06-23 17:40 UTC:
> > On Wed, Jun 23, 1999 at 07:37:15AM -0700, Markus Kuhn wrote:
> > > Is there any work going on to review the POSIX.1 and POSIX.2 standards
> > > systematically to add proper UTF-8 support? For instance,
> > > the terminal driver can be set into a "cooked" mode where a
> > > single-line editing mechanism is applied before sending a line to an
> > > application, and the implementation of the erase function there has to
> > > know how many bytes to remove when a character is erased, which makes a
> > > difference between UTF-8 and ISO 8859-1 for instance. There should be a
> > > standard way to tell the terminal that it is in UTF-8 mode and has to
> > > perform character erase actions accordingly.
> > 
> > Hmm, why should UTF-8 support differ here from, say, EUC support?
> > The support should be there already.
> 
> I see neither EUC nor UTF-8 support in any POSIX document for system
> calls such as tcsetattr() that would allow me to tell the terminal in
> c_lflag|ICANON mode how many bytes to remove when it receives an ERASE
> character. I don't care much about EUC support, because this is not an
> ISO standard, but UTF-8 is one and should be fully and consistently
> supported here IMHO.
I am not aware of specific support for this in the POSIX standards.
> Vendors are setting up proprietary and non-portable solutions to work
> around such deficiencies in the POSIX standard regarding UTF-8. For
> example (quoting from an email from Tomas Vanhala
> <vanhala@ling.helsinki.fi>):
> 
>    I am curious about this, because at least on Solaris 7 it is also
>    possible to utilize the UTF-8 locale support built into the OS.
> 
>    If you go to http://docs.sun.com/, choose the "Solaris 7 Software
>    Developer Collection" and then the "Solaris Internationalization Guide
>    For Developers", you will find that the document contains a section
>    titled "Overview of en_US.UTF-8 Locale Support". The paragraph
>    "TTY Environment Setup" of the subsection "System Environment"
>    explains some UTF-8 specific STREAMS modules, e.g.
> 
>    /usr/kernel/strmod/eucu8	UTF-8 STREAMS module for tail side
>    /usr/kernel/strmod/u8euc	UTF-8 STREAMS module for head side
> 
>    Further down on the page, it is stated that:
> 
>    The dtterm(1) and any terminal that supports input and output of the
>    UTF-8 codeset should have the following STREAMS configuration:
> 
>    head <-> ttcompat <-> u8euc <->  ldterm <-> eucu8 <-> pseudo-TTY
> 
>    This can be set up with the strchg(1) user-level program if the
>    appropriate kernel modules have been loaded.
> 
> Is this really specified by POSIX?
Not to my knowledge.
> The Linux version of stty and the tty driver in the kernel is currently
> being extended to accommodate UTF-8. Unfortunately, POSIX.1:1996
> does not give us any guidance on how to do this in a portable way. (See
> <ftp://ftp.ilog.fr/pub/Users/haible/utf8/> for the patches.)
> 
> > We have in WG20 enhanced the locale syntax to be able to cater for
> > ISO 10646 in the forthcoming ISO/IEC 14652 TR.
> 
> Very interesting! URL???
http://www.dkuug.dk/jtc1/sc22/wg20/ and then see under 14652.
> > UTF-8 does not need to be implemented as a charmap, it could be
> > implemented as something special.
> 
> If there is now really a new syntax defined to activate this "something
> special" in the locale definition files, then I am very happy to hear
> that, and I am looking forward to seeing the details.
There is no such new syntax for defining things like UTF-8.
> > > Anyone knowing on the current status of UTF-8 and POSIX?
> > 
> > I wrote a paper on 10646 support for WG15, which is now
> > included in the current draft of TR 14766. Its basic idea was to use
> > UTF-8 as a standard in all POSIX standards.
> 
> I know of
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/iso-tr-14766.txt
> 
> which I had to dig out, with some Emacs artistry, of a proprietary
> word-processing file format found at
> 
>   http://anubis.dkuug.dk/jtc1/sc22/wg15/iso14766/gnp3.wp
That paper is also available in plain-text form from the www.dkuug.dk site, at:
http://anubis.dkuug.dk/jtc1/sc22/wg15/iso14766/15
> Hm, but this contains little that was not already obvious from the old
> USENIX Pike/Thompson Plan9/FSS-UTF paper in
> 
>   ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/UTF-8-Plan9-paper.ps.gz
> 
> Is there an updated version of your paper available that also covers
> newer, less obvious material such as non-charmap processing in locale
> specifications and tcsetattr() configuration of the kernel terminal
> driver for UTF-8?
No, the paper you referred to is the latest issue.
Keld