Re: unicode on Linux

From: Stephane Bortzmeyer ([email protected])
Date: Tue Oct 21 2003 - 06:43:43 CST

Next message: Jonathan Coxhead: "Re: Backslash n [OT] was Line Separator and Paragraph Separator"
Previous message: Jill Ramonsky: "RE: Backslash n [OT] was Line Separator and Paragraph Separator"
In reply to: Stefan Persson: "Re: unicode on Linux"
Next in thread: Edward H. Trager: "Re: unicode on Linux"
Reply: Edward H. Trager: "Re: unicode on Linux"
Reply: Peter Kirk: "Re: unicode on Linux"

On Mon, Oct 20, 2003 at 10:14:22PM +0200,
Stefan Persson <[email protected]> wrote
a message of 23 lines which said:

> >Just wondering if anybody knows how unicode is on Linux?
> >
> Very good support.

Very optimistic.

Kernel
******

1) File names in Unicode: no (well, the Linux kernel is 8-bit clean
so you can always encode in UTF-8, but the kernel does not do any
normalization and the applications do not expect UTF-8; for instance,
ls sorts alphabetically but does not know Unicode sorting).

2) User names: worse, since the utilities that create an account refuse
UTF-8.
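
To make point 1 concrete, here is a minimal Python sketch. It is only an illustration, not how the kernel or ls is implemented: the helper name utf8_name and the sample names are made up for the example, and the byte-wise sort merely stands in for what a strcmp-style comparison would do.

```python
import unicodedata

def utf8_name(name: str, form: str) -> bytes:
    """Return the UTF-8 bytes of `name` after applying a normalization form."""
    return unicodedata.normalize(form, name).encode("utf-8")

# The same visible name in composed (NFC) and decomposed (NFD) form.
nfc = utf8_name("caf\u00e9", "NFC")
nfd = utf8_name("caf\u00e9", "NFD")

# A kernel that does no normalization compares names byte by byte,
# so these two spellings would be two different files.
print(nfc == nfd)          # False
print(len(nfc), len(nfd))  # 5 6

# A byte-wise sort (hypothetical stand-in for a strcmp-style comparison)
# places every non-ASCII letter after "z".
names = ["zebra", "\u00e9glise", "eagle"]
by_bytes = sorted(names, key=lambda s: s.encode("utf-8"))
print(by_bytes)
```

A reader of French would expect the name starting with "é" to sort next to "eagle", which requires a collation algorithm rather than a byte comparison.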

Applications
************

3) grep: no Unicode regexp

4) xterm (or similar virtual terminals): No BiDi support at all

5) shells: I'm not aware of any line-editing shell (zsh, tcsh)
that has Unicode character semantics (back-character should move one
character, not one byte)

6) databases: I'm not aware of a free DBMS which has support for
Unicode sorting (SQL's ORDER BY) or regexps (SQL's LIKE).

7) Serious word processing: LaTeX has only very minimal Unicode support
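
The back-character behaviour asked for in point 5 can be sketched in a few lines of Python. This is an assumption-laden illustration, not code from any shell: prev_char_start is a hypothetical helper, the buffer is assumed to hold valid UTF-8, and it steps over one code point only (combining sequences would need more work).

```python
def prev_char_start(buf: bytes, pos: int) -> int:
    """Return the byte offset where the character ending just before `pos` starts."""
    if pos <= 0:
        return 0
    pos -= 1
    # UTF-8 continuation bytes look like 10xxxxxx; skip back over them
    # until we reach the lead byte of the character.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos

line = "na\u00efve".encode("utf-8")  # 6 bytes for 5 characters
cursor = 4                           # byte offset just after the "ï"

print(prev_char_start(line, cursor))  # 2: start of the two-byte "ï"
print(cursor - 1)                     # 3: byte semantics land inside "ï"
```

A line editor with byte semantics would use the second value and leave the cursor in the middle of a multi-byte sequence.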

Also, many applications (exmh, emacs) are ten times slower when
running in UTF-8 mode.

At the present time, using Unicode on Unix is an act of faith.

> Default charset for recent versions of some popular distributions.

Yes, Red Hat changed the default charset to Unicode without considering
that existing text files would no longer be readable.

See:

http://www.cl.cam.ac.uk/~mgk25/unicode.html
ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html
http://melkor.dnp.fmph.uniba.sk/~garabik/debian-utf8/howto.html


This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST