From: Martin J. Dürst (duerst@it.aoyama.ac.jp)
Date: Mon May 23 2011 - 02:40:06 CDT
On 2011/05/19 19:35, Christoph Päper wrote:
> I believe it would help if input was immediately transformed to NFD and text was saved in NFD, because this would make the need for uniform treatment more obvious.
It might help in theory, but in practice, NFC is much, much closer to
what's out there in the real world (in particular the Web). So please
use NFD for internal processing if you think that helps you, but please
use NFC for all cases where it may be seen by other programs.
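
As a minimal sketch of that split (NFD inside, NFC at the boundary), assuming Python and its standard unicodedata module; the helper names to_internal and to_interchange are hypothetical and only illustrate the idea:

```python
import unicodedata


def to_internal(text: str) -> str:
    """Decompose for internal processing (NFD). Hypothetical helper name."""
    return unicodedata.normalize("NFD", text)


def to_interchange(text: str) -> str:
    """Compose (NFC) before the text may be seen by other programs."""
    return unicodedata.normalize("NFC", text)


name = "P\u00e4per"                  # precomposed a-umlaut (U+00E4)
internal = to_internal(name)         # 'a' followed by U+0308 COMBINING DIAERESIS
external = to_interchange(internal)  # back to the precomposed form

# Code point counts differ between the two forms of the same text.
print(len(name), len(internal), len(external))
```

The two forms are canonically equivalent, so nothing is lost by converting at the boundary; only the code point sequence changes.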
> It would be cool if there was an ASCII-compatible encoding with variable length like UTF-8 that supported only NFD (or NFKD) and was optimized for a small storage footprint, e.g. from U+00C0–017F only a handful would have to be coded separately. Sadly, though, it is unrealistic to have a unique single byte code for each combining diacritic, because there are so many of them: even just ranges U+0300–036F and U+1DC0–1DFF are 176 positions together, although some are still unassigned; that is more than you can encode with 7 bits or less.
We don't need any more character encodings. Unicode is about reducing
them, not about inventing more. The storage savings are way less
important with current hardware than the reduction of confusion with
fewer encodings.
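
For a rough sense of the numbers in the quoted paragraph, here is a small sketch, again assuming Python's standard unicodedata module; the function name utf8_len and the sample string are illustrative assumptions, not measurements of real data:

```python
import unicodedata


def utf8_len(text: str, form: str) -> int:
    """UTF-8 byte length of text after normalizing to the given form."""
    return len(unicodedata.normalize(form, text).encode("utf-8"))


# Positions in the two combining-mark ranges named in the quoted text.
positions = (0x036F - 0x0300 + 1) + (0x1DFF - 0x1DC0 + 1)
fits_in_7_bits = positions <= 2 ** 7

sample = "P\u00e4per"  # illustrative sample only
nfc_bytes = utf8_len(sample, "NFC")
nfd_bytes = utf8_len(sample, "NFD")

print(positions, fits_in_7_bits, nfc_bytes, nfd_bytes)
```

For this one word, NFD in UTF-8 costs a single extra byte over NFC; how that scales depends entirely on how many accented letters the text contains.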
Regards, Martin.