RE: Question: the german umlaut

From: Dominikus Scherkl (Dominikus.Scherkl@glueckkanja.com)
Date: Mon Nov 11 2002 - 10:40:16 EST

Next message: David J. Perry: "Entering Plane 1 characters in XP"

Previous message: Michael Everson: "Re: Plane 1 maths fraktur in textual apparatus?"
Maybe in reply to: Magda Danish (Unicode): "Question: the german umlaut"

> I just wanted to know how much space in bytes the Latin-1
> characters such as the german umlaut characters take up in
> UTF-8 encoding. Is it still just one byte or does it now
> require 2 bytes?
U+0000 up to U+007F take 1 byte (ASCII).
U+0080 up to U+07FF take 2 bytes (Latin-1, Latin Extended,
combining diacritics, phonetics, Greek, Cyrillic, Hebrew,
Arabic, Syriac, and some more scripts). So the German umlauts
need 2 bytes each. This is very little expansion, especially
for languages that use only a few non-ASCII characters, like
Swedish or German, but it is expensive for Greek, Arabic and
the like, where nearly every letter grows from 1 byte in a
legacy 8-bit encoding to 2 bytes.
U+0800 up to U+FFFF take 3 bytes (Hangul, CJK, ... - not too
expensive, but significant).
U+10000 up to U+10FFFF take 4 bytes (this is all the rest -
these take 4 bytes in almost every encoding, so this is no
significant expansion).
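
The ranges above can be checked with a few lines of Python. This
is only a sketch: utf8_length is a hypothetical helper written
for this illustration (not a library function), and the sample
characters and the test string are my own choices.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one code point.

    Hypothetical helper for illustration; it mirrors the ranges
    listed in the message.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        return 1  # ASCII
    if code_point <= 0x7FF:
        return 2  # Latin-1, Greek, Cyrillic, Hebrew, Arabic, ...
    if code_point <= 0xFFFF:
        return 3  # rest of the BMP: Hangul, CJK, ...
    return 4      # supplementary planes


# One sample per range: a, a-umlaut, sharp s, Greek alpha,
# a Hangul syllable, and a Plane 1 Fraktur letter.
samples = ["a", "\u00e4", "\u00df", "\u03b1", "\ud55c", "\U0001d504"]
for ch in samples:
    encoded = ch.encode("utf-8")
    # The built-in codec should agree with the range table.
    assert len(encoded) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s)")

# A short German phrase: only the three non-ASCII letters grow.
text = "Gr\u00fc\u00dfe aus K\u00f6ln"
print(len(text), len(text.encode("latin-1")), len(text.encode("utf-8")))
```

For the test phrase this reports 14 characters, 14 bytes in
Latin-1 and 17 bytes in UTF-8: each of the three non-ASCII
letters costs one extra byte.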

If space is a concern, use SCSU - it is shorter and has the
additional advantage of compressing much better with zip or
comparable algorithms.

-- 
Dominikus Scherkl
dominikus.scherkl@glueckkanja.com

