Re: Getting A Newb Started

From: William J Poser (wjposer@ldc.upenn.edu)
Date: Mon Jul 07 2008 - 16:19:15 CDT

Next message: Ngwe Tun: "Re: wikipedia unicode font."

Previous message: Kenneth Whistler: "Re: Normalisation and directionality (was: how to add all latin (and greek) subscripts)"
In reply to: J: "Re: Getting A Newb Started"
Next in thread: David Starner: "Re: Getting A Newb Started"
Reply: David Starner: "Re: Getting A Newb Started"
Reply: John H. Jenkins: "Re: Getting A Newb Started"

There's no way to avoid using more than one byte per character if
you're using Unicode, since there are far more than 256 characters. If
you use UTF-32, every character is four bytes. If you use UTF-8, a
character takes from one to four bytes depending on the value of its
code point. If you use UTF-16, every character in the BMP (the Basic
Multilingual Plane, U+0000 to U+FFFF) takes two bytes, and any character
outside the BMP takes four bytes.
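
To make those sizes concrete, here is a minimal Python sketch that counts
the bytes one character occupies in each encoding. The helper name
encoded_sizes and the sample characters are my own choices for
illustration. The "-le" codec variants are used so that Python does not
prepend a byte order mark, which would inflate the counts.

```python
def encoded_sizes(ch):
    """Return the byte count of a single character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The explicit little-endian codecs emit no byte order mark.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# Sample characters (illustrative): ASCII, Latin-1 range, rest of BMP, outside BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The last sample, U+1D11E, lies outside the BMP, so it is the only one
that needs four bytes in UTF-16.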

The downside of UTF-16 and UTF-8 is that characters are not all the
same length, which makes processing (indexing, counting, slicing) more
complicated. With UTF-16, however, if you know that there are no
characters outside the BMP, every character is a constant two bytes wide.
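
As a rough sketch of what the BMP-only assumption buys you, the Python
below checks whether a string stays inside the BMP and, if so, fetches a
character from the UTF-16 bytes by plain offset arithmetic. The function
names is_bmp_only and utf16_char_at are hypothetical, and the byte
buffer is assumed to be little-endian UTF-16 without a byte order mark.

```python
def is_bmp_only(text):
    """True if every code point in text is at or below U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_char_at(data, index):
    """Return the index-th character of UTF-16-LE bytes.

    Assumes the data is BMP-only, so character i starts at byte 2 * i.
    If the two bytes turn out to be half of a surrogate pair, decoding
    raises UnicodeDecodeError instead of returning a wrong character.
    """
    start = 2 * index
    return data[start:start + 2].decode("utf-16-le")


text = "na\u00efve \u20ac"           # all characters are in the BMP
data = text.encode("utf-16-le")      # no byte order mark

if is_bmp_only(text):
    print(len(data) == 2 * len(text))    # fixed width: two bytes per character
    print(utf16_char_at(data, 2) == text[2])
```

If the check fails, the offset arithmetic no longer holds and you have
to scan for surrogate pairs instead.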

Bill

This archive was generated by hypermail 2.1.5 : Mon Jul 07 2008 - 16:21:23 CDT