From: Dean Snyder (dean.snyder@jhu.edu)
Date: Wed May 18 2005 - 23:15:33 CDT
Alexander Kh. wrote at 7:24 PM on Wednesday, May 18, 2005:
>That's Microsoft scale gigantism. I can think of many ways to restrict
>use of Unicode to only non-critical cases where the accuracy of data is
>of no importance. For example: by using a modified UTF-8 format where
>an ASCII letter can be used as a switch selector between any local
>encodings - that method would save A LOT of space for commonly
>used characters.
>
>I think that by building extensions to UTF-8, such as a state-machine
>system, and using small but well-thought encoding tables and fonts one
>can totally avoid using Unicode, which is sloppy, inaccurate, incomplete
>and for some strange reason uses character '\0' within a string. This is
>not to mention its endianness problem. ...
Stateful mechanisms for plain text encoding are bad, if for no other
reason than fragment fragility: a fragment cut out of the text may no
longer be interpretable on its own. Unfortunately, Unicode does contain
some state-machine characters, which I think are mistakes, enabling, as
they do, fragment ambiguity or non-interpretability.
Here are some:

Stateful mechanisms that contribute to fragility at the character level -
    Surrogates
    BOM

Stateful mechanisms that contribute to fragility above the character level -
    Bidirectional Ordering Controls
    Annotation characters
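
As a rough illustration of the two character-level cases, here is a
minimal Python sketch. The sample string, the cut offsets, and the helper
name decodes_cleanly are assumptions chosen for the example. It cuts a
UTF-16 byte stream in the middle of a surrogate pair and checks that
neither half decodes on its own, then decodes the same two bytes under
both byte orders to show what a missing BOM leaves ambiguous.

```python
# Illustrative sketch only: the sample text and slice offsets are assumptions.

def decodes_cleanly(fragment: bytes, codec: str = "utf-16-be") -> bool:
    """Return True if the byte fragment decodes without error under codec."""
    try:
        fragment.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so UTF-16
# represents it as a surrogate pair: D834 DD1E.
text = "a\U0001D11Eb"
utf16 = text.encode("utf-16-be")   # bytes: 00 61 D8 34 DD 1E 00 62

head = utf16[:4]                   # ends on the high surrogate D834
tail = utf16[4:]                   # starts on the low surrogate DD1E

whole_ok = decodes_cleanly(utf16)  # the complete stream is fine
head_ok = decodes_cleanly(head)    # lone high surrogate: not decodable
tail_ok = decodes_cleanly(tail)    # lone low surrogate: not decodable

# Without a BOM, the same two bytes are valid under either byte order
# but name different characters.
pair = b"\x00\x61"
as_big_endian = pair.decode("utf-16-be")     # U+0061
as_little_endian = pair.decode("utf-16-le")  # U+6100

print(whole_ok, head_ok, tail_ok)
print(hex(ord(as_big_endian)), hex(ord(as_little_endian)))
```

Note that a broken surrogate pair is at least detectable at the cut,
whereas the byte-order case decodes silently to the wrong text.
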
Are there other stateful mechanisms in Unicode?
Respectfully,
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi/
http://users.adelphia.net/~deansnyder/