From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Wed Dec 08 2004 - 07:33:30 CST
"D. Starner" <shalesller@writeme.com> writes:
> You could hide combining characters, which would be extremely useful if
> we were just using Latin and Cyrillic scripts.
It would need a separate API for examining the contents of a combined
character. You can't avoid the sequence of code points completely.
It would lead to surprising semantics: for example, if you concatenate
a string with N+1 possible positions of an iterator with a string with
M+1 positions, you don't necessarily get a string with N+M+1 positions
because there can be combining characters at the border.
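A minimal Python sketch of that border effect, under a deliberately simplified grouping rule: a code point with a nonzero canonical combining class attaches to the preceding unit. The helper name clusters is hypothetical, and real grapheme segmentation (UAX #29) has more rules than this.

```python
import unicodedata

def clusters(s):
    """Group code points so each combining mark joins the preceding unit.
    Simplified, hypothetical rule; not full UAX #29 segmentation."""
    groups = []
    for ch in s:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch
        else:
            groups.append(ch)  # a leading mark has nothing to attach to
    return groups

a = "e"          # 1 unit, so N+1 = 2 iterator positions
b = "\u0301x"    # COMBINING ACUTE ACCENT, then "x": 2 units
joined = a + b   # the accent now attaches to "e"

# Counted in code points, lengths simply add up.
assert len(joined) == len(a) + len(b)

# Counted in combined units, one unit disappears at the border.
assert len(clusters(a)) + len(clusters(b)) == 3
assert len(clusters(joined)) == 2
```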
It's simpler to overlay various grouping styles on top of a sequence
of code points than to start with automatically combined combining
characters and process inwards and outwards from there (sometimes
looking inside characters, sometimes grouping them even more).
It would impose complexity in cases where it's not needed. Most of the
time you don't care which code points are combining and which are not,
for example when you compose a text file from many pieces (constants
and parts filled by users) or when parsing (if a string is specified
as ending with a double quote, then programs will in general treat a
double quote followed by a combining character as an end marker).
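The parsing case can be sketched in Python as follows, assuming a hypothetical helper read_quoted that looks for the closing quote by scanning code points:

```python
def read_quoted(text):
    """Return (body, rest) for text that starts with a double quote.
    Hypothetical helper: the closing quote is found by code point alone."""
    if not text.startswith('"'):
        raise ValueError("expected an opening double quote")
    end = text.index('"', 1)  # raises ValueError if the string is unterminated
    return text[1:end], text[end + 1:]

# The closing quote is followed by U+0301 COMBINING ACUTE ACCENT.
body, rest = read_quoted('"abc"\u0301 tail')
assert body == "abc"
assert rest == "\u0301 tail"
```

The quote still ends the string here, and the combining mark is simply left in the remaining input. A parser working on pre-combined units would presumably see the quote plus accent as one unit and would need extra logic to recognize the end marker.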
I believe code points are the appropriate general-purpose unit of
string processing.
-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/