From: Eric Muller (emuller@adobe.com)
Date: Thu Sep 06 2007 - 12:39:04 CDT
Stephane Bortzmeyer wrote:
> This leads to another issue in the database format, which I prefer to
> discuss here first: why are there ranges in UnicodeData.txt rather than
> explicit records for every character? Being explicit would avoid
> generating names for the implicit records (something which is not
> obvious and not well documented, IMHO).
>
> Or, a variant, why not a DerivedUnicodeData.txt file with
> all the characters?
>
Just in case, we currently have a Public Review Issue for an alternate
representation of the UCD (PRI 109,
<http://www.unicode.org/review/pr-109.html>). The main goal of that
representation is to take advantage of the current hardware to require
as little intelligence as possible from the user of the representation
(remember that the typical machine at the time the UCD was designed was
characterized by MHz, kBytes of main memory and MBytes of disk). In
particular, a full representation of the UCD would/could have all the
character names explicitly listed, among many other things.
Personally, I think we should invest in that representation rather than
keep tweaking the current one.
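
To make the range question concrete, here is a minimal sketch of what a consumer of UnicodeData.txt has to do today to get one explicit record per character. It is only an illustration under stated assumptions: the sample data is a shortened excerpt in the file's format (the real CJK range is far larger), the helper names (expand, NAME_PREFIXES) are hypothetical, and only the CJK ideograph naming rule is covered. Other ranges, such as Hangul syllables, need their own name derivation and are left unnamed here.

```python
import re

# Shortened, illustrative excerpt in UnicodeData.txt format.
# Assumption: the real <CJK Ideograph> range spans far more code points.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
4E02;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""

# Matches the "<Label, First>" / "<Label, Last>" markers that delimit a range.
RANGE_RE = re.compile(r"^<(.+), (First|Last)>$")

# Assumption: only this one range label is mapped to a name rule in this sketch.
NAME_PREFIXES = {"CJK Ideograph": "CJK UNIFIED IDEOGRAPH-"}


def expand(text):
    """Yield (code_point, name, general_category) for every character,
    expanding each First/Last pair into explicit records."""
    pending = None  # (first_code_point, label) of a range that is still open
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        match = RANGE_RE.match(fields[1])
        if match is None:
            # Ordinary explicit record: pass it through unchanged.
            yield cp, fields[1], fields[2]
        elif match.group(2) == "First":
            pending = (cp, match.group(1))
        else:
            if pending is None or pending[1] != match.group(1):
                raise ValueError(f"unmatched range end at {fields[0]}")
            first, label = pending
            prefix = NAME_PREFIXES.get(label)
            for c in range(first, cp + 1):
                # Derived name: prefix plus the code point in uppercase hex.
                name = f"{prefix}{c:04X}" if prefix else None
                yield c, name, fields[2]
            pending = None


records = list(expand(SAMPLE))
for cp, name, category in records:
    print(f"{cp:04X};{name};{category}")
```

A fully explicit representation would let the consumer skip everything in expand() except the plain pass-through branch.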
Eric.