From: Doug Ewell (dewell@adelphia.net)
Date: Wed Apr 21 2004 - 11:24:22 EDT
Raymond Mercier <RaymondM at compuserve dot com> wrote:
> The problem of the size of Unihan has nothing at all to do with the
> cost of storage, and everything to do with the functioning of programs
> that might open and read it.
> Since the lines in Unihan are separated by 0x0A alone, not 0x0D 0x0A,
> this means that when opened in notepad the lines are not separated...
I have to agree that an ordinary plain-text editor is probably not the
right tool for browsing a 25-megabyte data file, even though I've been
known to do the same with UnicodeData.txt (which is admittedly an order
of magnitude smaller).
Even though Unihan is packaged as plain text, one record per
LF-terminated line (well, sort of), it's really better to think of it
as a data file, intended to be read by software. Something like a batch
file that calls grep (or another plain-text search tool) would be a
better way to browse it than opening the whole file in an editor.
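As a rough illustration of the "read it with software" approach, here is a minimal Python sketch that streams the file one line at a time instead of loading it into an editor. It assumes each data record is three tab-separated fields (code point, field tag, value) and that comment lines start with "#". The function name search_unihan and its parameters are hypothetical, made up for this example.

```python
from typing import Iterable, Iterator, Optional, Tuple


def search_unihan(
    lines: Iterable[str],
    code_point: Optional[str] = None,
    field: Optional[str] = None,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (code_point, field, value) records that match the filters.

    Assumes tab-separated records and '#' comment lines (an assumption
    about the file layout, not something stated in this thread).
    """
    for line in lines:
        # Accept either LF or CRLF line endings.
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip anything that is not a three-field record
        cp, tag, value = parts
        if code_point is not None and cp != code_point:
            continue
        if field is not None and tag != field:
            continue
        yield cp, tag, value
```

Because a file object opened with open("Unihan.txt", encoding="utf-8") is itself an iterable of lines, it can be passed straight in as the lines argument, so only one line is held in memory at a time regardless of the file's size.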
And as John said, converting LF to CRLF is quite a simple task -- it can
even be done by your FTP client, while downloading the file -- and
should not be thought of as a deficiency in the current plain-text
format.
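For anyone who would rather convert the file after downloading it, the following is a minimal Python sketch of an LF-to-CRLF conversion, where CRLF is the byte pair 0x0D 0x0A. It works on binary streams, processes one line at a time, and leaves lines that already end in CRLF untouched. The function name lf_to_crlf and the file names in the note below are hypothetical.

```python
from typing import BinaryIO


def lf_to_crlf(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy src to dst, turning each bare LF line ending into CRLF."""
    # Iterating a binary stream yields chunks that end at each b"\n".
    for line in src:
        if line.endswith(b"\n") and not line.endswith(b"\r\n"):
            line = line[:-1] + b"\r\n"
        dst.write(line)
```

A typical call would open the downloaded file with open("Unihan.txt", "rb") and a second file, say "Unihan-crlf.txt", with mode "wb", then pass both to lf_to_crlf. Working in binary mode keeps Python from applying its own newline translation on top of the conversion.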
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/