Fwd: Unihan SQL access

From: Uriah Eisenstein ([email protected])
Date: Sat Oct 16 2010 - 09:07:42 CDT

Next message: Stephane Bortzmeyer: "Re: Derived age regexp"

Previous message: William_J_G Overington: "Re: OpenType update for Unicode 5.2/6.0?"

Well, I've added support for the remaining few fields and, while at it,
upgraded to Unihan 6.0.0, which is just out, and made quite a few other
improvements.
The only data still not handled is a single line in each of kFenn and
kHKGlyph that contains two entries instead of one; I wasn't sure whether
that is intentional. Nice to see that some of the questionable entries
(1- or 2-character kDefinition values, so far) have already been fixed in
the new Unihan version :)
Regards,
Uriah

---------- Forwarded message ----------
From: Uriah Eisenstein <[email protected]>
Date: Thu, Sep 30, 2010 at 8:48 PM
Subject: Fwd: Unihan SQL access
To: unicode List <[email protected]>

As usual, this took longer than I thought... But an initial version is
finally ready, and can be found at
http://babelfish.50webs.com/unihan-sql-browser/Unihan%20SQL%20Browser.html.
It requires access to the Unihan.zip file and a JDBC driver; there are
explanations on the web page which I hope will be enough. Quite a few
improvements are already planned... I'd be glad to hear if anyone finds it
useful.

While at it, I found a couple of apparent typos in the source indications
of variants (found using SELECT DISTINCT SOURCE FROM VARIANT_SOURCE). They
all come from the kSemanticVariant field:

SELECT * FROM kSemanticVariant_source
WHERE kSemanticVariant_source IN ('kMathews', 'kMeterWempe')

[U+3C92] 勽 [U+52FD] kMathews
勽 [U+52FD] [U+3C92] kMathews
[U+25500] 渹 [U+6E39] kMeterWempe
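
A rough sketch of the same check done directly on the source lines, without a
database: it splits each kSemanticVariant value into (target, source) pairs and
reports any source tag that is not in a list of known names. The sample lines
are reconstructed from the result rows above and are illustrative, not copied
from Unihan (the last one is made up); KNOWN_SOURCES is a deliberately
incomplete assumption holding only the two names the misspelled tags presumably
stand for (kMatthews and kMeyerWempe), and the helper names are hypothetical.

```python
# Source tags treated as valid in this sketch. This is an assumption and is
# far from the full list of sources that appear in kSemanticVariant.
KNOWN_SOURCES = {"kMatthews", "kMeyerWempe"}

# Illustrative lines in the tab-separated Unihan layout:
# code point, field name, value. Reconstructed for the example, not real data.
SAMPLE_LINES = [
    "U+3C92\tkSemanticVariant\tU+52FD<kMathews",
    "U+52FD\tkSemanticVariant\tU+3C92<kMathews",
    "U+25500\tkSemanticVariant\tU+6E39<kMeterWempe",
    "U+4E00\tkSemanticVariant\tU+5F0C<kMatthews",
]


def variant_sources(value):
    """Yield (target, source) pairs from one kSemanticVariant value."""
    for entry in value.split():
        target, _, sources = entry.partition("<")
        for source in filter(None, sources.split(",")):
            # Drop an optional ':T'-style suffix after the source name.
            yield target, source.split(":", 1)[0]


def unknown_sources(lines, known=KNOWN_SOURCES):
    """Return (code point, target, source) for every unrecognised source tag."""
    hits = []
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field != "kSemanticVariant":
            continue
        for target, source in variant_sources(value):
            if source not in known:
                hits.append((codepoint, target, source))
    return hits


for hit in unknown_sources(SAMPLE_LINES):
    print(*hit)
```

With a complete list of valid source names, the same loop could be run over the
real variants file to catch any other misspelled tags.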

Regards,
Uriah Eisenstein

---------- Forwarded message ----------
From: Uriah Eisenstein <[email protected]>
Date: Sun, Sep 12, 2010 at 5:57 PM
Subject: Unihan SQL access
To: unicode List <[email protected]>

Hello,
I'm nearing completion of a simple Java program which loads Unihan data from
the source files into a DB and provides SQL access to it. There's still at
least a week or so of work on issues I consider essential, but once ready
I'd be happy to make it available on the Internet if anyone's interested.
So far I've used it to search for possibly erroneous data in Unihan; my
latest find is that 73 characters have a kTaiwanTelegraph value of 0000,
which seems doubtful. It can also be useful for various statistical
information such as how many characters are listed under each radical, or
which blocks include IICore characters.
I'm also considering adding the contents of the Unicode Character Database
at a later phase.
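
To illustrate the kind of query described above, here is a minimal sketch in
Python using the standard sqlite3 module. It is not the Java program itself:
the single three-column table, the function names and the sample lines are all
assumptions made up for the example, and the real tool's schema may well
differ. It loads lines in the tab-separated Unihan layout (code point, field
name, value) into an in-memory database and lists the characters whose
kTaiwanTelegraph value is 0000.

```python
import sqlite3

# Made-up sample lines in the tab-separated layout of the Unihan text files.
# The values are illustrative only and are not taken from the real data.
SAMPLE_LINES = [
    "# comment lines start with '#' and are skipped",
    "U+4E00\tkTaiwanTelegraph\t0001",
    "U+4E01\tkTaiwanTelegraph\t0000",
    "U+4E02\tkTaiwanTelegraph\t0000",
    "U+4E00\tkDefinition\tone",
]


def load_unihan(lines):
    """Load Unihan-style lines into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unihan (codepoint TEXT, field TEXT, value TEXT)")
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        rows.append((codepoint, field, value))
    conn.executemany("INSERT INTO unihan VALUES (?, ?, ?)", rows)
    return conn


def suspicious_telegraph_codes(conn):
    """Return the code points whose kTaiwanTelegraph value is 0000."""
    cur = conn.execute(
        "SELECT codepoint FROM unihan "
        "WHERE field = 'kTaiwanTelegraph' AND value = '0000' "
        "ORDER BY codepoint"
    )
    return [row[0] for row in cur]


conn = load_unihan(SAMPLE_LINES)
print(suspicious_telegraph_codes(conn))
```

Statistics such as the number of characters per field would follow the same
pattern, with a GROUP BY query against the same table.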
Regards,
Uriah Eisenstein


This archive was generated by hypermail 2.1.5 : Sat Oct 16 2010 - 09:13:44 CDT