From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Wed Oct 09 2002 - 03:55:30 EDT
Marco Cimarosti wrote:
>
> John Aurelio Cowan wrote:
> > Marco Cimarosti scripsit:
> > > Talking about the format of mapping tables, I always
> > > wondered why ranges are not used. In the case of ISO
> > > 8859-11, the table would become as compact as
> > > three lines:
> >
> > Well, that wins for 8859-1 and 8859-11 and ISCII-88, where Unicode
> > copied existing layouts precisely. But it wouldn't help other 8859-x
> > much if at all,
>
> All 8859 tables would be more succinct.
>
> Non-Latin sections use contiguous ranges of letters in alphabetical order
> or, at any rate, in the same order used by Unicode; this is also true for most
> other non-ISO charsets.
>
> Latin sections are a worse case, but they still benefit slightly, because
> characters shared with Latin-1 stay in the same positions.
>
> > and it requires binary search rather than direct
> > array access, which would be a terrible lossage in CJK, where the
> > real costs are.
>
> I agree. In the case of CJK it simply doesn't pay.
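
For reference, the range-based table discussed above could be sketched as follows. This is only an illustration, not anything posted in the thread: the three ranges are my reading of the ISO 8859-11 layout, and the names RANGES, STARTS and range_to_unicode are hypothetical. Note that every lookup goes through a binary search instead of a direct array index.

```python
from bisect import bisect_right

# Assumed range table for ISO 8859-11, one tuple per "line":
# (first byte, last byte, code point of the first byte).
RANGES = [
    (0x00, 0xA0, 0x0000),  # ASCII, C1 controls and NBSP map to themselves
    (0xA1, 0xDA, 0x0E01),  # first run of Thai characters
    (0xDF, 0xFB, 0x0E3F),  # second run of Thai characters
]
STARTS = [first for first, _, _ in RANGES]


def range_to_unicode(byte):
    """Return the code point for a legacy byte, or None if it is unmapped."""
    # Binary search for the last range that starts at or before `byte`.
    i = bisect_right(STARTS, byte) - 1
    first, last, base = RANGES[i]
    if byte <= last:
        return base + (byte - first)
    return None  # the byte falls in a gap between ranges
```

With only three ranges the search is trivial, but the same scheme applied to a CJK table would need thousands of ranges and several comparisons per character.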
If I may add my two cents: IMO using search algorithms to reduce table size
doesn't pay in any case. If one uses fast one- or two-stage lookup tables for
both mappings (legacy to Unicode and vice versa), then most tables require about
3 KB or less of storage space, and roughly 10 to 30 times that for CJK encodings.
Compared to the 256 MB in a typical PC, each lookup table would consume about 0.001%
(or 0.01-0.03% for CJK) of main memory. My point is that it is better to concentrate
on processing speed than on table footprint.
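
To make the idea concrete, here is a minimal sketch of such tables for an 8-bit charset, using ISO 8859-11 as the example. It is one possible layout under stated assumptions, not a description of any particular implementation: the 8-bit block split, the 0xFFFF sentinel, the BMP-only restriction and all names (build_tables, from_unicode, iso_8859_11_pairs) are hypothetical, and the byte ranges are my reading of the ISO 8859-11 layout.

```python
from array import array

BLOCK_BITS = 8                 # assumed split: 256 code points per block
BLOCK_SIZE = 1 << BLOCK_BITS
UNMAPPED = 0xFFFF              # sentinel; U+FFFF is a noncharacter


def iso_8859_11_pairs():
    """Yield (byte, code point) pairs for the assumed ISO 8859-11 layout."""
    for first, last, base in ((0x00, 0xA0, 0x0000),
                              (0xA1, 0xDA, 0x0E01),
                              (0xDF, 0xFB, 0x0E3F)):
        for byte in range(first, last + 1):
            yield byte, base + (byte - first)


def build_tables(pairs):
    """Build a one-stage byte->Unicode table and a two-stage Unicode->byte table."""
    to_unicode = array("H", [UNMAPPED] * 256)
    # Stage 1: one entry per block of the BMP, holding a block number.
    stage1 = array("H", [0] * (0x10000 >> BLOCK_BITS))
    # Block 0 is a shared all-unmapped block used by every empty stage-1 entry.
    blocks = [array("H", [UNMAPPED] * BLOCK_SIZE)]
    for byte, cp in pairs:
        to_unicode[byte] = cp
        hi, lo = cp >> BLOCK_BITS, cp & (BLOCK_SIZE - 1)
        if stage1[hi] == 0:    # first mapping in this block: allocate it
            stage1[hi] = len(blocks)
            blocks.append(array("H", [UNMAPPED] * BLOCK_SIZE))
        blocks[stage1[hi]][lo] = byte
    # Stage 2: all blocks concatenated into one flat array.
    stage2 = array("H")
    for block in blocks:
        stage2.extend(block)
    return to_unicode, stage1, stage2


def from_unicode(stage1, stage2, cp):
    """Map a code point to a legacy byte with two array accesses, or None."""
    if cp > 0xFFFF:            # this sketch covers the BMP only
        return None
    index = (stage1[cp >> BLOCK_BITS] << BLOCK_BITS) | (cp & (BLOCK_SIZE - 1))
    value = stage2[index]
    return None if value == UNMAPPED else value


to_unicode, stage1, stage2 = build_tables(iso_8859_11_pairs())
```

In this sketch the three arrays hold 256 + 256 + 768 entries. At two bytes per entry, which is the usual size of an array("H") item, that is 2,560 bytes, in line with the "about 3 KB or less" figure, and a lookup in either direction involves no searching.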
Theo