From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Fri Sep 28 2007 - 10:17:27 CDT
Hello Дмитры Турин,
you wrote:
> (2) My proposal not only economize mark-place in table of encoding
> (what is important itself),
Given the almost negligible share of uppercase letters among
the assigned Unicode code points, the saving of code
positions is quite unimportant in itself. Note that there are
only five† scripts featuring case at all, and all of them
are alphabets, hence comparatively small. Have a look at the roadmaps,
at <http://www.unicode.org/roadmaps/>, to develop a feeling
for the proportions.
> but also simplifies comparison of various variants of spelling
> (all letters are lower-case, first letter is upper-case, all
> letters are upper-case), because comparison is reduced to
> comparison in one variant of spelling (all letters are lower-case).
This is plainly wrong. For, e. g., a case-insensitive comparison,
your proposition requires removal of your “marks”, whilst the
Unicode way requires case folding. Both are comparably cheap
operations on contemporary computers.
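For illustration, here is a minimal Java sketch of the Unicode way;
it approximates case folding by lowercasing with the root locale
(full case folding, per the UCD file CaseFolding.txt, handles a few
more cases, e. g. «ß»):

    import java.util.Locale;

    public class CaselessCompare {
        // Approximate Unicode case folding by lowercasing with the root
        // locale, so the result does not depend on the user's default locale.
        static boolean equalsCaseless(String a, String b) {
            return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
        }

        public static void main(String[] args) {
            System.out.println(equalsCaseless("Unicode", "UNICODE")); // true
            System.out.println(equalsCaseless("straße", "STRASSE"));  // false; full folding would match
        }
    }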
> Eternity (unlimited time) is before us !
> You [i. e. Philippe Verdy] are seggesting to carry
> gasket through future time !
Unicode is here to stay for quite a while. More than
16 years of development by countless contributors have been
invested in it, and it is deeply entrenched in an overwhelming
number of software products and IT standards. So, if you want
to replace it with anything new, you would have to
- prove that your suggestion is indeed superior, and markedly so,
in order to justify the expenditure for the change-over,
- specify your suggestion thoroughly,
- solve, for your suggested encoding, all those problems that
have been solved for Unicode in those years (browse through
the Unicode Standard <http://www.unicode.org/versions/Unicode5.0.0/>,
the Character Databases <http://www.unicode.org/ucd/> and
<http://www.unicode.org/charts/unihan.html> and the Technical
Reports and Standards <http://www.unicode.org/reports/index.html>
to get an impression of the sheer amount of this work),
- demonstrate how the cost of adapting all existing text-
processing software to your scheme can be afforded by the
vendors. Note that any new encoding scheme will not render
the existing software less complex; rather, the software will
become more complex, as it will have to cope with both legacy
and new data. Hence, there will be no savings (in terms of
reduced maintenance costs) that could compensate for the
development of the new code,
- and, above all, you would have to convince everybody that
the effort would be worthwhile and that they should join your plan.
Believe me, computer users are quite a conservative lot:
they want their data to be readable, editable, and processable
for decades, if not for centuries.
Above all, your proposition will not work at all, as the
details of case mapping vary with the language.
> Give me _concrete_ examples of word/phrase,
> which you don't know how to write within my proposal,
> and i will send you _concrete_ answers.
Take, as an example, the name of a Turkish town: «İzmir»
in Turkish spelling, or «Izmir» in German spelling, respectively.
In your scheme, both of these spellings would be «¿izmir»,
where «¿» is your capitalize-initial mark. So, how could you
ever hope to render that word according to the user’s
expectations? Please do not point to higher-level protocols,
such as language tagging, because this discussion pertains
to encoding plain text.
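To see how language-dependent the mapping is, consider this minimal
Java sketch, using the locale-sensitive toUpperCase of the standard
library: the very same lowercase spelling yields different capital
letters depending on the language.

    import java.util.Locale;

    public class TurkishI {
        public static void main(String[] args) {
            String lower = "izmir";
            // Language-neutral (root) rules: small i maps to U+0049 LATIN CAPITAL LETTER I
            System.out.println(lower.toUpperCase(Locale.ROOT));                 // IZMIR
            // Turkish rules: small i maps to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
            System.out.println(lower.toUpperCase(Locale.forLanguageTag("tr"))); // İZMİR
        }
    }

So a bare «¿izmir», without language information, cannot tell the
software which of the two capital letters to produce.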
You have not understood Philippe’s remark:
> NO capital at the first letter (for example with prefixes)
Example: the seat of the Dutch government, «’s-Gravenhage». However,
you have already given examples of this sort of case,
so there is no need to answer this particular example.
You have also written:
> "Widespread error is equating of designation of a letters (_coding_) and
> their graphic images (_font_). It’s absolutely different things".
That error is definitely not widespread among the addressees of your remark;
rather, they are used to the notions of “character” vs. “glyph”.
However, most of them will agree that a capital A, a small a, a capital Αλφα,
a small αλφα, a capital Аз, and a small аз are six different letters.
But this has nothing to do with the encoding of those letters.
It was a deliberate decision, based on a history of about 30 years of
character encoding (before Unicode, as we know it), to assign six different
code positions to those six characters, and not three or even only one.
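For reference, here is a small Java sketch that merely lists those six
code points together with their names from the Unicode Character Database:

    public class SixLetters {
        public static void main(String[] args) {
            // Latin, Greek, and Cyrillic capital/small A
            int[] letters = {0x0041, 0x0061, 0x0391, 0x03B1, 0x0410, 0x0430};
            for (int cp : letters) {
                System.out.printf("U+%04X  %s  %s%n",
                        cp, new String(Character.toChars(cp)), Character.getName(cp));
            }
        }
    }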
Philippe had written:
> there are many other reasons why your solution is even more complicate
to which you have answered:
> List it, please. In way #1 ... , #2 ... , #3 ..., etc
Some of them have been brought up in this discussion;
among them a new one in this very contribution (cf. «İzmir», above).
But, as I have explained above, it is rather your duty to demonstrate
the feasibility of your proposition, and to demonstrate it convincingly,
than the duty of this list’s subscribers to point out every single flaw
in it.
Best wishes,
Otto Stolz
------------
† Armenian, Cyrillic, (Georgian), Greek, Latin; where Georgian
does not have a fully developed case system,
cf. <http://www.unicode.org/versions/Unicode5.0.0/ch07.pdf>.