Latin encoding model (was: Re: slashed letters)

From: Karl Pentzlin
Date: Mon Oct 27 2008 - 00:10:54 CST

Next message: André Szabolcs Szelp: "Re: Text scans needed containing slashed letters of 19/20th century Latvian and Sorbian orthography"

Previous message: Christopher Fynn: "Re: slashed letters"
In reply to: Christopher Fynn: "Re: slashed letters"
Next in thread: António MARTINS-Tuválkin: "Re: Latin encodin model"
Reply: António MARTINS-Tuválkin: "Re: Latin encodin model"

On Monday, 27 October 2008 at 04:23, Christopher Fynn wrote:

CF> Why not use:
CF> G + U+0338 COMBINING LONG SOLIDUS OVERLAY ...

Until now, Latin letters have been encoded as indivisible entities,
not only overstruck letters but also letters with any kind of "fixed"
appendage that is not simply attached below the letter the way an
ogonek or a cedilla is.
Examples of overstruck letters are the Sencoten additions (U+023A ...
U+023C etc.) and the more recent U+A75E/U+A75F. The latter pair
(overstruck V) in particular was added for a specific (medievalist)
purpose, although it is not used in any current orthography.
None of these characters has even a compatibility equivalence to a
sequence containing U+0338. Requiring such a sequence for other
letters would therefore make Unicode inconsistent.
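The absence of any equivalence can be checked with Python's standard unicodedata module. This is a minimal sketch, assuming a Python build whose bundled Unicode data includes these characters (any recent version does). The contrast with U+2260 NOT EQUAL TO, which is canonically equivalent to "=" + U+0338, is an added illustration and not part of the message.

```python
import unicodedata

SOLIDUS = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

# Overstruck letters cited in the message: U+023A..U+023C and U+A75E/U+A75F.
OVERSTRUCK = ["\u023A", "\u023B", "\u023C", "\uA75E", "\uA75F"]

def decomposition_of(ch):
    """Raw UCD decomposition field; '' means no canonical or compatibility mapping."""
    return unicodedata.decomposition(ch)

for ch in OVERSTRUCK:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_of(ch)!r}")

# Even NFKC does not turn A + U+0338 into U+023A LATIN CAPITAL LETTER A WITH STROKE.
print(unicodedata.normalize("NFKC", "A" + SOLIDUS) == "\u023A")  # False

# U+0338 does take part in canonical equivalence for some symbols, e.g. '=' -> U+2260.
print(unicodedata.normalize("NFC", "=" + SOLIDUS) == "\u2260")  # True
```

Each of the five letters prints an empty decomposition, so no normalization form can relate them to a base letter + U+0338 sequence.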

From an abstract point of view, it would have been possible to encode
such letters from the start as a sequence of base letter + overstriking
diacritic, and that might even have been preferable, since Unicode has
the necessary mechanisms, as many South Asian scripts show. That,
however, would have required a U+0338 with explicitly declared
semantics for this purpose.

This is comparable to the Arabic encoding model, where a model based
on ghost characters + combining marks could have been chosen but in
fact was not.

Even Latin letters with simple appendages are encoded as indivisible
entities without any compatibility equivalences. Examples are the
Uighur additions U+2C67 ... U+2C6C (historical use, directly comparable
to the slashed letters of my proposal) and the letters with palatal and
retroflex hook U+1D80 ... U+1D9A (purely scientific use, not used in
any orthography).
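The same check can be run over the two ranges just cited. A minimal sketch in Python; the helper undecomposed() is a hypothetical name introduced here, and the ranges are taken as given in the message.

```python
import unicodedata

# Ranges quoted in the message: Uighur letters with descender, and
# letters with palatal or retroflex hook.
RANGES = {
    "Uighur additions": (0x2C67, 0x2C6C),
    "palatal/retroflex hook": (0x1D80, 0x1D9A),
}

def undecomposed(start, end):
    """Code points in [start, end] whose UCD decomposition field is empty."""
    return [f"U+{cp:04X}" for cp in range(start, end + 1)
            if unicodedata.decomposition(chr(cp)) == ""]

for label, (start, end) in RANGES.items():
    total = end - start + 1
    print(f"{label}: {len(undecomposed(start, end))} of {total} have no decomposition")
```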

As I said, such letters could have been built from combining elements
if the encoding model for Latin had been designed that way, providing
building elements explicitly devised for the purpose, as has been done
for most South Asian scripts.
Changing the Latin encoding model now would require, among other
things, the introduction of a new equivalence (in addition to canonical
equivalence, which is now stabilized) to handle the existing letters.
In any case, changing the Latin encoding model after most Latin letters
have already been encoded is not advisable.
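For comparison, a minimal sketch of the South Asian approach referred to above, using Devanagari as an assumed example (the message names no particular script). The conjunct KSSA is spelled with the building element U+094D DEVANAGARI SIGN VIRAMA and is not encoded as a single code point, while a precomposed nukta letter such as U+0929 is canonically equivalent to its base letter + U+093C DEVANAGARI SIGN NUKTA. The helper spell() is hypothetical.

```python
import unicodedata

# Devanagari is an assumed example; the message names no specific script.
KSSA = "\u0915\u094D\u0937"  # KA + VIRAMA + SSA, rendered as one conjunct
NNNA = "\u0929"              # precomposed NNNA (NA with nukta)

def spell(s):
    """Hypothetical helper: list the character names in a string."""
    return [unicodedata.name(c) for c in s]

print(spell(KSSA))

# The virama sequence is the only encoding of the conjunct; NFC leaves it as is.
print(unicodedata.normalize("NFC", KSSA) == KSSA)  # True

# The nukta letter, by contrast, has a canonical decomposition.
print(unicodedata.decomposition(NNNA))  # 0928 093C
print(unicodedata.normalize("NFD", NNNA) == "\u0928\u093C")  # True
```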

Using existing characters as "Lego blocks" to "build" arbitrarily
constructed letters, delegating the letter identities to specialized
fonts or rendering systems, cannot be the purpose of a standard like
Unicode.

- Karl Pentzlin

