From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Sep 26 2007 - 18:22:13 CDT
There is strong evidence that you have not read a single line of the
Unicode standard. You are mixing every concept together in your message,
and attempting to remove all the abstractions already established, along
with their intended goals.
Not only will your various proposals to the list not work as intended (you
forget MANY things) and not be accepted (you lose all interoperability
features), but your encoding also completely changes the way text is
handled: it makes text parsable only through contextual rules that nest
concepts to unlimited depth and over unbounded distances. Operations such as
safely extracting a substring become ambiguous and nearly impossible without
parsing the WHOLE text from which the substring is taken, FROM THE
BEGINNING. And finally, your proposals are not even needed.
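
To illustrate the substring problem, here is a minimal sketch in Python. It
models only the quoted "own name" mark, uses '#' as a stand-in for the
prefix, and the function name decode_marked is hypothetical; none of this is
part of any standard.

```python
# Hypothetical model of the proposed "own name" mark. '#' stands in for the
# prefix; it is an assumption of this sketch, not an encoded character.
MARK = "#"


def decode_marked(text: str) -> str:
    """Expand each MARK by upper-casing the single character after it."""
    out = []
    capitalize_next = False
    for ch in text:
        if ch == MARK:
            capitalize_next = True
        elif capitalize_next:
            out.append(ch.upper())
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)


marked = "#anna met #bob"
plain = "Anna met Bob"

whole = decode_marked(marked)

# Slicing plain text needs no context: the result is exactly what was there.
plain_slice = plain[9:]

# Slicing the marked text just after the mark silently loses the capital.
marked_slice = decode_marked(marked[11:])

print(whole, "|", plain_slice, "|", marked_slice)
```

With a single one-character mark the damage is limited to one letter. The
concern above is that marks with a longer or nested scope would force the
reader of a substring to scan back arbitrarily far to recover the state.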
The notations you propose are ALREADY implemented by upper-level layers or
standards that DON'T break compatibility with the lower-layer plain-text
representation using Unicode.
Nothing in your proposal is needed. You are trying to define a new
rich-text format, something really different from the goal that Unicode
supports in plain text, which concentrates on the semantics of *text-only*
elements, not on their rendering.
None of your proposals will even work with existing font technologies. By
trying to mix every possible concept into a single merged layer, you create
havoc that will soon become non-interoperable and unmanageable. The right
approach is to separate the problems, i.e. encode text only with Unicode,
and everything else in other upper-layer, out-of-band standards, based for
example on XML (such as HTML, DocBook, MathML, ...) or other legacy formats
(RTF, MSDOC, PostScript...) where extra out-of-band semantics can also be
added on top of the represented text, as annotations, properties, grouping
behaviour, structuring elements...
So before you continue your proposals to this list, please read the
standard, and notably the first chapters, which explain the goals, formalize
the concepts, and discuss the conformance requirements and what the
Unicode standard is and IS NOT.
You also strongly need to read it just to use the right terminology and
concepts. This will spare you many errors, like your use of "byte" instead
of "code point". The Unicode standard does not mandate a single binary
representation: it represents characters by assigning them code points,
which have several binary representations independent of the architecture
or transport layer, and by assigning them a collection of properties that
support many text-handling algorithms. Not every algorithm can be built from
these properties alone, as many depend on context or application, or on
things that are NOT encoded in the text itself but in other contexts, like
the user's locale (as opposed to the text writer's locale) and the
upper-layer protocols (XML-based formats, or other networking and
file-format protocols) that embed Unicode text and map other properties on
top of it.
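
The difference between a code point and its byte representations can be
checked with a short Python sketch, using only the standard library. The
choice of U+00E9 and the variable names are just for illustration. The last
line is my own addition: it suggests that caseless comparison, which the
quoted message wants prefix marks for, can already be done with the existing
case-folding data.

```python
import unicodedata

ch = "\u00e9"  # one code point: LATIN SMALL LETTER E WITH ACUTE

# The code point is a number, not a byte sequence.
code_point = ord(ch)

# The same code point has several binary representations.
utf8 = ch.encode("utf-8")        # two bytes
utf16 = ch.encode("utf-16-be")   # one 16-bit code unit
utf32 = ch.encode("utf-32-be")   # one 32-bit code unit

# Properties are attached to the code point, whatever the encoding form.
name = unicodedata.name(ch)
category = unicodedata.category(ch)

# Case folding compares the three spellings without any prefix mark.
same_word = "Anna".casefold() == "ANNA".casefold() == "anna".casefold()

print(hex(code_point), utf8, utf16, utf32, name, category, same_word)
```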
Don't forget that Unicode-encoded text is used in many applications other
than text input forms and rendering for display or print. Almost everything
you propose would have no meaning in those other contexts, because it is NOT
plain text, or it would have to be completely ignored or discarded. That
makes your proposed characters needless pollution, complicating the
implementation of the many upper-layer protocols (many of them standardized
too!) that accept embedded Unicode-encoded plain text.
For example, what would a fraction or other mathematical formula mean
within a domain name? Or in a variable name or API name in a computer
language? Really rethink the problem and consider the layered approach. Not
every concept needs to be formalized at the plain-text level.
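
As a small illustration of the identifier case, Python itself can say which
strings it accepts as variable names. This sketch covers Python's own
identifier rules only, not domain names, and the sample strings are
arbitrary.

```python
# Candidate names: plain ASCII, accented letters, and a FRACTION SLASH
# (U+2044) placed between two letters.
candidates = ["total", "r\u00e9sum\u00e9", "a\u2044b"]

# str.isidentifier() applies the language's identifier rules, which are
# built on Unicode character properties.
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(repr(name), "valid identifier" if ok else "rejected")
```

Letters with diacritics pass, while the fraction slash is rejected: a
formatting notation has no role in that context.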
If you want to transport text documents that include some advanced features,
use a format other than plain *.txt files: OpenDocument, for example, is
based on XML and offers such capabilities. If you want an exact rendering,
use PDF. If you want to publish your text for rendering on the web in
browsers, use HTML... Not every concept needs to map into a plain-text
format (where it is acceptable that things like fraction bars, radicals,
emphasis and italic/bold presentations are complicated to represent). Base
your choice of format on the use of the text that the author intends for
some specific purpose.
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> behalf of Dmitry Turin
> Sent: Wednesday, September 26, 2007 07:59
> To: unicode@unicode.org
> Subject: Marks
>
> I repeated postings, because it was not come into my mailbox.
>
> Spending mark-place in coding table for capital letters - is
> inadmissible spending.
> Let all letters will be lower case: when there is a own name or beginning
> of sentence,
> one prefix-byte before a word is enough to specify, that first letter is
> upper-case.
> Let's name this prefix-byte as 'mark "own name"'. It works so: #anna ->
> Anna
> (where # is this prefix-byte).
> It's necessary to tell the same about abbreviations. One prefix-byte
> before a word
> is enough to specify, that all letters to symbol "blank" are upper-case.
> Let's name this byte as 'mark "abbreviation"'. It works so: #uno -> UNO.
> User himself puts prefix-bytes by pressing keys "Shift" and "Caps Lock".
> So comparison of various variants of spelling (all letters are lower-
> case,
> first letter is upper-case, all letters are upper-case) is reduced to
> comparison in one variant of spelling (all letters are lower-case) at
> search of similar word.
> Widespread error is equating of designation of a letters (__coding__)
> and their graphic images (__font__). It's absolutely different things.
>
>
> Pictures of prefix-bytes in in
> http://unicode2.chat.ru/site/unicode2/en/author/control_eng.htm
>
>
>
> Dmitry Turin
> Unicode2 (2.1.0) http://unicode2.chat.ru
> HTML6 (6.4.1) http://html60.chat.ru
> SQL4 (4.3.0) http://sql40.chat.ru
> Computer2 (2.0.3) http://computer20.chat.ru
>
>
>