On 3/27/2013 12:07 PM, Philippe Verdy wrote:
> 2013/3/27 Asmus Freytag <asmusf_at_ix.netcom.com>:
>> At the moment, the statement that the existing encoding is actually
>> implementable is something that must be considered unproven (enough issues
>> have been pointed out for various elements of the unification already to
>> allow such a conclusion).
>>
>> What we are not getting closer to is a rational understanding of how to
>> improve this situation. "Random" addition of middle dot characters for some
>> purpose is just as bad as pretending everything is fine with the status quo.
> We are in fact not discussing "random" additions but want to handle
> correctly use cases that are in fact very frequently needed.
Ah, what additions are you discussing?
>
> For example, the Catalan syllable breaker is not a "random" case, it is
> in fact highly used and needed as part of its standard orthography
> (and Catalan is not a minor language, we cannot just ignore it).
Are you suggesting the addition of a character for it?
>
> Dots and hyphens are used very frequently, and both are far too
> overloaded in their original ASCII-only encoding. The same is true of
> apostrophes and quotes. This causes enough nightmares when trying to
> parse text, and it's unbelievable that there's no solution to augment
> the text with either distinct characters, or some variant selectors, or
> some other format controls to disambiguate these uses, a distinction
> that is really semantic and bears on essential character properties
> (which have long been in Unicode, like the general category).
That's restating well-known issues. Thanks for agreeing. However, let's
limit the discussion to dots, otherwise we'll never get any conclusion.
For the dashes there are many explicit characters that were encoded
already, same for the quotes. In those cases, there is often a more
readily discernible difference in appearance that made the decision to
disunify somewhat easier. The situation for middle dot is both less well
understood and less well addressed.
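
To make the contrast concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name describe and the particular selection of code points are illustrative assumptions, not a complete inventory: it only prints the name and general category of a few already-encoded dashes and dot-like characters, and shows that the Catalan word "col·lega" is typically written with the generic U+00B7.

```python
import unicodedata

def describe(ch):
    """Return (code point label, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# A few of the separately encoded dash-like characters (illustrative selection).
dashes = ["\u002D", "\u2010", "\u2013", "\u2014", "\u2212"]

# A few dot-like characters at mid or raised height (illustrative selection).
dots = ["\u00B7", "\u0387", "\u2027", "\u22C5"]

for ch in dashes + dots:
    print(*describe(ch))

# The Catalan "punt volat" is commonly typed as the generic MIDDLE DOT.
catalan = "col\u00B7lega"
print([describe(ch) for ch in catalan if not ch.isalpha()])

# GREEK ANO TELEIA is canonically equivalent to MIDDLE DOT, so normalization
# folds it into the same overloaded code point.
print(unicodedata.normalize("NFC", "\u0387") == "\u00B7")
```

Nothing in these properties tells a parser which use of U+00B7 it is looking at; that is the kind of question a requirements analysis would have to answer per use case.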
>
> The solution based on an upper-layer protocol will not work (for
> example in filenames, in databases of toponyms, or in archived legal
> documents whose interpretation should not cause any trouble, including
> when these documents are converted or exported to many other formats).
> We are here exactly within the definition of linguistic rules for each
> language, some of them being highly standardized and which would
> require a stricter, less ambiguous encoding. The time of ASCII only is
> over. The UCS offers many new unused possibilities, as well as many
> existing technical solutions, which should not be based just on a
> heuristic (which will always break in many cases). Users want to be
> sure that their text will not be misinterpreted, or rendered in an
> ambiguous or wrong way.
Again, a nice general statement. However, it lacks the kind of detail
and documented evidence of particular usage that would bring us further
at this point.
>
> Even if the solutions proposed seem "novel" this should not block us.
> And even a "novel" solution can work in compatibility with the huge
> existing corpus of texts which will remain ambiguous as they are. The
> novel encoding solution can perfectly provide a fallback mechanism
> where it will adopt the old compatibility scheme (similar to ASCII).
>
> Of course, nothing will prevent anyone from using characters as they want
> in "random" cases, even if this breaks all commonly admitted
> properties and behaviors.
My use of the word "random" was directed at piecemeal addition of
characters. You are using it in a different sense.
> But this should be distinguished from
> frequently used cases which have had rules formulated long ago in
> well-known languages (except that now the texts have to live in an
> environment which is more and more multilingual, for which it's not
> possible to just infer which language to select to apply its
> well-known rules). We have no other solution than providing explicit
> "hints" in the encoded texts (and to forget the time of ASCII-only,
> except in some technical domains like programming languages and
> transport/storage protocols which have their own internal syntaxes and
> which do not really qualify as "plain text").
You've advocated "hints" or "semantic selectors". While a feasible model
in principle, I see the main issue in that it would create "yet another"
type of encoding; this is especially troublesome in light of the
precedent for quotes and dashes, where there was a careful addition of
single-purpose (not overloaded) characters.
Unless you can present a detailed analysis of the requirements which could
be used to prove that ONLY such a novel coding construct can handle the
needed rendering and processing tasks, I fear it would be difficult
to get traction for such a proposal.
But that brings me back to my original issue: nobody has done the
necessary analysis of the requirements for all (or at least the major)
use cases for a mid-level to raised-level dot and pinned down what is or
isn't possible in software support (rendering/processing) by using only
the existing overloaded characters.
Without such an analysis, this discussion, like the previous ones, is
doomed to result in inaction.
A./
Received on Wed Mar 27 2013 - 14:33:37 CDT