On 3/21/2013 4:22 PM, Philippe Verdy wrote:
> 2013/3/21 Richard Wordingham <richard.wordingham_at_ntlworld.com>:
>>> Further, the code chart glyphs for the ANO TELEIA and the MIDDLE DOT
>>> differ, see attachment. If they are canonically equivalent, and one
>>> is a mandatory decomposition of the other, why do they have differing
>>> glyphs?
>> Because the codepoints are usually associated with different fonts?
>> For a more striking example, compare the code chart glyphs for U+2F831,
>> U+2F832 and U+2F833, which are all canonically equivalent to U+537F.
> This is another good example where a semantic variation selector [...]
Philippe, let's not go there.
"Semantic" selectors are pure pseudo-coding, because if the semantic
differentiation is needed it is needed in plain text - and then it
should be expressible in plain character codes.
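
For reference, the canonical equivalences from Richard's example are
easy to check in code; here is a minimal Python sketch using the
standard unicodedata module:

    import unicodedata

    # U+0387 GREEK ANO TELEIA has a singleton canonical decomposition
    # to U+00B7 MIDDLE DOT; singletons never recompose, so both NFC
    # and NFD map it to U+00B7.
    assert unicodedata.normalize('NFC', '\u0387') == '\u00b7'

    # The three CJK compatibility ideographs each canonically decompose
    # to U+537F, so every normalization form replaces them with it.
    for cp in ('\U0002F831', '\U0002F832', '\U0002F833'):
        assert unicodedata.normalize('NFC', cp) == '\u537f'

In other words, normalization erases exactly the distinction such a
selector would try to preserve.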
If you need to annotate text with the results of semantic analysis as
performed by a human reader, then you need XML or some other format
that can express that particular intent.
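
To sketch what such annotation might look like, here is a minimal
example using Python's xml.etree; the element and attribute names are
invented for illustration, not taken from any established schema:

    import xml.etree.ElementTree as ET

    # Hypothetical markup: record the analysed function of the dot
    # out of band, while the plain text keeps a plain U+00B7.
    span = ET.Element('dot', {'function': 'ano-teleia'})
    span.text = '\u00b7'
    print(ET.tostring(span, encoding='unicode'))
    # -> <dot function="ano-teleia">&#183;</dot>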
Internal to your application you can design a lightweight markup format
using "noncharacters", if you wish, but for portability of this kind of
information you would be best off going with something widely supported.
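
A minimal sketch of that noncharacter approach, assuming the
application controls both producer and consumer (the bracketing
convention and the label syntax here are invented):

    # U+FDD0 and U+FDD1 are noncharacters: guaranteed never to occur
    # in interchanged text, so they can safely bracket a private
    # annotation inside the application.
    START, END = '\ufdd0', '\ufdd1'

    def tag(text, label):
        # Wrap text in an internal-only annotation; these sentinels
        # must be stripped before the string leaves the application.
        return f'{START}{label}{END}{text}{START}/{label}{END}'

    annotated = tag('\u00b7', 'ano-teleia')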
The number of conventions that can be applicable to certain punctuation
characters is truly staggering, and it seems unlikely that Unicode is
the right place to
a) discover all of them or
b) standardize an expression for them.
The problem is that even if you could "encode" selectors for certain
common cases, the scheme would not be extensible to capture other
information that pre-processing (or user input) might have provided and
which might be useful to carry around in certain implementations. I'm
thinking here that the full spectrum of natural-language analysis of
word types might be as interesting as certain individual characters.
A./