2011/7/18 Asmus Freytag <asmusf_at_ix.netcom.com>:
> On 7/17/2011 12:19 PM, Philippe Verdy wrote:
>>
>> Another alternative: instead of encoding separate symbols for each
>> control, we could as well encode symbols for each character visible in
>> those symbols.
>
> I'm baffled: what problem is this elaborate scheme trying to solve?
It's not "elaborate"; it is in fact extremely simple. I don't propose
to encode new symbols. I only propose to encode the decorations
themselves, separately. We already have such encoded characters for
enclosing box decorations, but they can only enclose a single
character. They are encoded as "gc=Me", i.e. as enclosing combining
marks.
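
To make the point about "gc=Me" concrete, here is a minimal sketch using
Python's standard unicodedata module. The helper name describe() and the
choice of three sample marks are illustrative assumptions, not something
taken from any proposal; the sketch only looks up properties of
characters that are already encoded.

```python
import unicodedata

# Three enclosing combining marks already encoded in Unicode.
# (The selection is illustrative; other gc=Me characters exist.)
ENCLOSING_MARKS = ["\u20DD", "\u20DE", "\u20E3"]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for mark in ENCLOSING_MARKS:
    print(describe(mark))

# An enclosing mark applies to the single base character that precedes it,
# so appending U+20DE to "LF" attaches the square to "F" only, not to the
# whole abbreviation.
boxed = "LF" + "\u20DE"
print([describe(c)[0] for c in boxed])
```

The last two lines show the limitation mentioned above: the string has
three code points, and the enclosing square follows only the final base
letter.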
> The problem was never in *how* to encode such symbols, but in *whether* they
> should be considered *characters* (and therefore need to be supported on the
> character level of the architecture). That point, whether there's a
> reasonable use case for them as characters, has not been settled, so the
> case for thinking about encoding solutions has not been established.
>
> When people write about a line feed character, they use "LF" or "linefeed"
> or 000A (or U+000A or 0x0A etc.). They commonly don't use the "LF" symbol
> character, nor any other unencoded symbol.
Yes, but they also cite them using a symbol where needed.
I have NEVER said that they used an "unencoded" symbol. They use the
symbol without even knowing whether it's encoded or not, and they
don't care about that!
> I claim, the same is true for ZWJ, RLO, PDF and all the other good
> characters.
>
> Just because Unicode uses dashed box placeholders in the code charts hasn't
> made them the generally accepted, universally understood *symbols* for these
> characters.
>
> This is different from the "pictures for control codes" because at the time,
> these were widely supported in devices, and users of these devices
> (terminals) were familiar with the convention (staggered small letters) and
> many would recognize common control characters.
>
> So, let's keep a lid on devising ever more arcane and fragile encoding and
> pseudo-encoding options until there's consensus that this issue must be
> addressed on the character level.
I did not speak about "pseudo-encoding". I raised it as a possible way
to build a string that will visually, and logically, represent an
abbreviation like "LF" decorated with a dotted box, as an
alternative to encoding specific symbols, given the current desire
not to encode those symbols directly.
I raised these alternatives because they avoid the issues
introduced by other proposals posted to the list, notably trying to
use the control character itself with some other control in order to
escape it (I read things like using variation selectors). That is
really the worst: those other proposals are MUCH WORSE than what I
suggested, and they really are pseudo-encoding.
The fact remains that there's already a demonstrated use of such
decorating boxes, not just for the control characters of Unicode, but
for more general use. You'll note that Microsoft Word already contains
such a generic feature for inserting arbitrary characters in enclosing
boxes (or other graphic symbols).
Yes, of course (I have also stated that this is effectively text
decoration, and that CSS or other rich-text features can already do
it), whether to encode those symbols directly remains an open question
(Michael Everson admits that).
My proposals are completely in line with the other possible practices
of citing a character by its name, abbreviation or code. But
this does not preclude the need to represent it in a more compact way,
directly within runs of surrounding text, in such a way that it will
be visually distinct from that surrounding text.
These proposals were a reply to the other pseudo-encoding proposals
that clearly broke the encoding model (using variation selectors or
the like, with the control character encoded directly). And they were
clearly made in order to counter the intent of encoding new symbols
for those controls, however many there are: users who want to see
those symbols, made of an abbreviation with a decoration box around it,
will see exactly that. Their intent is clearly to use the stated
abbreviations, but to make them more visually emphasized.
But yes, I have already said that if you have a rich-text environment,
it is certainly best to use the rich-text features to specify these
decorations (including font size and margin adjustments). Such
documents will then use, at the plain-text level, the abbreviation
only, and not any special symbols, which are not really needed. So in
those cases my proposals are not even needed, and NO other specific
encoding of those symbols is needed either in this context of
rich-text documents.
My proposals are therefore made for plain text only, where you
currently have no other solution than citing the undecorated
abbreviations (which may be ambiguous in some cases), or surrounding
them with punctuation (supposed to make this clear enough).
Citing the names or code points of those controls encoded in the
standard is a completely different and orthogonal need. This is
effectively used when citing those characters in isolation, and does
not require any pictorial symbol.
When you speak about a man, you write "man", and don't need a picture
of a man. But pictures of a man also exist within symbolic
representations (a walking man in traffic signals, or a running man
on safety signs pointing to an exit in case of fire, ...). There is a
similar need here: using symbols versus actual words.
I think that encoding new symbols in the standard, associated with its
encoded controls, is not needed, and no need for it has been
demonstrated. But the need to enclose characters in boxes has long
been demonstrated (in lots of other contexts than just symbols for
control characters). And in my opinion, it certainly merits better
consideration than encoding new specific symbols that are associated
only visually with the Unicode control characters, and that are
absolutely not justified by any demonstrated use.
Received on Wed Jul 20 2011 - 01:53:50 CDT