Re: New Charakter Proposal

From: Tex Texin
Date: Fri Nov 01 2002 - 01:23:15 EST


    Note the smiley. Ken's suggestion was a tongue in the hollow-skulls

    Yes, a 2 character sequence is less likely to occur, but is still a
    possibility, so your proposal doesn't actually fix the problem. The
    usual workaround is for a convention that uses characters with special
    semantics (i.e., metacharacters) to have an escape mechanism to indicate
    when a metacharacter is not to be treated as such. So perhaps two skull
    and crossbones characters in a row would be used to nullify the special
    meaning of one such character and together represent a single printable
    character.
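
    As a rough illustration of such an escape mechanism, here is a
    minimal Python sketch. It is only one possible convention, not
    anything the standard defines. The names MARKER, ERROR_TAG, ERROR,
    encode and decode are hypothetical, and so is the choice to write
    an error as the marker followed by "!" rather than as a lone
    marker; that choice is an assumption made here so that an error
    sitting next to a literal skull stays unambiguous.

```python
MARKER = "\u2620"   # SKULL AND CROSSBONES, the metacharacter discussed here
ERROR_TAG = "!"     # hypothetical: MARKER + "!" means "conversion error here"
ERROR = object()    # sentinel standing for an error position in a token list


def encode(tokens):
    """Serialize characters and ERROR sentinels into a single string."""
    out = []
    for tok in tokens:
        if tok is ERROR:
            out.append(MARKER + ERROR_TAG)
        elif tok == MARKER:
            out.append(MARKER + MARKER)  # doubled marker = one literal skull
        else:
            out.append(tok)
    return "".join(out)


def decode(text):
    """Inverse of encode(); raises ValueError on a marker that is not escaped."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != MARKER:
            tokens.append(ch)
            i += 1
            continue
        # A marker always consumes the character that follows it.
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == MARKER:
            tokens.append(MARKER)
        elif nxt == ERROR_TAG:
            tokens.append(ERROR)
        else:
            raise ValueError(f"unescaped marker at index {i}")
        i += 2
    return tokens
```

    With this convention a document about household poisons can still
    carry U+2620 as data: the writer doubles it, and the reader turns
    each doubled marker back into one printable character.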

    Of course, the consortium could assign another character for the special
    purpose, but there are so many special purposes that would then require
    character assignments, it would become difficult for an application to
    take them all into account. It is better to let higher-level protocols
    take over where such abilities are needed or desired.

    As for the influence of posting a suggestion for character usage, I
    think you have made your point now; perhaps we don't need to keep
    restating it. Others have suggested this list is not a good place to
    post and suggest conventions for individual or non-standard use, since
    this is a list for standardization and subscribes to a standardization
    process. The Charman list was created for the alternative process.
    However, that suggestion doesn't seem to have had any influence...



    William Overington wrote:
    > Kenneth Whistler wrote the following.
    > >I think Markus's suggestion is correct. If you want to do
    > >something like this internally to a process, use a noncharacter
    > >code point for it. If you want to have visible display of this
    > >kind of error handling for conversion, then simply declare a
    > >convention for the use of an already existing character.
    > >My suggestion would be: U+2620. ;-) Then get people to share
    > >your convention.
    > I find this suggestion curious, particularly coming as it does from an
    > officer of the Unicode Corporation.
    > The U2600.pdf file has U+2620 under Warning signs and has = poison in its
    > description.
    > Suppose for example that the source document encoded in UTF-8 is a document
    > about chemicals found around the house and that the U+2620 character is used
    > to indicate those which are poisonous. If U+2620 is also used to include in
    > visible form an indication of an error found during decoding, then finding a
    > U+2620 character in the decoded document would lead to an ambiguous
    > situation.
    > One solution would be for the Unicode Consortium to encode an otherwise
    > unused character especially for the purpose.
    > If, however, the way forward is for an individual to declare a convention,
    > then I suggest that a sequence of at least two characters, the first being a
    > base character and the one or more others being combining items be used so
    > as to produce an otherwise highly unlikely sequence of characters.
    > For example, the character U+0304 COMBINING MACRON could be a good choice,
    > as it could be used to indicate a Boolean "not" condition with a character
    > which is otherwise unlikely to carry an accent.
    > As to which character to use for the base character, I am undecided, however
    > it should, in my opinion, not be U+2620 as that is a warning sign meaning
    > poison and could lead to confusion if looking at a document.
    > The advantage of a two character sequence is that a special piece of
    > software may be used to parse all incoming documents. Only occurrences of
    > the otherwise highly unlikely sequence will be regarded as indicating a
    > conversion problem with the encoding. If either of the two characters used
    > for the sequence is encountered other than with the rest of the sequence,
    > then it will not indicate the special effect.
    > In my comet circumflex system I use a three character detection sequence.
    > This means that in order to enter the markup universe then all three
    > characters of the sequence need to be present in sequence. Thus, a piece of
    > software can scan all incoming text messages, even those which are not
    > designed to fit in with the comet circumflex system, and not indicate a
    > comet circumflex message if, say, a U+2604 COMET character arrives as part
    > of a message.
    > Using a two or three character sequence which is otherwise highly unlikely
    > to occur is, in my opinion, a good way to indicate the presence of a special
    > feature as it allows one to monitor all text files for the special feature
    > without causing undesired responses on text files which have been prepared
    > without any regard to the special feature.
    > I feel that the influence of posting a suggestion in this mailing list is
    > often greatly underestimated. If you do post a suggested two or three
    > character sequence for the purpose that you seek, perhaps, if you wish,
    > after further discussion in this group, my feeling is that that sequence may
    > well become well known and accepted for the purpose very quickly, simply
    > because where there is a need for such a sequence then, in the absence of
    > any good reason not to do so, people will often happily use the suggested
    > format.
    > William Overington
    > 1 November 2002

    Tex Texin   cell: +1 781 789 1898
    Xen Master                
    Making e-Business Work Around the World