FW: RE: Last Call: Language Tagging in Unicode Plain Text to Proposed

From: Chen, Qifan (qifan.chen@austx.tandem.com)
Date: Fri Jul 10 1998 - 14:56:25 EDT


> -----Original Message-----
> From: Rick McGowan [SMTP:rmcgowan@apple.com]
> Sent: Friday, July 10, 1998 12:41 PM
> To: unicode@unicode.org
> Cc: qifan.chen@austx.tandem.com
> Subject: Re: RE: Last Call: Language Tagging in Unicode Plain Text to
> Proposed
>
> Qifan asked...
>
> > I just have a small comment on the proposal: why does the set include only
> > TAGGED versions of ASCII characters? Why can't tags be composed of
> > non-ASCII characters?
>
> Maybe the paper's rationale doesn't come through so well. The entire POINT
> of this thing is to make a set of "characters" that are NOT NORMAL
> characters, and use those for tagging. The set is restricted to a small set
> so that the entire Unicode/10646 character set does not have to be
> replicated in Plane 14.
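
As a rough illustration of the restricted set described above, the sketch
below maps an ASCII token onto tag characters by adding a fixed offset. It
assumes the Plane 14 layout in which the tag block starts at U+E0000, U+E0001
is the LANGUAGE TAG introducer, and U+E0020..U+E007E mirror printable ASCII.
The name `to_tag_chars` is a hypothetical helper, not part of any library.

```python
TAG_BASE = 0xE0000            # assumed start of the Plane 14 tag block
LANGUAGE_TAG = "\U000E0001"   # assumed introducer for a language tag


def to_tag_chars(token: str) -> str:
    """Map a printable-ASCII token onto tag characters (hypothetical helper)."""
    for ch in token:
        # Only printable ASCII has a mirrored tag character in this layout.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in token)


# A language tag followed by ordinary text.
tagged = LANGUAGE_TAG + to_tag_chars("en-US") + "Hello"
```

Only the mirrored ASCII range needs new code points here, which is the reason
given for not replicating the whole character set.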
>
> The requirement of protocols that are anticipated to use this scheme is
> that tags constructed with these tag characters can be reliably
> distinguished from the real text. It's only necessary to have a small set
> so that unique tags can be constructed. It is NOT required to express all
> possible text with these things, only to express limited tokens used for
> tagging schemes.
>
> The purpose of this all is to support out-of-band tagging in Internet (and
> maybe other) protocols so that you can immediately and unambiguously
> distinguish TAG data from "real" data, and strip or skip the tags as
> necessary.
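
The "strip or skip" step can be sketched as a per-character range check,
assuming the tag block occupies U+E0000..U+E007F with U+E0020..U+E007E
mirroring printable ASCII. The helper names `is_tag_char`, `strip_tags`, and
`tag_payload` are made up for this example.

```python
TAG_FIRST, TAG_LAST = 0xE0000, 0xE007F   # assumed Plane 14 tag block


def is_tag_char(ch: str) -> bool:
    """True if ch lies in the assumed tag block."""
    return TAG_FIRST <= ord(ch) <= TAG_LAST


def strip_tags(text: str) -> str:
    """Drop every tag character, leaving only the 'real' text."""
    return "".join(ch for ch in text if not is_tag_char(ch))


def tag_payload(text: str) -> str:
    """Recover the ASCII spelling of the mirrored tag characters."""
    return "".join(
        chr(ord(ch) - TAG_FIRST) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


# LANGUAGE TAG introducer, then tag 'e' and tag 'n', then ordinary text.
sample = "\U000E0001\U000E0065\U000E006EHello"
plain = strip_tags(sample)
payload = tag_payload(sample)
```

No parsing state is needed: each character can be classified on its own, so a
process that does not care about tags can drop them in a single pass.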
>
> Does that make sense?
> [Chen, Qifan] It does, but only to a limited extent. Let me explain why.
>
> First, it imposes the requirement that not all text can be used for
> tagging purposes. But we are talking about Unicode. Why should one subset
> of its characters be more important than the others?
>
> Second, allowing all Unicode characters in tags (excluding the BEGIN/CANCEL
> tag characters) does not impose heavy overhead for distinguishing tags from
> non-tag text. A single state is enough, and it can be done in a few lines
> of code.
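
One possible reading of this suggestion is a scanner with a single "inside a
tag" state, where a BEGIN delimiter opens a tag, a CANCEL delimiter closes it,
and anything in between may be arbitrary Unicode. The delimiter code points
below (U+E0001 and U+E007F) are stand-ins chosen for the example, and
`split_tags` is a hypothetical name. This is only a sketch of the idea, not
the scheme in the proposal.

```python
BEGIN = "\U000E0001"    # assumed tag-begin delimiter
CANCEL = "\U000E007F"   # assumed tag-end delimiter


def split_tags(text: str):
    """Separate plain text from delimiter-bracketed tags using one state."""
    plain, tags = [], []
    current = None  # None means "outside a tag"; a list means "inside a tag"
    for ch in text:
        if current is None:
            if ch == BEGIN:
                current = []          # enter the tag state
            else:
                plain.append(ch)
        elif ch == CANCEL:
            tags.append("".join(current))
            current = None            # leave the tag state
        else:
            current.append(ch)        # any Unicode character is allowed here
    # An unterminated tag is silently dropped in this sketch.
    return "".join(plain), tags


text_out, tags_out = split_tags("a" + BEGIN + "日本語" + CANCEL + "b")
```

Note the trade-off this sketch exposes: classification now depends on context,
so a lost or missing CANCEL causes the rest of the text to be treated as tag
content.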
>
> Third, what happens if non-ASCII characters are needed in tags in the
> future? How do we meet that requirement?
>
> To me, adding tagging characters to Unicode is a good idea. But the
> proposed solution is not general enough, and a simpler solution exists.
>
> Hope this explains the idea.
>
> --qifan
>
>
> Rick


