From: Peter_Constable@sil.org
Date: Sat Sep 28 2002 - 07:30:18 EDT
[Still off-topic, but I'm hopeful that progress can be made, so am
continuing a little farther]
On 09/27/2002 10:26:36 AM "William Overington" wrote:
>>XML is the way to go.
>
>Maybe, maybe not. The issue of U+003C being used to mean LESS-THAN SIGN in
>documents which mix ordinary text and markup may or may not, depending upon
>the application, be a problem.
It really isn't a problem. XML provides other means to represent that
character when it is needed as part of the content rather than as part of
the markup. It is the job of an XML parser to sort that out, and there are
various XML parsers that all handle this without a hitch and that are
freely available. Someone made reference to MathML, which is a markup
language built on XML (XML is a spec for building markup languages), and
clearly mathematicians need to be able to represent this character within
content, and the special use of U+003C for markup in XML was not seen in
any way to be an obstacle.
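
To make that concrete, here is a minimal sketch in Python using only the
standard library. The element name "msg" and the sample text are made up for
illustration; the point is only that U+003C in content is written indirectly
(as &lt; or as a numeric character reference) and that the parser hands the
real character back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Content that contains a literal U+003C LESS-THAN SIGN.
content = "a < b"

# escape() rewrites &, < and > as entity references, so the
# character can sit inside an element without being read as markup.
# "msg" is a hypothetical element name, not part of any real schema.
doc = "<msg>" + escape(content) + "</msg>"

# The parser resolves the reference and returns the original text.
recovered = ET.fromstring(doc).text

# A numeric character reference is an equivalent way to write it.
recovered_numeric = ET.fromstring("<msg>a &#x3C; b</msg>").text

print(doc)
print(recovered)
print(recovered_numeric)
```

The application that consumes the parsed message never sees the escaped
form; it only sees the character itself.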
Your proposed markup convention would also need a parser to identify the
pieces in a stream of data. If someone wants to use U+2604 in content, you
would probably need some indirect way to represent it in a data stream.
(E.g. One can imagine a hypothetical message "My favourite Unicode
character is P1" into which someone might want to insert the COMET
character.) So, I expect you'll have to deal with the same problem anyway.
But this parser doesn't yet exist; some software developer will have to
create it. On the other hand, XML parsers exist today. If you had been
pursuing an XML-based approach, you might already be testing live
prototypes rather than discussing a hypothetical system.
Also, in an earlier message, you mentioned that you wanted to be able to
use this messaging system on the Web. And, of course, you want to be able
to represent U+003C directly in content. Did you realise that those two are
contradictory? HTML has the same heritage as XML (both are derived from
SGML). It also uses U+003C for markup, and provides the same alternative
means to represent that character as part of content. So, if one of the
contexts within which you want your system to work is the Web, then you're
going to have to deal with indirect representation of U+003C anyway. Since
it's already a magic character, why not let it be the magic character for
your proposed protocol?
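
The same round trip can be sketched for HTML with Python's standard library.
The sample text and the TextCollector class are hypothetical, chosen only to
show that text placed on a Web page must carry U+003C indirectly and that an
HTML parser turns it back into the character.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the character data found between tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical message text containing U+003C.
text = "1 < 2"

# html.escape() writes the character as &lt; for use in a page.
page = "<p>" + html.escape(text) + "</p>"

collector = TextCollector()
collector.feed(page)
collector.close()
seen_by_reader = "".join(collector.chunks)

print(page)
print(seen_by_reader)
```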
XML really *is* the way to go. Please believe us. You don't need to believe
me; believe Tex, Ken, Marco and the others who have offered you this
recommendation. They really are among the most well-informed contributors
to this list.
BTW, my mail client (Lotus Notes, for better or worse) reports what time in
*my* time zone an author wrote the given message. Such reporting of time in
international communications is problematic; time zones need to be stated
explicitly. We discovered this quite a while ago after scheduling a
tele-conference; the half of the dept. in the UK assumed the time they saw
was Dallas time (or maybe they suggested the time and we were reading it),
but Notes had silently done a time zone conversion.
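
As a small illustration of stating the zone explicitly, here is a sketch in
Python. The meeting date and time are invented, and the fixed offsets assume
summer time in both places (UTC-5 for Dallas, UTC+1 for the UK); a real
system would use a time zone database rather than hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for late September (daylight saving in effect).
CDT = timezone(timedelta(hours=-5), "CDT")  # Dallas
BST = timezone(timedelta(hours=1), "BST")   # UK

# Hypothetical meeting time, stated with its zone attached.
meeting = datetime(2002, 9, 30, 9, 0, tzinfo=CDT)

# The same instant as seen from the UK.
in_uk = meeting.astimezone(BST)

# Dropping the zone gives a value that a reader cannot interpret reliably.
ambiguous = meeting.replace(tzinfo=None)

print(meeting.isoformat())
print(in_uk.isoformat())
print(ambiguous.isoformat())
```

With the offset printed alongside the time, it no longer matters whether the
mail client converted the value; either form identifies the same instant.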
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
This archive was generated by hypermail 2.1.5 : Sat Sep 28 2002 - 12:31:41 EDT