Subject: RE>>Embedded language ID pr Time: 4:13 PM Date: 9/9/95
I guess a clearer question would be: what do you want to use language ids for,
and why is it that you don't use rich text in that context?
--------------------------------------
Date: 9/9/95 12:37 PM
To: Mark Davis
From: Mark Leisher
Mark> Subject: RE>>Embedded language ID proposal Time: 5:42 PM
Mark> Date: 9/8/95
Mark> I am still unconvinced of the need to have language
Mark> information in plain text; there are legitimate needs for
Mark> that information, but there are needs for other particular
Mark> attributes that go along with rich text, and it is hard to
Mark> see why this one should be singled out.
For the most part I personally agree that language identifiers would
seem to belong most logically in markup.
But from a multilingual natural language processing perspective (and
perhaps others), having a single codeset with embedded language
identifier capability would provide an attractive reference text
representation.
Had the proposal not provided any utility for areas other than ours, I
doubt we would have bothered to present it other than as an
idiosyncrasy of our particular Unicode support implementation.
Mark> In terms of commenting on these particular suggested private
Mark> use implementations, the string scheme (LANG_ID_START text
Mark> LANG_ID_END) has the very considerable drawback of
Mark> introducing fr_FRgarbageen_US into data streams that don't
Mark> recognize LANG_ID_START, LANG_ID_END. Using independent
Mark> private use characters exclusively at least allows other
Mark> implementations to filter them out without knowing
Mark> bracketing semantics.
Telling point. I hadn't thought of that.
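
To make the filtering argument concrete, here is a minimal Python sketch
comparing the two schemes. The private-use code points assigned below to
LANG_ID_START, LANG_ID_END, and the single-character language tags are
hypothetical values picked only for illustration; the thread does not
specify any. The filter stands in for an implementation that knows nothing
about either scheme and simply drops private-use characters.

```python
# Hypothetical private-use assignments, chosen only for illustration.
LANG_ID_START = "\uE000"
LANG_ID_END = "\uE001"
LANG_FR_FR = "\uE100"  # hypothetical single-character tag for fr_FR
LANG_EN_US = "\uE101"  # hypothetical single-character tag for en_US


def strip_private_use(text: str) -> str:
    """Drop every BMP private-use character (U+E000..U+F8FF).

    This models a receiver that does not understand either scheme and
    has no knowledge of bracketing semantics.
    """
    return "".join(ch for ch in text if not "\uE000" <= ch <= "\uF8FF")


# String scheme: the language name travels as ordinary text between brackets.
string_scheme = (
    f"{LANG_ID_START}fr_FR{LANG_ID_END}garbage"
    f"{LANG_ID_START}en_US{LANG_ID_END}"
)

# Character scheme: each language is one independent private-use character.
char_scheme = f"{LANG_FR_FR}garbage{LANG_EN_US}"

filtered_string_scheme = strip_private_use(string_scheme)
filtered_char_scheme = strip_private_use(char_scheme)

print(filtered_string_scheme)  # fr_FRgarbageen_US
print(filtered_char_scheme)    # garbage
```

Under these assumptions the bracketed names survive the filter as ordinary
text, while the independent private-use characters disappear cleanly.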
Mark> As far as terminology goes, these are not combining
Mark> characters: they are not positioned relative to a preceding
Mark> base character; they are not positioned at all! They are
Mark> more akin to the formatting characters such as RLM or ZWJ.
Our initial conclusion as well.
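
The distinction can be checked against the character properties, as in the
short Python sketch below. It relies on the Unicode database bundled with
a current Python, which is an assumption on my part and not the version of
the standard under discussion in this thread; U+0301 and U+E000 are simply
sample characters chosen for the comparison.

```python
import unicodedata

# Sample characters: one true combining mark, the two formatting
# characters named above, and one private-use code point.
samples = {
    "COMBINING ACUTE ACCENT": "\u0301",
    "RIGHT-TO-LEFT MARK (RLM)": "\u200F",
    "ZERO WIDTH JOINER (ZWJ)": "\u200D",
    "private-use U+E000": "\uE000",
}

# General category: Mn = nonspacing mark, Cf = format, Co = private use.
categories = {name: unicodedata.category(ch) for name, ch in samples.items()}

# Canonical combining class: nonzero only for characters that are
# positioned relative to a preceding base character.
combining_classes = {
    name: unicodedata.combining(ch) for name, ch in samples.items()
}

for name in samples:
    print(f"{name}: category={categories[name]}, "
          f"combining class={combining_classes[name]}")
```

Only the accent reports a nonzero combining class; RLM and ZWJ are format
characters, and a private-use character carries no positioning behaviour of
its own.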
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"The trick is not gaining the knowledge, but surviving the lessons."
    -- "Svaha," Charles de Lint
------------------ RFC822 Header Follows ------------------
Received: by taligent.com with SMTP;9 Sep 1995 12:34:22 -0800
Received: from taligent.com by mailserv.taligent.com (AIX 3.2/UCB 5.64/4.03)
id AA36205; Sat, 9 Sep 1995 12:35:02 -0700
Received: from UNICODE.ORG by taligent.com with SMTP (5.67/23-Oct-1991-eef)
id AA26899; Sat, 9 Sep 95 12:31:42 -0700
for
Received: by Unicode.ORG (NX5.67c/NX3.0M)
id AA25009; Sat, 9 Sep 95 12:23:10 -0700
Date: Sat, 9 Sep 95 12:23:10 -0700
From: unicode@Unicode.ORG
Message-Id: <9509091923.AA25009@Unicode.ORG>
Reply-To: mleisher@crl.nmsu.edu (Mark Leisher)
Errors-To: uni-bounce@Unicode.ORG
Subject: Re: Embedded language ID pr
To: unicode@Unicode.ORG