Re: UTF-8 syntax

From: Peter_Constable@sil.org
Date: Fri Jun 08 2001 - 15:40:39 EDT


>This will fix the following problem for example:
>For a searching engine to search the character U-00010000 in UTF-8 string,
>and it could not find. But when UTF-8 is converted into UTF-16, it can find
>it there because <ED A0 80> and <ED B0 80> are converted into U-00010000 in
>UTF-16.

Eh? Whatever on earth are you talking about? Are you suggesting that
someone might make a process that will take a request to locate U-00010000
in a set of UTF-8-encoded data and implement that by searching for D8 00
DC 00? If so, then it seems to me that (a) this is absolutely ridiculous
and (b) that this has nothing whatever to do with the pros or cons of
UTF-8s (other than in affecting people's impressions of the kind of
reasoning that is being used in arguing for it). Any programmer worth being
paid will know to transcode, and then can do that just as well into UTF-8
or UTF-8s. It appears to me that this argument, unless I have missed the
point, does nothing to support your proposal.
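
To make the transcoding point concrete, here is a rough sketch in Python. It
assumes that "UTF-8s" means encoding each UTF-16 code unit separately, which
is what the <ED A0 80> <ED B0 80> sequence in the quoted text implies. The
names encode_utf8, encode_utf8s and find_char are made up for illustration
and are not from any library or from the proposal itself.

```python
def encode_utf8(text: str) -> bytes:
    """Standard UTF-8: U-00010000 becomes the four bytes F0 90 80 80."""
    return text.encode("utf-8")


def encode_utf8s(text: str) -> bytes:
    """Assumed UTF-8s behaviour: encode each UTF-16 code unit on its own,
    so a surrogate pair becomes two three-byte sequences."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        if unit < 0x80:
            out.append(unit)
        elif unit < 0x800:
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


def find_char(haystack: bytes, needle: str, encoder) -> int:
    """Transcode the needle into the haystack's encoding, then search.
    Returns the byte offset of the first match, or -1 if absent."""
    return haystack.find(encoder(needle))


needle = "\U00010000"
utf8_data = encode_utf8("abc\U00010000def")
utf8s_data = encode_utf8s("abc\U00010000def")

# Searching each data set with a needle transcoded to match it succeeds.
print(find_char(utf8_data, needle, encode_utf8))
print(find_char(utf8s_data, needle, encode_utf8s))

# Searching UTF-8 data for the raw UTF-16 bytes D8 00 DC 00 fails.
print(utf8_data.find(bytes([0xD8, 0x00, 0xDC, 0x00])))
```

The search works equally well against either encoding once the needle is
transcoded to match the data; it fails only when the needle is left in a
different encoding form from the data being searched.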

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


