From: David Starner (prosfilaes@gmail.com)
Date: Fri Jan 02 2009 - 19:33:52 CST
On Fri, Jan 2, 2009 at 7:42 PM, James Kass <thunder-bird@earthlink.net> wrote:
> What does a search engine do when it runs into a Tamil
> web page encoded using non-standard PUA conventions,
> such as TUNE?
Nothing too smart. It doesn't know what language it is, or even how to
separate words, so simple questions like whether "don" should match
"don't", or "donut", or "@don#" (where @ and # are equally mysterious
PUA code points) are impossible to answer. It could be Verdurian
(http://www.evertype.com/standards/csur/verdurian.html), for all
Google knows. PUA is completely inscrutable in such a situation, until
it becomes a case like U+0093 and U+0094, where everyone knows they're
really quotes even though Unicode says otherwise ... which is a hideous
pain for everyone.
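To make that concrete, here is a rough Python sketch of what generic text-processing code can see. It is only an illustration, not how any real search engine works: the helper names category_of and naive_words are made up for this example, and U+E000 and U+E001 stand in for the "@" and "#" above. It assumes only the standard unicodedata and re modules.

```python
import re
import unicodedata


def category_of(ch):
    """Return the Unicode general category of a single character."""
    return unicodedata.category(ch)


def naive_words(text):
    """Split text into runs of Unicode word characters (hypothetical tokenizer)."""
    return re.findall(r"\w+", text)


# U+E000 and U+E001 are Private Use Area code points: category "Co",
# with no letter, script, or word-break information attached.
mystery = "\ue000don\ue001"
print([category_of(c) for c in mystery])

# A \w-based tokenizer silently treats the PUA characters as separators,
# which is a guess, not knowledge about the underlying script.
print(naive_words(mystery))

# The U+0093 case: byte 0x93 decoded as Latin-1 lands on a C1 control,
# while Windows-1252 treats the same byte as a left double quotation mark.
raw = b"\x93don\x94"
print([category_of(c) for c in raw.decode("latin-1")])
print([category_of(c) for c in raw.decode("cp1252")])
```

The tokenizer here happens to pull out "don" only because it discards everything it does not recognise. If the PUA characters were actually letters of the same word, that answer would be wrong, and nothing in the character properties can tell the two cases apart.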