Word spacing in HTML

From: Pete Resnick (presnick@qualcomm.com)
Date: Wed May 10 2000 - 15:28:03 EDT


RFC 2070 says the following with regard to HTML:

       NOTE -- RFC 1866 section 4.2.2 specifies that an HTML user agent
       should treat an end of line as a word space, except in
       preformatted text. This should be interpreted in the context of
       the script being processed, as the way words are separated in
       writing is script-dependent. For some scripts (e.g. Latin), a
       word space is just a space, but in other scripts (e.g. Thai) it is
       a zero-width word separator, whereas in yet other scripts (e.g.
       Japanese) it is nothing at all, i.e. totally ignored.

That's nice. However, so far I can't find anyone who can tell me how
to implement this particular note. Given the two characters on either
side of an end of line, can I tell algorithmically whether the
separator between them should be a space, a zero-width word separator,
or nothing at all? Obviously I can test whether they fall in a
particular script range in Unicode (which is what I'm working with),
but I don't have an exhaustive list of which scripts get which
treatment. Can anyone help?
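
For illustration only, the script-range test described above might be
sketched in Python as follows. The range tables are deliberately
incomplete placeholders covering only the three examples in the RFC
note (Latin, Thai, Japanese); the function names, the use of U+200B
ZERO WIDTH SPACE as the "zero-width word separator", and the choice to
fall back to a plain space for mixed or unlisted scripts are all
assumptions, not anything RFC 2070 or RFC 1866 specifies.

```python
# Sketch only: the tables below are NOT an exhaustive script list. They
# cover just the examples named in the RFC 2070 note and are assumptions.
ZERO_WIDTH_RANGES = [
    (0x0E00, 0x0E7F),  # Thai block
]
NO_SEPARATOR_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE (assumed separator for Thai)


def _in_ranges(ch, ranges):
    """True if the code point of ch falls inside any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def line_break_replacement(before, after):
    """Choose what replaces an end of line between two characters.

    Assumed policy: the special treatment applies only when BOTH
    neighbours are in the same table; anything else gets a plain space.
    """
    if _in_ranges(before, NO_SEPARATOR_RANGES) and _in_ranges(after, NO_SEPARATOR_RANGES):
        return ""
    if _in_ranges(before, ZERO_WIDTH_RANGES) and _in_ranges(after, ZERO_WIDTH_RANGES):
        return ZWSP
    return " "


def join_lines(lines):
    """Join source lines, replacing each end of line per the policy above."""
    out = ""
    for line in lines:
        if not line:
            continue  # skip empty lines in this sketch
        if out:
            out += line_break_replacement(out[-1], line[0])
        out += line
    return out
```

With these tables, join_lines(["hello", "world"]) yields "hello world",
while two lines of Japanese text are joined with nothing between them.
The open question in the message remains: which other scripts belong in
which table.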

pr

-- 
Pete Resnick <mailto:presnick@qualcomm.com>
Eudora Engineering - QUALCOMM Incorporated
Ph: (217)337-6377 or (858)651-4478, Fax: (858)651-1102
