From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Sep 21 2006 - 13:04:12 CDT
On Thu, 21 Sep 2006, Addison Phillips wrote:
> Jukka K. Korpela wrote:
>> but I'm pretty sure that actual data almost universally contains just
>> normal spaces.
>
> That's probably not true. User input may be "regular spaces", but I think
> you'll find that computer systems generate non-breaking spaces.
Some systems may, but I don't think that's common at all. Think about all
the texts written using text editors or word processors, by people who
rarely even know about the no-break space, still less use it regularly.
Their programs hardly ever convert spaces to no-break spaces. Numeric data
written in text format by programs tends to come from I/O routines that use
no thousands separator at all, though they might sometimes use a period, a
comma, or even a space. But hardly ever a no-break space.
> However, here we are dealing with a
> recommendation to content authors. For a number, using a non-breaking space
> will prevent things like line-breaking from interfering with text legibility.
It will, but especially in justified text, it has a price. Besides, for a
number, it would be rather trivial for a rendering engine to avoid (by
default) a line break between sequences of digits even when they are
separated by a space. (Actually, should this be taken into account in
Unicode line breaking rules, by adding NU SP* × NU or at least NU SP × NU
there? Just a thought.)
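
To make the idea concrete, here is a rough sketch in Python of what such a default could look like. It is not the UAX #14 algorithm, just a toy model in which a break is allowed after any run of U+0020 spaces, with the NU SP* × NU exception bolted on. The function names are made up for illustration, and NU is approximated by general category Nd.

```python
import re
import unicodedata


def is_nu(ch):
    # Approximation (an assumption): treat general category Nd as class NU.
    return unicodedata.category(ch) == "Nd"


def break_opportunities(text, keep_digit_groups=True):
    """Return the indices at which a line break would be allowed.

    Toy model: a break is allowed after each run of U+0020 spaces that
    has text on both sides. With keep_digit_groups set, the break is
    suppressed when the run sits between two digits (NU SP* x NU).
    """
    breaks = []
    for match in re.finditer(r" +", text):
        start, end = match.start(), match.end()
        if start == 0 or end == len(text):
            continue  # leading or trailing spaces: nothing to separate
        if keep_digit_groups and is_nu(text[start - 1]) and is_nu(text[end]):
            continue  # digits SP* digits: keep the number on one line
        breaks.append(end)
    return breaks


sample = "about 123 445 789 items"
print(break_opportunities(sample))                           # [6, 18]
print(break_opportunities(sample, keep_digit_groups=False))  # [6, 10, 14, 18]
```

With the exception switched on, the only break opportunities left are before "123" and before "items", so the number stays together even though it contains only normal spaces.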
>> I wouldn't be so worried about conversions to legacy encodings when using
>> Unicode for new data.
>
> I would, simply because users will wish to utilize text in many places that
> use legacy encodings. It is bad to have your number suddenly and inexplicably
> become "123?445?789".
You have a very good point here, but I don't think it's about legacy
encodings. Rather, it's about more limited character repertoires and about
legacy software. If you cut and paste numbers from, say, a text document
into a spreadsheet program, you may find out that fixed-width spaces won't
be recognized as spaces at all - even if no encoding problems are
involved. But on similar grounds, you may run into problems with no-break
spaces, too. Legacy software with simple ASCII-oriented input routines may
get wild when it sees a no-break space.
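
Both problems are easy to reproduce. The following Python sketch shows a thin space (U+2009) turning into "?" when text is forced into ASCII, and one possible workaround: mapping every space separator (general category Zs, which includes the no-break space and the fixed-width spaces) to a normal space before handing the text to an ASCII-oriented routine. The helper names are hypothetical, and whether such a mapping is acceptable depends on the application.

```python
import unicodedata


def to_plain_spaces(text):
    """Replace every character of general category Zs with U+0020."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )


def parse_grouped_int(text):
    """Parse an integer whose digit groups are separated by any Zs space."""
    return int(to_plain_spaces(text).replace(" ", ""))


thin = "123\u2009445\u2009789"   # grouped with THIN SPACE
mixed = "123\u00a0445\u2009789"  # NO-BREAK SPACE, then THIN SPACE

# A repertoire without the thin space yields the "123?445?789" effect.
print(thin.encode("ascii", errors="replace"))  # b'123?445?789'

# After normalization, simple ASCII-oriented code can cope.
print(to_plain_spaces(mixed))    # 123 445 789
print(parse_grouped_int(mixed))  # 123445789
```

Note that the no-break space itself survives in ISO 8859-1 but not in ASCII, so which characters get damaged depends on the target repertoire.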
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/