From: D. Starner (shalesller@writeme.com)
Date: Tue Dec 09 2003 - 16:37:50 EST
> Just imagine what would be created with your assumption with this source:
> const wchar_t c = L'?';
> where ? is a combining character.
The programmer would get bitten. For a start, there's no reason to assume that
every compiler accepts UTF-8, quite apart from the fact that you can't assume that
neither the compiler nor any intermediary step normalizes. That's why Unicode
escapes exist, and partially why Java as a general rule translates source into
a form that uses Unicode escapes for non-ASCII characters.
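
To make that concrete, here is a minimal sketch of the escaped form in C. It
assumes a C99 compiler (universal character names) and a wchar_t that holds
Unicode code points. U+0301 COMBINING ACUTE ACCENT is only my stand-in, since
the quoted message doesn't say which combining character was meant, and the
names combining_acute, e_acute and check_escapes are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumption: U+0301 stands in for the unspecified combining character.
   The source file stays pure ASCII, so an editor or transfer step has
   nothing to combine with the quote mark and nothing to normalize. */
static const wchar_t combining_acute = L'\u0301';

/* "e" followed by the combining accent, again spelled in ASCII only. */
static const wchar_t e_acute[] = L"e\u0301";

/* Hypothetical self-check: the escape is still its own code point,
   assuming wchar_t values are Unicode code points on this platform. */
static void check_escapes(void)
{
    assert(combining_acute == 0x0301);
    assert(wcslen(e_acute) == 2);
    assert(e_acute[0] == L'e');
    assert(e_acute[1] == 0x0301);
}
```

Whether the bare-character form of the same declaration compiles to the same
value depends on the compiler and on every tool the file passes through; the
escaped form does not.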
Even if you assume the compiler can accept Unicode text in whatever UTF you
choose, it still seems needlessly dangerous to use a bare combining character
instead of a Unicode escape or a numeric entity. Despite your distinction, there's
no clear line between programming editors and non-programming editors. Any editor
that gives you variable names in Hindi or Arabic is likely to have the sophistication
needed to combine that ? with that ', and I see no reason it won't; quite possibly,
the underlying system won't give it the option to handle Hindi or Arabic and not
combine that ? with that '. Emacs, for one notorious programming editor, fully
plans to have that sophistication.
This archive was generated by hypermail 2.1.5 : Tue Dec 09 2003 - 17:41:14 EST