Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Benjamin Peterson ([email protected])
Date: Thu Dec 11 2003 - 13:15:54 EST

Next message: Philippe Verdy: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Previous message: Mark Davis: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
In reply to: Michael \(michka\) Kaplan: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

On Thu, 11 Dec 2003 09:05:10 -0800, "Michael (michka) Kaplan"
<[email protected]> said:

> I think you are mostly mistaken here. All of the programmers I know (i.e.
> script kiddies need not apply? <grin>) call APIs. The bulk of those APIs
> deal with APIs that have no notion of any of this. They take LPWSTR or
> WCHAR * and a developer who does not know what those are or who
> incorrectly assumes that they are grapheme clusters will not be able to
> function very effectively.

That is the current situation for some, but it is neither a desirable nor
a permanent situation, nor an intrinsic property of non-'script kiddie'
programming. Those APIs used to take char*s, and before that they took
7-bit byte addresses, and those bad days are now behind (most of) us.

As an application programmer, I would certainly consider a system that
insulated me from the byte/WCHAR representation of a string (except when
asked not to) to be a better system. And systems are improving in this
respect year by year. I see that in .NET I can actually step through a
string accented-character by accented-character -- wonder upon wonders!
With a lot of luck I might never have to use a C-style array as a string
again.
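
To make "stepping through a string accented-character by
accented-character" concrete, here is a minimal sketch in Python rather
than .NET (where the facility I have in mind is presumably
StringInfo.GetTextElementEnumerator). The function name
combining_sequences is made up for illustration, and the rule it uses (a
base character followed by any characters with a nonzero canonical
combining class) is a simplification, not the full grapheme cluster
algorithm of UAX #29.

```python
import unicodedata


def combining_sequences(text):
    """Yield each base character together with the combining marks after it.

    Simplified assumption: a character belongs to the previous element
    when its canonical combining class is nonzero. Real grapheme cluster
    segmentation (UAX #29) covers more cases than this.
    """
    current = ""
    for ch in text:
        if current and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the element being built.
            current += ch
        else:
            # New base character: emit the finished element, start another.
            if current:
                yield current
            current = ch
    if current:
        yield current


# "resume" with both accents stored as e + U+0301 COMBINING ACUTE ACCENT
word = "re\u0301sume\u0301"
units = list(combining_sequences(word))

print(len(word))   # number of code points in the stored string
print(len(units))  # number of accented characters a user would count
```

Note that Python's len() counts code points, not UTF-16 code units, so
the gap against an array of WCHARs can be wider still for characters
outside the BMP.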

> Most programmers (even ones who DO deal with grapheme clusters)
> need to be working below the level to which you are referring here.
>

This is true, but it is a result of inadequacies in their environments,
inadequacies that are being fixed quite rapidly.

I remember what a huge barrier the division between single-byte and
multibyte text once seemed -- and what a huge advance it was when the
Win32 API became widespread and finally you could translate your English
data to Chinese without a ground-up review of the entire system (well,
unless you had to deal with GNU utils or Unix). Now the 'characters that
are composed of more than one byte' barrier is behind us and we are
pushing up against a new barrier: text elements that are composed of more
than one character (a base character plus combining marks). I fully
expect this challenge to be overcome as well -- the linguistic details
may go on forever, but in terms of implementing your friendly local
string type it ain't rocket science. If application programmers are still
looking at arrays of WCHARs in ten years it'll be very surprising -- and
_very_ depressing.
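
As a rough illustration of what such a "friendly local string type" might
look like, here is a small Python sketch. The class name TextString is
hypothetical, and so are its design choices: it normalizes to NFC on
construction so that canonically equivalent inputs compare equal, and it
groups characters using the canonical combining class only, which is an
assumption that falls short of full UAX #29 segmentation.

```python
import unicodedata


class TextString:
    """Hypothetical string type indexed by text element, not code unit."""

    def __init__(self, text):
        self._elements = []
        # NFC normalization gives canonically equivalent inputs the
        # same internal representation.
        for ch in unicodedata.normalize("NFC", text):
            if self._elements and unicodedata.combining(ch) != 0:
                # Combining mark left over after composition: keep it
                # with the preceding base character.
                self._elements[-1] += ch
            else:
                self._elements.append(ch)

    def __len__(self):
        return len(self._elements)

    def __getitem__(self, index):
        return self._elements[index]

    def __eq__(self, other):
        if not isinstance(other, TextString):
            return NotImplemented
        return self._elements == other._elements

    def __str__(self):
        return "".join(self._elements)


composed = TextString("caf\u00e9")     # precomposed e-acute
decomposed = TextString("cafe\u0301")  # e + combining acute accent
stacked = TextString("q\u0303\u0323")  # q + tilde + dot below

print(composed == decomposed)          # canonically equivalent inputs
print(len(stacked), len(str(stacked))) # text elements vs. code points
```

The application code only ever sees text elements; the code points (or
WCHARs, in another runtime) stay an implementation detail unless you ask
for them via str().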

-- 
  Benjamin Peterson
  [email protected]


This archive was generated by hypermail 2.1.5 : Thu Dec 11 2003 - 15:24:16 EST