RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Tue Dec 09 2003 - 10:00:17 EST

Next message: John Jenkins: "Re: Ideographic Description Characters"

Previous message: Philippe Verdy: "RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)"
Next in thread: Doug Ewell: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Doug Ewell: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Maybe reply: Arcane Jill: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Maybe reply: Arcane Jill: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Maybe reply: Philippe Verdy: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Hmm. Now here's some C++ source code (syntax colored as Philippe
suggests, to imply that the text editor understands C++ at least well
enough to color it):

int n = wcslen(L"café");

The L prefix on a string literal makes it a wide-character string, and
wcslen() is simply the wide-character version of strlen(). (There is no
guarantee that "wide character" means "Unicode character", but let's
just assume that it does, for the moment.)

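So, should n equal four or five? The answer would appear to depend on
whether the source file was saved in NFC or NFD. In NFC, "é" is the
single code point U+00E9; in NFD, it is "e" followed by U+0301 COMBINING
ACUTE ACCENT, which makes the string one code point longer.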

There is more to consider than just how and whether a text editor
normalizes. If a text editor can handle Unicode text, perhaps it should
also be able to DISPLAY explicitly the actual composition form of every
glyph. Ideally, the question I posed in the previous paragraph should be
answerable by sight: if you see four characters, there are four
characters; if you see five characters, there are five characters. This
implies that such a text editor should display NFD text as a separate
glyph for each character.

On the other hand, such a text editor must also acknowledge that "é" and
"e + U+0301" are actually equivalent. The /intention/ of canonical
equivalence is that both should display the same; otherwise we'd need
precomposed versions of, well, everything. So in other contexts, it
should display them the same.

Yuk. That's a lot to think about for anyone considering writing a
programmers' text editor with /serious/ Unicode support.
Jill

-----Original Message-----
From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
Sent: Tuesday, December 09, 2003 2:04 PM
To: jcowan@reutershealth.com
Cc: Unicode@Unicode.Org
Subject: RE: Coloured diacritics (Was: Transcoding Tamil in the
presence of markup)

I would not like to use any Unicode plain-text editor that implicitly
normalizes the text without asking me to work on programming source
files or XML or HTML files. But I would accept it if the editor really
understands the language or XML syntax (and shows it to the user with
syntax coloring).


This archive was generated by hypermail 2.1.5 : Tue Dec 09 2003 - 10:50:00 EST