Editing Sinhala and Similar Scripts

Philippe Verdy verdy_p at wanadoo.fr
Wed Mar 19 23:59:49 CDT 2014

The Backspace key has never been considered an "Undo" key.
Some OSes or keyboards provide an Undo key or an equivalent shortcut like
CTRL+Z (but even in this case the editor may want to undo multiple
successive insertions in one operation).
In the time of typewriters, backspace meant going back one full cluster
(in order to be able to retype it completely, e.g. after blanking it with
Tipp-Ex). Its effect was to go backward to the start of the cluster.
On typewriters and modern computer keyboards that have dead keys, the
backspace key ignores the dead key and goes backward to the previous
cluster. So dead keys are not counted.
With keyboards using a compose key method, there is NO character output in
the edit buffer as long as the compose sequence is not complete. A single
string containing the result of the composition is then inserted; users do
not see anything inserted in the edited text before that, so there is
nothing to delete with Backspace. Users also should not have to care
whether the composed sequence was encoded in NFC or NFD form (with
precomposed characters or with a decomposed base character followed by
diacritics). So they expect just one cluster.
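
A minimal Python sketch of such a cluster-level Backspace. It assumes a
simplified notion of "cluster" (one base character followed by any combining
marks), not the full grapheme cluster segmentation rules; the function name
is only illustrative.

```python
import unicodedata

def backspace(text: str) -> str:
    """Delete the last cluster: one base character plus the marks after it."""
    i = len(text)
    # Skip backward over trailing combining marks (general category M*).
    while i > 0 and unicodedata.category(text[i - 1]).startswith("M"):
        i -= 1
    # Then drop the base character itself, if there is one.
    return text[:max(i - 1, 0)]

nfc = unicodedata.normalize("NFC", "caf\u00e9")  # precomposed U+00E9
nfd = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' followed by U+0301
print(backspace(nfc) == backspace(nfd))          # the encoding form does not matter
```

Both inputs lose the whole "é" in one keypress, so the user never sees a bare
"e" that was not typed.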

If you really want to delete an accent on top of a Latin letter, that is
certainly not what the Backspace key will usually perform. You would need
another key such as ALT+Backspace to *transform* the previous cluster
before the cursor into a shorter one.
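
One possible reading of this "transform" operation, sketched in Python. The
function name alt_backspace is hypothetical, and the choice of removing the
last mark in canonical (NFD) order is an assumption of the sketch, not an
established editor behaviour.

```python
import unicodedata

def _is_mark(ch: str) -> bool:
    return unicodedata.category(ch).startswith("M")

def alt_backspace(text: str) -> str:
    """Strip one combining mark from the last cluster, keeping the base."""
    # Locate the start of the last cluster (base + trailing marks).
    i = len(text)
    while i > 0 and _is_mark(text[i - 1]):
        i -= 1
    start = max(i - 1, 0)
    head, cluster = text[:start], text[start:]
    # Decompose so that precomposed letters expose their diacritics.
    decomposed = unicodedata.normalize("NFD", cluster)
    if len(decomposed) > 1 and _is_mark(decomposed[-1]):
        decomposed = decomposed[:-1]
    # Recompose the shortened cluster.
    return head + unicodedata.normalize("NFC", decomposed)

print(alt_backspace("caf\u00e9"))  # the accent goes, the 'e' stays
```

The result is the same whether the cluster was stored precomposed or
decomposed, because the cluster is decomposed before the mark is removed.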

But this time, it is sometimes not really possible to predict which
diacritic will be deleted if there are multiple ones and they are unordered
(i.e. these combining diacritics have **distinct** and **non-zero**
combining classes): maybe all these diacritics with distinct non-zero
combining classes should be deleted in a single operation, or otherwise
just the last **ordered** diacritic if there's still one.
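
To illustrate why the typing order is lost, and one way the first option
could work, here is a hedged Python sketch. The function name and the exact
grouping rule (a trailing run of marks with strictly increasing non-zero
combining classes is treated as one unordered group) are assumptions made
for the example.

```python
import unicodedata

# Acute (class 230) and dot below (class 220) have distinct non-zero
# combining classes, so both typing orders normalize to the same string.
a = unicodedata.normalize("NFD", "e\u0301\u0323")
b = unicodedata.normalize("NFD", "e\u0323\u0301")
print(a == b)  # True: the editor cannot tell which mark was typed last

def delete_trailing_marks(cluster: str) -> str:
    """Remove the last mark, or the whole unordered group it belongs to."""
    d = unicodedata.normalize("NFD", cluster)
    j = len(d) - 1
    if j < 1 or not unicodedata.category(d[j]).startswith("M"):
        return cluster  # no diacritic to remove
    # Extend the deletion backward over marks whose combining classes are
    # non-zero and distinct from the following one (never past the base).
    while j > 1 and 0 < unicodedata.combining(d[j - 1]) < unicodedata.combining(d[j]):
        j -= 1
    return unicodedata.normalize("NFC", d[:j])
```

Two acute accents share the same combining class, so they stay ordered and
only the last one is removed; an acute plus a dot below are removed together.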

(Note: here the term "diacritic" is meant "broadly" and it may be any
combining mark, or a joiner control like CGJ, ZWJ, or ZWNJ, and sometimes a
modifier letter that may participate in the same cluster, such as an
apostrophe or middle dot: the Catalan letter L with middle dot may be
viewed in the editor as the letter L containing a combined diacritic, so
ALT+BACKSPACE could replace the L with middle dot by the letter L alone,
even if it's not canonically decomposable, as long as the editor knows that
it is operating within a Catalan locale.)
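
A small Python sketch of the Catalan case. The table LOCALE_SHORTENINGS and
the function shorten_cluster are hypothetical; a real editor would need its
own locale data. The two assertions only confirm that U+013F has a
compatibility decomposition and no canonical one.

```python
import unicodedata

# Hypothetical per-locale table of clusters that users perceive as
# "letter + diacritic" although Unicode does not decompose them canonically.
LOCALE_SHORTENINGS = {
    "ca": {
        "\u013f": "L", "\u0140": "l",    # precomposed L/l with middle dot
        "L\u00b7": "L", "l\u00b7": "l",  # letter followed by U+00B7 MIDDLE DOT
    },
}

def shorten_cluster(cluster: str, locale: str) -> str:
    """Return the shorter form of a cluster for this locale, if one is known."""
    return LOCALE_SHORTENINGS.get(locale, {}).get(cluster, cluster)

# NFD leaves U+013F untouched; only a compatibility mapping exists.
assert unicodedata.normalize("NFD", "\u013f") == "\u013f"
assert unicodedata.decomposition("\u013f") == "<compat> 004C 00B7"

print(shorten_cluster("\u013f", "ca"))  # L
```

Outside the Catalan locale the same cluster is returned unchanged, which
matches the condition that the editor must know the locale it operates in.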

In all cases, the action being performed by Backspace or ALT+Backspace is
completely independent of the underlying Unicode encoding and should also
be independent of the normalization form (except in an advanced technical
editor mode such as "visible controls", where every encoded character is
rendered separately with a special form to make it visible).
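
For contrast, a rough Python sketch of what such a "visible controls" view
could expose. The function name and the display format are assumptions; the
point is only that in this mode each encoded character is a separate item,
so deleting one code point at a time becomes meaningful.

```python
import unicodedata

def visible_form(text: str) -> list[str]:
    """List every encoded character separately, including invisible ones."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

for item in visible_form("e\u0301\u200d"):
    print(item)
# U+0065 LATIN SMALL LETTER E
# U+0301 COMBINING ACUTE ACCENT
# U+200D ZERO WIDTH JOINER
```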

In my opinion the standard edit mode (working in visual WYSIWYG mode)
should not depend on the encoding, and Backspace should not create new
oddities in the edited text that were never really inserted and made
visible immediately when they were first entered.

Indic diacritics are entered separately from the base letter and they are
combined progressively. They are also ordered; for this reason Backspace
can remove them one by one in a predictable order. The same could be said
about Hebrew and Arabic diacritics entered separately (even if they can
sometimes be unordered: Backspace will still delete all diacritics
that are in the same unordered group, even if it keeps the base letter).

But for Latin/Greek/Cyrillic keyboards that use dead keys for entering
unordered diacritics (which are not even made visible in the document
before you have typed the base letter), it makes no sense for Backspace to
choose between these diacritics. Backspace will then delete the full
cluster, up to the base letter.
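
The two policies could be combined roughly as in the Python sketch below. It
is only an illustration: the list PROGRESSIVE_SCRIPTS, the detection of the
script from the character name, and the function name are all assumptions,
and the handling of unordered groups for Hebrew and Arabic is left out.

```python
import unicodedata

# Assumption for this sketch: scripts whose marks are typed visibly, one by one.
PROGRESSIVE_SCRIPTS = ("SINHALA", "DEVANAGARI", "BENGALI", "TAMIL")

def _is_mark(ch: str) -> bool:
    return unicodedata.category(ch).startswith("M")

def script_aware_backspace(text: str) -> str:
    if not text:
        return text
    last = text[-1]
    # Marks of these scripts were entered one at a time: remove only the last.
    if _is_mark(last) and unicodedata.name(last, "").startswith(PROGRESSIVE_SCRIPTS):
        return text[:-1]
    # Otherwise (e.g. Latin with dead keys): remove the whole cluster.
    i = len(text)
    while i > 0 and _is_mark(text[i - 1]):
        i -= 1
    return text[:max(i - 1, 0)]

sinhala = "\u0d9a\u0dd2"  # Sinhala letter followed by a vowel sign
print(script_aware_backspace(sinhala) == "\u0d9a")   # the base letter remains
print(script_aware_backspace("e\u0301") == "")       # the whole cluster goes
```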

2014-03-20 5:21 GMT+01:00 Andrew Cunningham <lang.support at gmail.com>:

> There is also a distinction between editing an existing document that you
> opened as distinct from writing a document, going back to a certain point
> in document and editing that section within the same editing session.
> In the first case there is no history, in the second case there may be
> history to work with.
> Andrew
> On 20 March 2014 14:43, Peter Constable <petercon at microsoft.com> wrote:
>> If you click into the existing text in this email and backspace, what
>> keystroke will you expect to be "erased"? Your system has no way of knowing
>> what keystroke might have been involved in creating the text.
>> What it _can_ make sense to talk about is to say that a user expects
>> execution of a particular key sequence, such as pressing a Backspace key,
>> to have a particular editing effect on the content of text. "Erasing a
>> keystroke" and "keystrokes resulting in edits" are different things. One
>> makes sense, the other does not.
>> It may seem like I'm being pedantic, but I think the distinction is
>> important. Our failure is in framing our thinking from years of experience
>> (and perhaps some behaviours originally influenced by typewriter and
>> teletype technologies) in which a keyboard has a bunch of keys that add
>> characters, and variations on that that even include a lot of logic to get
>> input keying sequences that can generate tens of thousands of different
>> characters; but then one or two keys (delete, backspace) that can only
>> operate in very dumb ways. (We've also always assumed that any logic in
>> keying behaviours can be conditioned only by the input sequences, but not
>> by any existing content, but that steps beyond my earlier point.) These
>> constraints in how we think limit possibilities.
>> Peter
>> -----Original Message-----
>> From: Doug Ewell [mailto:doug at ewellic.org]
>> Sent: March 19, 2014 9:39 AM
>> To: Peter Constable; unicode at unicode.org
>> Subject: RE: Editing Sinhala and Similar Scripts
>> Peter Constable <petercon at microsoft dot com> wrote:
>> >> There are two types of people:
>> >>
>> >> 1. those who fully expect Backspace to erase a single keystroke
>> >
>> > It is nonsensical to talk about erasing a _keystroke_.
>> But that's what they expect.
>> --
>> Doug Ewell | Thornton, CO, USA
>> http://ewellic.org | @DougEwell
>> _______________________________________________
>> Unicode mailing list
>> Unicode at unicode.org
>> http://unicode.org/mailman/listinfo/unicode