Re: Misspelling or Miscoding?

From: Mark Davis ☕️ <mark_at_macchiato.com>
Date: Thu, 19 Jan 2017 08:52:05 +0100

We don't have any set terminology for what you're talking about.

We've often just used 'misspelling' in a broad sense, which can include
visually confusable or identical glyphs. For example, spelling 'of' with an
omicron would be one, as would a word in a complex script with swapped
marks. Cases of the former occur surprisingly often in web pages, probably
because people switch keyboards in mid-stride: they are on (say) a Greek
keyboard, hit omicron and then the Greek character in the 'f' position,
notice it is wrong, and backspace, but only over the character that
'looks' wrong, and then type 'f'.
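
As a rough illustration of how such a word could be flagged, here is a
minimal Python sketch. It assumes that the first word of a character's
Unicode name ('LATIN', 'GREEK') is a good enough stand-in for its script,
because the standard library does not expose the Script property itself.
The helper names scripts_in and looks_mixed are made up for this example.

```python
import unicodedata

def scripts_in(word):
    """Return rough script labels for the letters in word.

    Assumption: the first word of the Unicode character name is used as
    an approximation of the script; this is a heuristic, not the real
    Script property.
    """
    labels = set()
    for ch in word:
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels

def looks_mixed(word):
    """True if the letters of word appear to come from more than one script."""
    return len(scripts_in(word)) > 1

suspect = "\u03bff"  # GREEK SMALL LETTER OMICRON followed by LATIN SMALL LETTER F
plain = "of"         # both letters Latin

print(looks_mixed(suspect), looks_mixed(plain))
```

A check like this only catches cross-script substitutions; it does nothing
for swapped marks within a single script.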

The problem with the term "miscoding" is that it is overloaded. It can
also refer to an error at the character encoding level: for example,
interpreting a string of UTF-8 bytes as Latin-1. The sequence
<omicron, f> is a perfectly valid Unicode string and, in that sense,
not miscoded.
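
To make the two senses concrete, the following Python sketch takes the
<omicron, f> string, encodes it as UTF-8, and then decodes the same bytes
as Latin-1. The variable names are only illustrative.

```python
# <omicron, f>: a valid Unicode string that merely spells 'of' wrongly.
text = "\u03bff"

# Encoding it as UTF-8 works without complaint.
encoded = text.encode("utf-8")

# Reading those same bytes as Latin-1 is an error at the encoding level:
# each byte becomes one character, so the two-byte omicron turns into
# two unrelated Latin-1 characters.
misread = encoded.decode("latin-1")

# Decoding with the right encoding returns the original string.
roundtrip = encoded.decode("utf-8")

print(encoded, repr(misread), roundtrip == text)
```

The first string is wrong only in the spelling sense; the misread string
is wrong because the bytes were interpreted in the wrong encoding.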

Mark

On Thu, Jan 19, 2017 at 2:12 AM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:

> On Wed, 18 Jan 2017 13:35:55 -0700
> "Doug Ewell" <doug_at_ewellic.org> wrote:
>
> > Richard Wordingham wrote:
> >
> > > I think it is not a 'typographical error' if it renders as it
> > > should!
> >
> > What if it renders correctly on some systems but not on others?
>
> > I do see your point, though. Writing systems that permit different
> > spellings of the same glyph (cluster), only one of which is 'correct'
> > even after normalization, can be tricky like this. I think this would
> > still be a matter of 'misspelling' rather than 'miscoding' because a
> > typist should not have to be concerned with character codes per se.
>
> As you've put it, it sounds like the way things were with a simple Thai
> typewriter. A vowel below, a vowel above and a tone mark could be
> typed in any order, as though they had three different non-zero
> combining classes. Thais were trained to type into computers by input
> routines only accepting the marks in the correct order - this was
> before the days of canonical combining classes.
>
> In the case of greatest concern to me, there can be two different
> orders, but only one is appropriate for a given word. In most cases,
> only one word of that appearance exists, and one can usually guess which
> one does exist. (That is why the system works despite the occasional
> ambiguity.) It's not unlike how Thai would work had phonetic order
> been successfully insisted upon, except that there is no evidence that
> sorting should be by appearance, whereas in Thai as it was encoded
> before Unicode (and is now, after normalisation), encoding and sorting
> are based purely on appearance. (Well, officially - in practice, Thais
> appear to sort by doing syllable-by-syllable comparisons.)
>
> In this case of concern, the range of renderings is occasionally
> different, which is another reason that two different encodings for the
> same appearance must be tolerated.
>
> Richard.
>
Received on Thu Jan 19 2017 - 01:53:15 CST
