RE: Removing accents and diacritics from a word from Tex via Unicode on 2019-07-17 (Unicode Mail List Archive)

From: Tex via Unicode <unicode_at_unicode.org>
Date: Wed, 17 Jul 2019 11:37:38 -0700

Asmus, are you including the case where an accented character maps to two unaccented characters?

e.g. Å to AA or Ä to AE
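
To make the distinction concrete, here is a minimal Python sketch (an illustration only, not something proposed in the thread). It assumes the standard-library unicodedata module; the names strip_marks, fold and EXPANSIONS are hypothetical, and the table entries simply mirror the two examples above. Stripping combining marks after NFD decomposition turns Å into A, so a one-to-two mapping needs a separate, language-specific table.

```python
import unicodedata

# Hypothetical expansion table mirroring the examples above; real
# conventions differ by language and would need a curated table.
EXPANSIONS = {"Å": "AA", "å": "aa", "Ä": "AE", "ä": "ae"}


def strip_marks(text):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text, expansions=EXPANSIONS):
    """Apply one-to-two expansions first, then strip remaining marks."""
    # Compose first so the table keys match precomposed characters.
    composed = unicodedata.normalize("NFC", text)
    expanded = "".join(expansions.get(ch, ch) for ch in composed)
    return strip_marks(expanded)
```

With this sketch, strip_marks("São Tomé") gives "Sao Tome" and strip_marks("Å") gives "A", while fold("Å") gives "AA" because the table is consulted before the marks are stripped.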

From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Asmus Freytag (c) via Unicode
Sent: Wednesday, July 17, 2019 11:07 AM
To: Norbert Lindenberg
Cc: Unicode Mailing List
Subject: Re: Removing accents and diacritics from a word

On 7/17/2019 11:02 AM, Norbert Lindenberg wrote:

“Misspelling”?

Not helpful. Anybody have a serious suggestion?

A./

On Jul 17, 2019, at 10:37, Asmus Freytag via Unicode <unicode_at_unicode.org> wrote:

A question has come up in another context:

Is there any linguistic term for describing the process of removing accents and diacritics from a word to create its “base form”, e.g. São Tomé to Sao Tome?

The term "string normalization" appears preferable in a computing context.

Any ideas?

A./

Received on Wed Jul 17 2019 - 13:38:02 CDT
