RE: Removing accents and diacritics from a word from Sławomir Osipiuk via Unicode on 2019-07-17 (Unicode Mail List Archive)

From: Sławomir Osipiuk via Unicode <unicode_at_unicode.org>
Date: Wed, 17 Jul 2019 14:25:02 -0400

“Transliteration”?

Maybe more generic than what you’re looking for. It is the term used for the process of producing the “machine readable zone” on passports:

https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf (see section 6, page 30)

“Accent folding” or “diacritic folding” is used in some places. String folding is “A string transform F, with the property that repeated applications of the same function F produce the same output: F(F(S)) = F(S) for all input strings S”. Accent folding is a special case of that.

https://unicode.org/reports/tr23/#StringFunctionClassificationDefinitions

https://alistapart.com/article/accent-folding-for-auto-complete/
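
To make the folding property concrete, here is a minimal Python sketch of one possible accent fold. It is an illustration only, not the method described in TR23 or the linked article: the function name fold_accents is hypothetical, and the approach (canonical decomposition, then dropping characters with a non-zero combining class) is an assumption about what "removing accents" should mean.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical accent fold: decompose, drop combining marks, recompose."""
    # NFD splits precomposed letters such as "ã" into "a" + U+0303.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters whose canonical combining class is 0 (non-marks).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left so the result is in NFC.
    return unicodedata.normalize("NFC", stripped)


name = "São Tomé"
folded = fold_accents(name)
print(folded)                           # Sao Tome
# Folding property: applying F again changes nothing, F(F(S)) == F(S).
print(fold_accents(folded) == folded)   # True
```

Letters that have no canonical decomposition, such as the "ł" in "Sławomir", pass through this sketch unchanged, so a real folding table would need explicit mappings for them.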

From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Asmus Freytag via Unicode
Sent: Wednesday, July 17, 2019 13:38
To: Unicode Mailing List
Subject: Removing accents and diacritics from a word

A question has come up in another context:

Is there any linguistic term for describing the process of removing accents and diacritics from a word to create its “base form”, e.g. São Tomé to Sao Tome?

The linguistic term "string normalization" does not seem well suited to a computing context.

Any ideas?

A./
Received on Wed Jul 17 2019 - 13:25:30 CDT
