Re: Removing accents and diacritics from a word from Asmus Freytag (c) via Unicode on 2019-07-17 (Unicode Mail List Archive)

From: Asmus Freytag (c) via Unicode <unicode_at_unicode.org>
Date: Wed, 17 Jul 2019 17:05:58 -0700

On 7/17/2019 11:25 AM, Sławomir Osipiuk wrote:
>
> “Transliteration”?
>
> Maybe more generic than what you’re looking for. Used for the process
> of producing the “machine readable zone” on passports:
>
> https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf (see
> section 6, page 30)
>
> “Accent folding” or “diacritic folding” is used in some places. String
> folding is “A string transform F, with the property that repeated
> applications of the same function F produce the same output: F(F(S)) =
> F(S) for all input strings S”. Accent folding is a special case of that.
>
> https://unicode.org/reports/tr23/#StringFunctionClassificationDefinitions
>
> https://alistapart.com/article/accent-folding-for-auto-complete/
>
Diacritic folding. Thanks. Just didn't think of the operation as folding
the way it came up, but that's what it is.

A./
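
For illustration, here is a minimal sketch of diacritic folding, assuming
Python's standard unicodedata module. The function name fold_diacritics is
hypothetical, and the approach (decompose to NFD, drop nonspacing marks,
recompose to NFC) is only one common way to do it, not a method prescribed
by UTR #23 or by anyone in this thread.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: strip combining accents from text."""
    # NFD splits precomposed letters into base letter + combining marks,
    # e.g. "ã" becomes "a" followed by U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing mark (general category Mn).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever remains so the result is in NFC.
    return unicodedata.normalize("NFC", stripped)


folded = fold_diacritics("São Tomé")
print(folded)

# The folding property quoted above: F(F(S)) = F(S).
print(fold_diacritics(folded) == folded)
```

Letters that have no canonical decomposition pass through unchanged under
this approach: the "ł" in "Sławomir" is a single code point with no combining
mark to remove, so a real application might need an extra mapping table for
such letters.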

> From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of
> Asmus Freytag via Unicode
> Sent: Wednesday, July 17, 2019 13:38
> To: Unicode Mailing List
> Subject: Removing accents and diacritics from a word
>
> A question has come up in another context:
>
> Is there any linguistic term for describing the process of removing
> accents and diacritics from a word to create its “base form”, e.g. São
> Tomé to Sao Tome?
>
> The linguistic term "string normalization" appears not that preferable
> in a computing context.
>
> Any ideas?
>
> A./
>
Received on Wed Jul 17 2019 - 19:06:08 CDT
