On 28 Jan 2018, at 04:12, Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:
On Sat, 27 Jan 2018 14:13:40 -0800
Shervin Afshar <shervinafshar_at_gmail.com> wrote:
On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode <
unicode_at_unicode.org> wrote:
On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode <
unicode_at_unicode.org> wrote:
By way of example, one programming challenge I set to students a
couple of weeks ago involves diacritics. Please see
https://jsfiddle.net/coas/wda45gLp/
Did any of them come up with the idea of using traces instead of
strings?
Cor Blimey 😀 I am really pleased if the students have even heard of Unicode, let alone heard of the trace monoid 😀
...and I confess, I knew nothing of the trace monoid until I read the Wikipedia article below, but then again my ignorance is profound 😀
BTW, these internationalised computer science exercises I have written and am writing are not part of any course or module and so are optional. In providing such exercises I am hoping to spark an interest in Unicode and internationalisation. I wrote a couple more yesterday: http://jsfiddle.net/coas/3c7y88ot & https://jsfiddle.net/coas/aau8cqaw
André Schappo
Care to elaborate? Are you referring to sequence alignment methods?
No, I'm thinking of the trace monoid (see e.g.
https://en.wikipedia.org/wiki/Trace_monoid). One way of thinking of
strings is as concatenations of the NFD decompositions of their
constituent characters. Then the canonical equivalence classes of these
strings form the trace monoid of indecomposable characters. The theory
of regular expressions (though you may not think that mathematical
regular expressions matter) extends to trace monoids, with the
disturbing exception that the Kleene star of a regular language is not
necessarily regular. (The prototypical example is the set of sequences (xy)^n
where x and y are distinct and commute, i.e. xy and yx are canonically
equivalent in Unicode terms. A Unicode example is the set of strings
composed only of U+0F73 TIBETAN VOWEL SIGN II: it decomposes to U+0F71
followed by U+0F72, which commute, so a canonically equivalent string need
only contain equal numbers of the two, and no FSM can recognise that set.)
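The U+0F73 example can be checked with a short Python sketch. It assumes the `unicodedata` module of a standard Python 3 installation; the constant names and the helper `equivalent_to_ii_run` are made up for illustration and are not part of any library.

```python
import unicodedata

VOWEL_II = "\u0F73"  # TIBETAN VOWEL SIGN II, decomposes to U+0F71 U+0F72
VOWEL_AA = "\u0F71"  # TIBETAN VOWEL SIGN AA, combining class 129
VOWEL_I = "\u0F72"   # TIBETAN VOWEL SIGN I, combining class 130


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def equivalent_to_ii_run(s: str) -> bool:
    """True if s is canonically equivalent to some string of U+0F73 only.

    Hypothetical helper: after NFD the string must be n copies of U+0F71
    followed by exactly n copies of U+0F72, which means counting n.
    """
    d = nfd(s)
    n = d.count(VOWEL_AA)
    return d == VOWEL_AA * n + VOWEL_I * n


# Canonical reordering turns (AA I)^3 into AA^3 I^3.
print(nfd(VOWEL_II * 3) == VOWEL_AA * 3 + VOWEL_I * 3)
print(equivalent_to_ii_run(VOWEL_I + VOWEL_AA))       # the two marks commute
print(equivalent_to_ii_run(VOWEL_AA * 2 + VOWEL_I))   # unequal counts
```

The helper has to compare two counts, which is the kind of unbounded memory a finite-state machine does not have.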
One consequence of this view is that one has to think of U+1EAD LATIN
SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW (ậ) as being composed both of
the Vietnamese vowel letter U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
(â) and tone mark U+0323 COMBINING DOT BELOW and also, in
the spirit of Thai ISO 11940 transliteration, of the transliterated Thai
vowel U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW (ạ), corresponding to
U+0E31 THAI CHARACTER MAI HAN-AKAT, and the tone mark U+0302 COMBINING
CIRCUMFLEX ACCENT, corresponding to U+0E49 THAI CHARACTER MAI THO. (In
ISO 11940 as specified, the tone mark is actually written on the
immediately preceding consonant, not on the vowel.)
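The two readings of U+1EAD can be compared directly. This is a minimal sketch, assuming Python 3's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

target = "\u1EAD"            # ậ as a single precomposed character
vietnamese = "\u00E2\u0323"  # â + COMBINING DOT BELOW
thai_style = "\u1EA1\u0302"  # ạ + COMBINING CIRCUMFLEX ACCENT

forms = [target, vietnamese, thai_style]

# All three spellings collapse to one string under either normalization form.
nfd_forms = {unicodedata.normalize("NFD", s) for s in forms}
nfc_forms = {unicodedata.normalize("NFC", s) for s in forms}

print(len(nfd_forms), len(nfc_forms))
print([f"U+{ord(c):04X}" for c in next(iter(nfd_forms))])
```

In NFD the dot below (combining class 220) sorts before the circumflex (combining class 230), so both analyses yield the same sequence of indecomposable characters.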
Richard.
🌏 🌍 🌎
André Schappo
https://schappo.blogspot.co.uk
https://twitter.com/andreschappo
https://weibo.com/andreschappo
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization
Received on Mon Jan 29 2018 - 07:07:48 CST