Specification of Encoding of Plain Text
richard.wordingham at ntlworld.com
Fri Jan 13 03:02:32 CST 2017
On Thu, 12 Jan 2017 21:03:29 +0100
Mark Davis ☕️ <mark at macchiato.com> wrote:
> Latin is not a complex script,...
Unlike the Common script, which notably has U+2044 FRACTION SLASH.
The claim that Latin is not complex is itself dubious from a
typographical point of view.
> ...so it was only an illustration.
But it's good for looking for the non-obvious issues.
> A more serious effort would look at some of the issues from
> http://unicode.org/reports/tr29/, for example.
I don't think we want to have to repeat them all for each script.
Putting common-script punctuation and numbers in the regex will make
it more obscure, and possibly harder to maintain.
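
To illustrate the maintainability concern, here is a minimal sketch in
Python. The patterns are hypothetical and heavily simplified (only Basic
Latin and Latin-1 letters, ASCII digits and U+2044); they are not the
regex discussed earlier in the thread, and the names LATIN_LETTER and
COMMON_EXTRAS are made up for the example.

```python
import re
import unicodedata

# Hypothetical, simplified "Latin word" class: Basic Latin and Latin-1 letters.
LATIN_LETTER = r"A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"
latin_only = re.compile(f"[{LATIN_LETTER}]+")

# The same class with common-script characters spliced in:
# ASCII digits and U+2044 FRACTION SLASH (an assumption for illustration).
COMMON_EXTRAS = r"0-9\u2044"
latin_plus_common = re.compile(f"[{LATIN_LETTER}{COMMON_EXTRAS}]+")

sample = "add 1\u20442 cup"

# Without the common-script additions the fraction is dropped entirely.
words_latin_only = latin_only.findall(sample)

# With them, the fraction is kept as one token, but every per-script
# pattern would need the same COMMON_EXTRAS repeated inside it.
words_with_common = latin_plus_common.findall(sample)

# The character's formal name, looked up from the Unicode Character Database.
slash_name = unicodedata.name("\u2044")
```

If COMMON_EXTRAS has to be copied into the pattern for every script, any
later correction to it must be made in each copy, which is the
repetition I would rather avoid.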