Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

From: Martin J. Dürst via Unicode <unicode_at_unicode.org>
Date: Tue, 29 May 2018 13:23:13 +0900

Hello Sundar,

On 2018/05/28 04:27, SundaraRaman R via Unicode wrote:
> Hi,
>
> In languages like Ruby or Java
> (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
> functions to check if a character is alphabetic do that by looking for
> the 'Alphabetic' property (defined true if it's in one of the L
> categories, or Nl, or has 'Other_Alphabetic' property). When parsing
> Tamil text, this works out well for independent vowels and consonants
> (which are in Lo), and for most dependent signs (which are in Mc or Mn
> but have the 'Other_Alphabetic' property), but the very common pulli (VIRAMA)
> is neither in Lo nor has 'Other_Alphabetic', so any string containing
> it is reported as non-alphabetic.
>
> This doesn't make sense to me since the Virama “◌்” is as much of an
> alphabetic character as any of the "Dependent Vowel" characters which
> have been given the 'Other_Alphabetic' property. Is there a rationale
> behind this difference, or is it an oversight to be corrected?

I suggest submitting an error report via
https://www.unicode.org/reporting.html. I haven't studied the issue in
detail (sorry, just no time this week), but it sounds reasonable to give
the VIRAMA the 'Other_Alphabetic' property.
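
For anyone who wants to reproduce the behaviour described in the quoted
message, here is a minimal Java sketch. It relies only on
Character.isAlphabetic(int), the method linked above; the class name
AlphabeticCheck and the helper firstNonAlphabetic are made up for
illustration. The result for the pulli depends on the Unicode data
shipped with the runtime, so the sketch prints it rather than assuming
a value.

```java
public class AlphabeticCheck {

    // Hypothetical helper: returns the char index of the first code point
    // that does not have the Alphabetic property, or -1 if all of them do.
    static int firstNonAlphabetic(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isAlphabetic(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u0B95",        // TAMIL LETTER KA (Lo)
            "\u0B95\u0BBE",  // KA + TAMIL VOWEL SIGN AA (Mc, Other_Alphabetic)
            "\u0B95\u0BCD",  // KA + TAMIL SIGN VIRAMA (Mn)
        };
        for (String s : samples) {
            // -1 means every code point in the sample is Alphabetic.
            System.out.println(s + " -> " + firstNonAlphabetic(s));
        }
        // The general category of the pulli itself, for comparison.
        System.out.println("U+0BCD is Mn: "
                + (Character.getType(0x0BCD) == Character.NON_SPACING_MARK));
    }
}
```

If the third sample reports an index other than -1, the runtime's data
does not treat the pulli as Alphabetic, which matches the report above.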

I'd recommend mentioning examples other than Tamil in your report
(assuming they exist).

BTW, which method are you using in Ruby? If there's a problem in
Ruby (which I doubt; it just uses Unicode data), then please
file a bug report at https://bugs.ruby-lang.org/projects/ruby-trunk, and I
should be able to follow up on it.

Regards, Martin.