Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic? from Asmus Freytag via Unicode on 2018-05-28 (Unicode Mail List Archive)

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Mon, 28 May 2018 21:44:11 -0700

One of the general principles is that combining marks inherit the
property of their base character.

Normally, "inherited" should be the only value such properties take for combining marks.

There have been some deviations from this over the years, for various
reasons, and there are some properties (such as general category) where
it is necessary to recognize the character as combining, but the general
principle still holds.

Therefore, if you are trying to see whether a string is alphabetic,
combining marks should be "transparent" to such an algorithm.

A./
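A minimal Java sketch of the "transparent" treatment described above follows. It assumes that "combining mark" means General_Category Mn, Mc, or Me, and that a mark with no preceding base character makes the string non-alphabetic. The class and method names (TransparentAlphabetic, isCombiningMark, isAlphabeticString) are hypothetical. Only Character.getType, Character.isAlphabetic, and the category constants come from the JDK.

```java
public class TransparentAlphabetic {

    // Assumption: a "combining mark" is any code point with General_Category Mn, Mc, or Me.
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Hypothetical helper: a string is alphabetic if every base (non-mark) code point
    // is alphabetic. Marks are skipped ("transparent") because they inherit the
    // property of their base, which has already been checked.
    static boolean isAlphabeticString(String s) {
        boolean sawBase = false;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (isCombiningMark(cp)) {
                if (!sawBase) {
                    return false; // leading mark: no base to inherit from
                }
                continue;
            }
            if (!Character.isAlphabetic(cp)) {
                return false;
            }
            sawBase = true;
        }
        return sawBase; // an empty string is not treated as alphabetic
    }

    public static void main(String[] args) {
        // The Tamil word "tamil": TA, MA, VOWEL SIGN I, LLLA, SIGN VIRAMA (pulli)
        System.out.println(isAlphabeticString("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD")); // true
        // KA + pulli followed by an ASCII digit
        System.out.println(isAlphabeticString("\u0B95\u0BCD1")); // false
        // A lone pulli with no base character
        System.out.println(isAlphabeticString("\u0BCD")); // false
    }
}
```

Because marks are skipped rather than tested, the result no longer depends on whether a particular mark, such as U+0BCD TAMIL SIGN VIRAMA, carries Other_Alphabetic in the JDK's Unicode data.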

On 5/28/2018 9:23 PM, Martin J. Dürst via Unicode wrote:
> Hello Sundar,
>
> On 2018/05/28 04:27, SundaraRaman R via Unicode wrote:
>> Hi,
>>
>> In languages like Ruby or Java
>> (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
>> functions that check whether a character is alphabetic do so by looking
>> up the 'Alphabetic' property (true if the character is in one of the L
>> categories or Nl, or has the 'Other_Alphabetic' property). When parsing
>> Tamil text, this works well for independent vowels and consonants
>> (which are in Lo), and for most dependent signs (which are in Mc or Mn
>> but have the 'Other_Alphabetic' property), but the very common pulli
>> (VIRAMA) is neither in Lo nor has 'Other_Alphabetic', so any string
>> containing it is judged non-alphabetic.
>>
>> This doesn't make sense to me, since the virama “◌்” is as much an
>> alphabetic character as any of the "Dependent Vowel" characters that
>> have been given the 'Other_Alphabetic' property. Is there a rationale
>> behind this difference, or is it an oversight to be corrected?
>
> I suggest submitting an error report via
> https://www.unicode.org/reporting.html. I haven't studied the issue in
> detail (sorry, just no time this week), but it sounds reasonable to
> give the VIRAMA the 'Other_Alphabetic' property.
>
> I'd recommend mentioning examples other than Tamil in your report
> (assuming they exist).
>
> BTW, what's the method you are using in Ruby? If there's a problem in
> Ruby (which I doubt; it just uses the Unicode data), then please
> make a bug report at https://bugs.ruby-lang.org/projects/ruby-trunk, and I
> should be able to follow up on it.
>
> Regards, Martin.
>
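The per-code-point check described in the quoted question can be reproduced with a Java sketch like the one below. NaiveAlphabetic and allAlphabetic are hypothetical names. The output depends on the Unicode version bundled with the JDK, so no expected output is shown. With Unicode data in which U+0BCD has neither an L category nor Other_Alphabetic, as the question describes, allAlphabetic returns false for any word containing the pulli.

```java
public class NaiveAlphabetic {

    // The check described in the question: every code point must be Alphabetic.
    // Unlike a mark-transparent check, the pulli itself is tested here.
    static boolean allAlphabetic(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isAlphabetic);
    }

    public static void main(String[] args) {
        // The Tamil word "tamil", ending in U+0BCD TAMIL SIGN VIRAMA (pulli)
        String word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD";

        // Print each code point with its name, numeric general category,
        // and the JDK's Alphabetic property value.
        word.codePoints().forEach(cp -> System.out.printf(
                "U+%04X  %-28s type=%2d  isAlphabetic=%b%n",
                cp, Character.getName(cp), Character.getType(cp),
                Character.isAlphabetic(cp)));

        System.out.println("allAlphabetic: " + allAlphabetic(word));
    }
}
```

The per-code-point dump makes it easy to see which character causes the whole string to fail, and to compare results across JDKs that ship different Unicode versions.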
Received on Mon May 28 2018 - 23:44:07 CDT
