Go romanize! Re: Counting Devanagari Aksharas from Naena Guru via Unicode on 2017-04-24 (Unicode Mail List Archive)

From: Naena Guru via Unicode <unicode_at_unicode.org>
Date: Mon, 24 Apr 2017 20:53:12 +0530

Quote by Richard:
Unless this implies a spelling reform for many languages, I'd like to
see how this works for the Tai Tham script. I'm not happy with the
Romanisation I use to work round hostile rendering engines. (My
scheme is only documented in variable hack_ss02 in the last script
blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.) For example,
there are several different ways of writing what one might naively
record as "ontarAy".

MY RESPONSE:
Richard, I stuck to the two specifications (Unicode and Font) and Sanskrit grammar. The akSara has two aspects: its sound (zabða, phoneme) and its shape (letter, ruupa). Reduce the writing system to its consonants, vowels etc. (zabða) and assign SBCS letters/codes to them (ruupa). SBCS provides the best technical facilities for any language. (This is why more than 130 languages now romanize despite Unicode.) Use English letters for similar sounds in the native speech. Now, treat all combinations as ligatures. For example, the 'po' sound in Indic has the p consonant with one sign before it and one sign after it. For the font, there is no difference between the way it makes the combination 'ä', which has a sign above, and the Indic one, which has a sign on either side. Recall that long ago, Unicode stopped defining fixed ligatures and asked the font makers to define them in the PUA.
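
To make the single-byte point concrete, here is a rough sketch only. It assumes ISO-8859-1 (Latin-1) stands in for the SBCS, and the helper names byte_cost and fits_single_byte are made up for illustration. The Devanagari spelling used for comparison is likewise an assumption, not something taken from the scheme itself.

```python
# Sketch only: Latin-1 is assumed to stand in for the SBCS.
# byte_cost and fits_single_byte are hypothetical helper names.

def byte_cost(text: str) -> dict:
    """Return the character count and encoded sizes in Latin-1 and UTF-8."""
    return {
        "chars": len(text),
        "latin-1": len(text.encode("latin-1")),
        "utf-8": len(text.encode("utf-8")),
    }


def fits_single_byte(text: str) -> bool:
    """True if every character of text has a one-byte Latin-1 code."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Romanized words from the paragraph above: one byte per letter in Latin-1.
print(byte_cost("zabða"))
print(fits_single_byte("ruupa"))

# An assumed Devanagari spelling for comparison: it has no Latin-1 encoding.
print(fits_single_byte("शब्द"))
```

The check says nothing about rendering. It only shows that the romanized letters, ð included, each sit in a single byte under that assumed encoding.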

Spelling and speech:
There is indeed a confusion about writing and reading in Hindi, as I have observed. As in English and Tamil, Hindi tends to end words with a consonant. So, there is this habit among Hindi speakers of dropping the ending vowel, mostly 'a', from words that actually end with it. For example, the famous name Jayantha (miserable mine too, haha! = jayanþa as Romanized) is pronounced Jayanth by Hindi speakers. It is a Sanskrit word. Sanskrit and languages like Sinhala have vowel endings and are traditionally spoken as such.

The dictionary is a commercial invention. When Caxton brought lead types to England, French-speaking Latin-flaunting elites did not care about the poor natives. Earlier, invading Romans forced them to drop Fuþark and adopt the 22-letter Latin alphabet. So, they improvised: struck a line across d and made ð (Eth), added a sign to 'a' and made æ (Ash), and continued using Thorn (þ) by rounding the loop. Lead type printing hit English for the second time, ruining it as spelling standardization began. Dictionaries sold. THE POWERFUL CAN RUIN PEOPLE'S PROPERTY BECAUSE THEY CAN IN ORDER TO MAKE MONEY. Unicode enthusiasts, take heed!

Looking at the word you gave, ontarAy, it looks to me like an Anglicized form. If I am to make a guess, its ending is like in ontarAyi. Is it said something like own-the-raa-yi? (danger?) If I am right, this is a good example of the decline of a writing system owing to bad, uncaring application of technology. We are in the Digital Age, and we need not compromise any more. In fact, we can fix errors and decadence introduced by past technologies.

RICHARD:
That sounds like a letter-assembly system.

MY RESPONSE:
Nothing assembled there, my friend.

On 4/24/2017 12:38 PM, Richard Wordingham via Unicode wrote:
> On Mon, 24 Apr 2017 00:36:26 +0530
> Naena Guru via Unicode <unicode_at_unicode.org> wrote:
>
>> The Unicode approach to Sanskrit and all Indic is flawed. Indic
>> should not be letter-assembly systems.
>>
>> Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of
>> the speech. Each writing system then assigns a shape to the
>> phonetically precise phoneme.
>>
>> The most technically and grammatically proper solution for Indic is
>> first to ROMANIZE the group of writing systems at the level of
>> phonemes. That is, assign romanized shapes to vowels, consonants,
>> prenasals, post-vowel phonemes (anusvara and visarjaniiya with its
>> allophones) etc. This approach is similar to how European languages
>> picked up Latin, improvised the script and even use the Simples and
>> Capitals repertoire. Romanizing immediately makes typing easier and
>> eliminates sometimes embarrassing ambiguity in Anglicizing -- you
>> type phonetically on key layouts close to QWERTY. (Only four
>> positions are different in Romanized Sinhala layout).
>>
>> If we drop the capitalizing rules and utilize caps to indicate the
>> 'other' forms of a common letter, we get an intuitively typed system
>> for each language, and readable too. When this is done carefully,
>> comparing phoneme sets of the languages, we can reach a common set of
>> Latin-derived SINGLE-BYTE letters completely covering all phonemes of
>> all Indic.
> Unless this implies a spelling reform for many languages, I'd like to
> see how this works for the Tai Tham script. I'm not happy with the
> Romanisation I use to work round hostile rendering engines. (My
> scheme is only documented in variable hack_ss02 in the last script
> blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.) For example,
> there are several different ways of writing what one might naively
> record as "ontarAy".
>
>> Next, each native script can be obtained by making orthographic smart
>> fonts that display the SBCS codes in the respective shapes of the
>> native scripts.
> That sounds like a letter-assembly system.
>
> So how does your scheme help one split words into orthographic
> syllables?
>
>> I have successfully romanized Sinhala and revived the full repertoire
>> of Sinhala + Sanskrit orthography losing nothing. Sinhala script is
>> perhaps the most complex of all Indic because it is used to write
>> both Sanskrit and Pali.
> What complication does Pali impose on top of Sanskrit? As far as I'm
> aware, it just needs one extra letter, usually called LLA, which you
> will already have if 'Sanskrit' includes Vedic Sanskrit.
>
>> See this: http://ahangama.com/ (It's all SBCS underneath).
>> Test here: http://ahangama.com/edit.htm
> All I get for these are blank pages. Perhaps there's an unreported
> communication failure in the network.
>
> Richard.
Received on Mon Apr 24 2017 - 10:24:46 CDT
