Re: Offlist: complex rendering

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Tue, 19 Jun 2012 14:41:39 +0200

2012/6/19 Naena Guru <naenaguru_at_gmail.com>:
> Below are links to two files that show the first paragraph of this web page:
> http://www.divaina.com/2012/06/17/scholast.html
> Unicode Sinhala:
> http://ahangama.com/sing/DBS.htm (4 kB)
> Romanized Singhala:
> http://ahangama.com/sing/DSS.htm (1 kB)
>
> Compare the shape formation and the sizes of the files. How much bandwidth
> is taken for the Unicode Sinhala file to go as UTF-8? 6kB! Six times the
> romanized file. Beyond that, imagine how that Unicode Singhala page was made
> from scratch and how many more steps were needed to get there than Latin-1
> page. If you closely inspect the original page from Divaina, you see that
> they did not input Unicode Sinhala directly but used an intermediary step.
> (There are two stray English letters). These are things that matter for
> ordinary citizens, not university dons paid and venerated by those same poor
> citizens.

Really, the raw size does not matter: the generic data compression
implemented in so many protocols will effectively result in roughly the
same bandwidth usage, almost independently of the encoding used.
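This can be measured rather than argued. Below is a minimal Python sketch,
assuming gzip (the compression behind HTTP "Content-Encoding: gzip") as a
stand-in for whatever the transport actually applies. The sample strings
are placeholders, not the Divaina paragraph; real pages should be measured
instead. Each Sinhala letter (U+0D80..U+0DFF) costs 3 bytes in UTF-8
against 1 byte per Latin-1 letter, which explains most of the raw gap; the
sketch shows how much of that gap survives compression.

```python
import gzip


def wire_sizes(text, encoding):
    """Return (raw_size, gzip_size) in bytes for `text` encoded with `encoding`."""
    raw = text.encode(encoding)
    return len(raw), len(gzip.compress(raw))


# Placeholder samples (NOT the page discussed above): "amma" in Sinhala
# script (U+0D85 U+0DB8 U+0DCA U+0DB8 U+0DCF) and a hypothetical Latin spelling.
# Repetition makes both compress unrealistically well; substitute real page text.
SAMPLE_SINHALA = "\u0D85\u0DB8\u0DCA\u0DB8\u0DCF " * 50
SAMPLE_LATIN = "ammaa " * 50

if __name__ == "__main__":
    si_raw, si_gz = wire_sizes(SAMPLE_SINHALA, "utf-8")
    la_raw, la_gz = wire_sizes(SAMPLE_LATIN, "latin-1")
    print(f"Sinhala UTF-8: {si_raw} bytes raw, {si_gz} bytes gzipped")
    print(f"Latin-1      : {la_raw} bytes raw, {la_gz} bytes gzipped")
    print(f"raw ratio {si_raw / la_raw:.2f}, gzipped ratio {si_gz / la_gz:.2f}")
```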

What is criticized is definitely not the fact that you want to use
transliteration. Transliteration is great, as long as you are not
forcing Latin letters to look like Sinhalese letters with a hacked
font.

If you want to ease the input of Sinhalese using Latin keyboard
layouts and a Latin-based orthography, then you should EITHER:

(1) present the encoded text directly in the Latin script: we should
not even need the hacked font you force us to use. We would then see
that you have used a strange order of Latin letters, and a strange
choice of letters, which are not even easy to type on a Latin keyboard
(a native ISCII-based keyboard for the Sinhalese script is MUCH easier
to use: there are no capitals to handle, the Shift key gives access to
all letters, and CapsLock may be used to switch to Latin if needed);

(2) OR implement an IME that allows inputting Sinhalese using a
REALLY EASY Latin keyboard layout (your system is not easy to type on
ANY existing standard Latin layout), and whose result is an assisted
transliteration to Sinhalese (with helpers to select the intended
result when there are ambiguities, just like a Chinese IME using
Pinyin, or a Japanese IME based on Romaji input).
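To make option (2) concrete, here is a minimal Python sketch of the core
of such an IME: Latin input is matched greedily into consonant plus
vowel-sign syllables, and every candidate is returned when a Latin letter
is ambiguous, so the user can pick one. The romanization table is
HYPOTHETICAL and deliberately tiny (not any published scheme); a real IME
would also need conjuncts with ZWJ, aspirates, prenasalized consonants,
the remaining vowels, and a ranked dictionary.

```python
from itertools import product

# HYPOTHETICAL romanization table, deliberately tiny; not any published scheme.
# A Latin key may map to several Sinhala letters when the spelling is
# ambiguous (here "t": dental ta vs retroflex tta).
CONSONANTS = {
    "k": ["\u0D9A"],            # ka
    "g": ["\u0D9C"],            # ga
    "t": ["\u0DAD", "\u0DA7"],  # ta (dental) or tta (retroflex)
    "n": ["\u0DB1"],            # na
    "m": ["\u0DB8"],            # ma
    "r": ["\u0DBB"],            # ra
    "l": ["\u0DBD"],            # la
    "v": ["\u0DC0"],            # va
    "s": ["\u0DC3"],            # sa
    "h": ["\u0DC4"],            # ha
}
# Independent vowel letters, used when a vowel does not follow a consonant.
VOWELS = {"aa": "\u0D86", "a": "\u0D85", "ii": "\u0D8A", "i": "\u0D89",
          "u": "\u0D8B", "e": "\u0D91", "o": "\u0D94"}
# Dependent vowel signs; "a" is the inherent vowel, so it adds no sign.
VOWEL_SIGNS = {"aa": "\u0DCF", "a": "", "ii": "\u0DD3", "i": "\u0DD2",
               "u": "\u0DD4", "e": "\u0DD9", "o": "\u0DDC"}
AL_LAKUNA = "\u0DCA"  # virama: a consonant with no vowel at all
# Try longer vowel spellings first so "aa" wins over "a".
_VOWEL_KEYS = sorted(VOWEL_SIGNS, key=len, reverse=True)


def _match_vowel(text, i):
    """Return the longest vowel spelling starting at position i, or None."""
    for v in _VOWEL_KEYS:
        if text.startswith(v, i):
            return v
    return None


def candidates(text):
    """Return every Sinhala rendering of `text`; the user picks one."""
    slots = []  # one list of alternatives per syllable
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            v = _match_vowel(text, i + 1)
            sign = AL_LAKUNA if v is None else VOWEL_SIGNS[v]
            slots.append([c + sign for c in CONSONANTS[ch]])
            i += 1 + (0 if v is None else len(v))
        else:
            v = _match_vowel(text, i)
            if v is None:
                slots.append([ch])  # pass through spaces, digits, unknown keys
                i += 1
            else:
                slots.append([VOWELS[v]])
                i += len(v)
    return ["".join(combo) for combo in product(*slots)]
```

For example, `candidates("tel")` yields both the dental and the retroflex
reading; an IME would offer them in a selection list, much as a Pinyin
IME offers homophones.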

Your system attempts to mimic the true native Sinhalese orthography,
but using a completely invented Latin orthography based only on a very
basic one-to-one transliteration. The result is possibly good as input
for an IME, but it is a very poor representation in the Latin script,
as it was definitely not designed with Latin-script readers as the
target, only with native readers of the Sinhalese script. It is also a
poor IME for that target, as it is even more complicated to type.

It also greatly complicates the situation of texts that need to
represent multiple scripts at the same time: you constantly have to
create documents that switch between your hacked font and a normal
Latin font (this is not an easy solution for document creators to
handle).

So instead, please stay focused:

- Develop a REAL transliterator that will make the Sinhalese language
really readable for people who can't read the Sinhalese script: this
will probably not interest the native Sinhalese people for whom you
are developing your site, so you have already lost that battle, given
that there are better transliterators available, backed by several
industry standards (used by librarians, for example). No hacked font
is needed: the result MUST be shown using acceptable Latin glyphs
that native Latin readers can recognize. Almost any existing Latin
font should be usable, provided that it maps the Latin letters you'll
use. But beware that lettercase in the Latin script is very weak and
cannot be used as a strong distinction between very distinct
Sinhalese phonemes.
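For the transliterator direction, the key design point is to distinguish
phonemes with base letters or diacritics, never with lettercase alone. A
minimal Python sketch, loosely in the style of ISO 15919 (dental t vs
retroflex t with dot below, macron for long vowels); the table covers only
a handful of letters and is an assumption to be checked against the
standard before any real use.

```python
# Sinhala -> Latin, loosely in the style of ISO 15919 (a sketch; verify the
# table against the standard). Distinct phonemes get distinct letters or
# diacritics, never a distinction carried only by upper vs lower case.
LATIN_FOR_CONSONANT = {
    "\u0D9A": "k", "\u0D9C": "g",
    "\u0DA7": "\u1E6D",  # retroflex tta -> t with dot below
    "\u0DAD": "t",       # dental ta -> plain t
    "\u0DB1": "n", "\u0DB8": "m", "\u0DBB": "r",
    "\u0DBD": "l", "\u0DC0": "v", "\u0DC3": "s", "\u0DC4": "h",
}
LATIN_FOR_VOWEL = {"\u0D85": "a", "\u0D86": "\u0101", "\u0D89": "i",
                   "\u0D8A": "\u012B", "\u0D8B": "u", "\u0D91": "e",
                   "\u0D94": "o"}
LATIN_FOR_SIGN = {"\u0DCF": "\u0101", "\u0DD2": "i", "\u0DD3": "\u012B",
                  "\u0DD4": "u", "\u0DD9": "e", "\u0DDC": "o"}
AL_LAKUNA = "\u0DCA"  # virama: suppresses the inherent vowel


def romanize(text):
    """Transliterate Unicode Sinhala text to Latin letters with diacritics."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in LATIN_FOR_CONSONANT:
            out.append(LATIN_FOR_CONSONANT[ch])
            if nxt in LATIN_FOR_SIGN:
                out.append(LATIN_FOR_SIGN[nxt])  # explicit vowel sign
                i += 2
            elif nxt == AL_LAKUNA:
                i += 2  # bare consonant, no vowel
            else:
                out.append("a")  # inherent vowel
                i += 1
        else:
            # Independent vowels are mapped; anything else passes through.
            out.append(LATIN_FOR_VOWEL.get(ch, ch))
            i += 1
    return "".join(out)
```

The output remains distinct after case folding (dental "tel" vs retroflex
"\u1E6Del" stay different in upper case), which is exactly what a
case-based scheme cannot guarantee.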

- Develop a REAL IME for Sinhalese whose output will be Unicode
Sinhalese. Develop a standard font based on the standard assignments
in the UCS. Publish documents on the web encoded with the UCS
standard. Don't worry about UTF-8 encoding sizes (remember that
compression is now transparent in transport protocols, notably on
mobile networks that have bandwidth constraints, and that text is only
a very small part of the content actually embedded in today's pages,
which are loaded with scripts, photos, images, stylesheets, ads, and
collaborative tools).

- Help fix bugs in existing Sinhalese fonts. Participate in
open-source development teams for various environments (renderers,
fonts, website development tools, web browsers, mobile OSes for
smartphones and tablets...). If there are difficulties, help document
the OpenType specifications for Sinhalese better: participate in
forums of users developing fonts with existing font development
tools. Discuss the best features to implement: first those that are
necessary, then optional features that must not break the rest but
can provide additional support for fine typography or specialized
contexts (e.g. monospaced variants for editors or textual database
files, glyph variants for numbers in tables, decorative features,
features that enable some rare or historic ligatures...).

So far your work does not help anyone, because you attempt to mix all
these targets together while violating all accepted standards. It
creates MORE ambiguities without really helping to fix any existing
problem.

What you've really done is create another encoding standard that
conforms NEITHER to any ISO 8859 code page NOR to the UCS, which is
now the target of almost all development worldwide. But the problem is
that you DO NOT identify this encoding correctly. If you really want
to use and promote it, you should register an encoding name in the
IANA charset registry. Only then will it be possible to
***transcode*** (not "transliterate") your scheme to a standard UTF.
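What "transcoding" would mean in practice: once a byte-to-character
table is published under a registered name, any software can convert
such pages to UTF-8 mechanically. A minimal Python sketch using the
standard `codecs` charmap machinery; the name "x-sinhala-demo" and the
three byte overrides are INVENTED for illustration, and a real scheme in
which several Latin letters form one Sinhala syllable would need a
stateful codec rather than a one-byte-to-one-character table.

```python
import codecs

# Hypothetical charset name; a real scheme would use its IANA-registered name.
NAME = "x-sinhala-demo"

# Start from ISO 8859-1 (byte N decodes to U+00NN), then override a few bytes.
# These overrides are INVENTED for illustration, not taken from any real font.
_table = [chr(b) for b in range(256)]
_table[0x6B] = "\u0D9A"  # byte "k" -> SINHALA LETTER ALPAPRAANA KAYANNA
_table[0x6C] = "\u0DBD"  # byte "l" -> SINHALA LETTER DANTAJA LAYANNA
_table[0x6D] = "\u0DB8"  # byte "m" -> SINHALA LETTER MAYANNA
DECODING_TABLE = "".join(_table)
ENCODING_MAP = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, ENCODING_MAP)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # The lookup name arrives lowercased; hyphens may have been turned into
    # underscores depending on the Python version, so compare normalized forms.
    if name.replace("-", "_") == NAME.replace("-", "_"):
        return codecs.CodecInfo(_encode, _decode, name=NAME)
    return None


codecs.register(_search)


def transcode_to_utf8(data):
    """Transcode bytes labelled with the hypothetical charset into UTF-8."""
    return data.decode(NAME).encode("utf-8")
```

Labelled as ISO 8859-1 instead, the same bytes b"kml" decode to the plain
Latin letters "kml": software has no way to know they were meant as
Sinhala, which is why the mislabelling matters.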

STOP pretending that your pages are in ISO 8859-1 when they definitely
are NOT, and when they require specific switches to specific
stylesheets using your specific font (which works only in browsers
that implement the webfont system you have used). For example, your
site is unreadable on many smartphones and on most HDTVs (most of them
do not honor webfonts)... Finally, your system is definitely not
interchangeable if it starts by violating all standards without even
identifying itself, so that these violations could be detected and
worked around by a SAFE technical solution.

Note that the webfont system you use is even worse in terms of
bandwidth and of the local storage needed on the reading device (such
devices typically have a limited amount of RAM, to save battery, and
read-write access to the flashable area is limited in size except on
the most expensive devices, and is frequently much slower, even if it
is partly cached in RAM).
Received on Tue Jun 19 2012 - 07:48:34 CDT