RE: dotless j

From: Christopher J. Fynn (cfynn@dircon.co.uk)
Date: Mon Jul 05 1999 - 09:14:20 EDT


G. Adam Stanislav wrote:

...
> How do you make a j lose its dot if you do not have a dotless j available?
> I don't get it.
...

See http://partners.adobe.com/asn/developer/opentype/main.htm
or
for one approach to accomplishing this.

If there are actual j-with-diacritic combinations that are not encoded as
separate characters, at first sight it seems to make some kind of sense to
have a "dotless-j". As you and Michael point out, it *would* make things easier
in applications which do not make use of the intelligent rendering engines
required to handle some non-Latin scripts.

But *if* the standard in fact states that the character dotless i is only
to be used on its own (as in Turkish) and that you should never use dotless
i to encode i + diacritic combinations, then there probably isn't a real
case for encoding dotless-j unless such a character is actually used
in writing some language.

However if you know of specific j+diacritic combinations which are widely used
in writing some language but are not found encoded as characters in the Unicode
Standard, then perhaps you should demonstrate this and try to make the case
that these combinations should be individually encoded as separate characters
on the basis of prior usage or compatibility. Given the number of composite
Latin letters with diacritics that are already encoded as characters in the
Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Latin Extended
Additional blocks, the threshold between composite Latin letters that must be
encoded by their component parts and those that are accepted as characters
with a unique encoding seems pretty low.
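
One rough way to check whether a given j + diacritic combination already has
its own code point is to see whether canonical composition (NFC) collapses the
sequence into a single character. The sketch below is only an illustration:
it assumes a Python 3 interpreter and whatever Unicode tables ship with it
(much newer than the version discussed in this thread), and the helper name
has_precomposed and the sample list of marks are made up for the example.

```python
import unicodedata

def has_precomposed(base: str, mark: str) -> bool:
    """Return True if base + combining mark composes to one code point under NFC."""
    composed = unicodedata.normalize("NFC", base + mark)
    return len(composed) == 1

# A small, arbitrary sample of combining marks to try on "j" (not exhaustive).
SAMPLE_MARKS = {
    "circumflex": "\u0302",
    "macron": "\u0304",
    "caron": "\u030C",
}

for name, mark in SAMPLE_MARKS.items():
    status = "precomposed" if has_precomposed("j", mark) else "sequence only"
    print(f"j + {name}: {status}")
```

Combinations reported as "sequence only" are the ones for which a case based
on actual usage would have to be made.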

- Chris

P.S. You might also want to look at "Unicode and Glyph Names"
http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
which sets out a system for naming glyphs in Type 1 (PS) fonts according
to the Unicode values of the corresponding characters. According to this
system you could put a j-macron glyph in a Type 1 font and give it the
glyph name uni006A0304. The idea seems to be that PostScript type
rasterizers, including ATM, will be able to parse this glyph name
and "know" that the character sequence U+006A,U+0304 (j, non-spacing macron)
should be rendered using this glyph.

This scheme allows for naming glyphs that represent combinations of up
to seven Unicode characters. It seems a lot easier to make a Type 1 font
with composite glyphs named like this than it is to create a complex
OpenType font with proper GSUB tables etc.
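
To make the naming scheme concrete, here is a minimal Python sketch that
builds and parses names of the uniXXXXYYYY form described above. It is my own
illustration, not code from the Adobe document: the function names are
hypothetical, the limit of seven components is simply the figure quoted in
this message, and each component is assumed to be exactly four uppercase
hexadecimal digits.

```python
import re

MAX_COMPONENTS = 7  # limit as quoted in this message (assumption, not verified here)

# "uni" followed by one or more groups of four uppercase hex digits.
_UNI_NAME = re.compile(r"uni((?:[0-9A-F]{4})+)")

def glyph_name_from_chars(chars: str) -> str:
    """Build a uniXXXX... glyph name from a sequence of BMP characters."""
    if not 1 <= len(chars) <= MAX_COMPONENTS:
        raise ValueError("expected between 1 and 7 characters")
    if any(ord(ch) > 0xFFFF for ch in chars):
        raise ValueError("only four-hex-digit (BMP) values are handled in this sketch")
    return "uni" + "".join(f"{ord(ch):04X}" for ch in chars)

def chars_from_glyph_name(name: str):
    """Parse a uniXXXX... glyph name back into its character sequence, or None."""
    match = _UNI_NAME.fullmatch(name)
    if match is None:
        return None
    digits = match.group(1)
    codes = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
    if len(codes) > MAX_COMPONENTS:
        return None
    return "".join(chr(code) for code in codes)

# j followed by the non-spacing macron, as in the example above.
print(glyph_name_from_chars("j\u0304"))
print([f"U+{ord(ch):04X}" for ch in chars_from_glyph_name("uni006A0304")])
```

A rasterizer that understands the scheme would presumably do something like
the parsing half: split the name into four-digit groups and match the
resulting sequence against the text being rendered.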



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT