Generic Base Letter

From: Vincent Setterholm
Date: Fri Jun 25 2010 - 21:02:55 CDT

    When preparing digital editions of grammars and related works, it is often necessary to show a pattern of combining marks without attaching them to a specific letter.

    This is particularly true of languages like Hebrew, Syriac and Arabic where only the consonants are 'letters' and the vowels are combining marks.

    What I'd like to see is a code point for a generic base character that we can hang marks on (or two such code points, strongly typed as LTR and RTL, if you think both are needed). I'm actually partial to the dotted circle (like the one Microsoft inserts willy-nilly into documents now) because I like the way a variety of marks with subtle positional differences show up against it, but that's just a detail. I'm totally fine with allowing the font to control the shape of the generic base character. In print books I see everything from dotted circles to white circles, white boxes, hyphen-like things, large X shapes, etc., but I'm not asking for a dozen of these to match every print incarnation, just one that works.

    It's possible to use the latest font-making technologies to build such a solution on an existing character like U+25CC, but then documents become very font-specific, and the work is for nothing if Microsoft ignores the font tables controlling the affected glyph look-ups and inserts extra dotted circles anyway (which so far has been my experience). Without a standard code point, I've failed to get Microsoft to care about 1) not respecting the features of the font or 2) breaking old documents by changing their own behavior with regard to when dotted circles are inserted into documents without anyone asking for them.

    For example, it used to be that I could put a space in a document followed by any number of Hebrew vowels or accents and they would all combine on the space. It was a bit of a bummer that with just a space you couldn't easily distinguish between characters like U+059C and U+059D, where the relative position over the base letter is the only difference, but it sort of worked. Then Microsoft changed the behavior in many of their technologies so that a dotted circle is introduced even when a space precedes the mark, and if there used to be two marks combining on one space (say a dagesh, U+05BC, and a vowel such as U+05B5) there are now TWO dotted circles. This heavy-handed approach is trashing my old documents, and there is no fix in sight.
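
    To make the sequences concrete, here is a minimal Python sketch of the strings I'm talking about: a base (a space, or the dotted circle 25CC) followed by one or more Hebrew marks. The helper names on_base and describe are just made up for illustration, and this only builds and inspects the character sequences. It says nothing about how any particular renderer will draw them.

```python
import unicodedata

SPACE = "\u0020"
DOTTED_CIRCLE = "\u25CC"  # the existing character some fonts/renderers use as a base


def on_base(marks, base=DOTTED_CIRCLE):
    """Return `base` followed by `marks` (hypothetical helper).

    Raises ValueError if any character in `marks` is not a combining
    mark (Unicode general category Mn, Mc or Me).
    """
    for ch in marks:
        if not unicodedata.category(ch).startswith("M"):
            raise ValueError(f"U+{ord(ch):04X} is not a combining mark")
    return base + "".join(marks)


def describe(text):
    """List (code point, character name) for each character in `text`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Two accents that differ only in position over the base letter.
geresh = on_base("\u059C")
geresh_muqdam = on_base("\u059D")

# A dagesh plus a vowel on ONE base: three characters in total.
dagesh_tsere_on_space = on_base("\u05BC\u05B5", base=SPACE)
dagesh_tsere_on_circle = on_base("\u05BC\u05B5")

for sample in (geresh, geresh_muqdam, dagesh_tsere_on_space, dagesh_tsere_on_circle):
    print(describe(sample))
```

    Each sample contains exactly one base character, so any second dotted circle that shows up on screen is being added by the rendering software and is not in the text itself.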

    But I think if y'all at Unicode can specify the standard way to accomplish these things, we'll be a giant leap closer to solving this problem, so that we can finally make nice-looking grammars using Unicode.

    If Unicode already has an official solution to this problem, I hope someone will point me in the right direction.

    Vincent Setterholm
