CLDR Ticket #10898(design spec)

Opened 8 months ago

Last modified 7 months ago

PRI #367: Generic dead keys and dead-key-based selectors

Reported by: Marcel Schneider <charupdate@…> Owned by: mark
Component: keyboards Data Locale:
Phase: dsub Review:
Weeks: Data Xpath:


Older sources already promote a composition dead key and/or a vestigial group selector. Both are indispensable features of modern, efficient keyboard layouts. This feedback proposes, on the one hand, to redesign the composition tree and, on the other hand, to re-engineer the group selector as a dead key on AltGr/Option + Spacebar, giving access to a stack of up to eleven layout groups.


Change History

comment:1 Changed 8 months ago by Marcel Schneider <charupdate@…>

How to specify dead keys

Whether the transform pattern specified in 5.10 Element: transform in part 7 of UTR #35, version 32, is capable of transcribing the functional reality of either the Windows dead key scheme or the macOS handling of dead keys seems doubtful, because it is obviously inspired by XKB exclusively. That is not only a matter of implementation. Despite the DEADTRANS macro in the source code, a Windows layout with dead keys can hardly be specified using current LDML. I think one could say as much of a macOS layout, though installable ones are configured using XML (see TN2056).

Dead keys relying on Windows kbd.h

The Windows kbd.h header and its includes are part of the MSKLC referenced in Key Map Data Sources in part 7 of UTR #35, version 32, and can be used in isolation because the MSKLC EULA does not prohibit repurposing its components (whereas the Windows EULA does prohibit this).

Letʼs start with an example. In the French locale, some words contain the substring 'oe' while others are spelt with the letter o-with-e (French: “e dans l’o”), U+0153 œ, U+0152 Œ. It has been suggested to input 'œ' by using a transform of 'oe'. According to the current UTR #35, the transform oe → œ would be performed systematically unless we interpose a ZWNJ. That moves to the character level what is elsewhere a matter of glyphs. This is why, if we really want the transform to happen, weʼd rather press some dead key before (or between or after, provided that we are specifying an IME).

Quite another example: The ^e → ê transform quoted in UTR #35 currently completes only if ^ is a dead key, not when we just type an ASCII caret followed by an e. Whether a key position is a dead key is coded in the allocation table(s). Each dead transform consists of exactly three UTF-16 code units plus one for the dead key flag that indicates whether the resulting code unit is to be output, or acts as a new dead character.
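The table-driven behavior described above can be sketched in Python. This is only an illustrative model, not the kbd.h implementation: the `DeadTrans` tuple, the `type_keys` helper and the chained grave entry are hypothetical, and the `DKF_DEAD` value is an assumption. It shows that ^ followed by e composes only when ^ came from a dead key position, and that a flagged result acts as a new dead character.

```python
from typing import List, NamedTuple, Optional, Tuple

DKF_DEAD = 0x0001  # assumed value of the "result is a new dead character" flag


class DeadTrans(NamedTuple):
    base: str        # character of the key pressed after the dead key
    dead: str        # dead character identifying the active dead key
    result: str      # composed character, or the next dead character
    flags: int = 0   # DKF_DEAD when the result continues the sequence


def is_single_utf16_unit(ch: str) -> bool:
    """True if ch fits in one UTF-16 code unit, as each table field must."""
    return len(ch.encode("utf-16-le")) == 2


TABLE: List[DeadTrans] = [
    DeadTrans("e", "^", "\u00ea"),            # ^ + e -> e with circumflex
    DeadTrans("a", "^", "\u00e2"),            # ^ + a -> a with circumflex
    DeadTrans("`", "^", "\u1ea7", DKF_DEAD),  # hypothetical chained dead key
    DeadTrans("a", "\u1ea7", "\u1ea7"),       # completes the chained sequence
]
LOOKUP = {(t.dead, t.base): t for t in TABLE}


def type_keys(keys: List[Tuple[str, bool]]) -> str:
    """keys is a list of (character, is_dead_key_position) pairs."""
    out: List[str] = []
    pending: Optional[str] = None
    for char, is_dead_key in keys:
        if pending is None:
            if is_dead_key:
                pending = char        # enter the dead state, output nothing
            else:
                out.append(char)      # ordinary live key
            continue
        entry = LOOKUP.get((pending, char))
        if entry is None:
            out.extend([pending, char])   # failure: emit both characters
            pending = None
        elif entry.flags & DKF_DEAD:
            pending = entry.result        # result is a new dead character
        else:
            out.append(entry.result)
            pending = None
    return "".join(out)
```

In this model, an ASCII caret typed from a live key position followed by e stays as two characters, because the dead state is entered only from a key position marked as dead in the allocation table.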

By the way, using ê instead of ^ as a dead character for the ‹ circumflex accent › dead key is more user-friendly. That may be non-obvious for this diacritic, but what about displaying a combining horn — since we havenʼt got a spacing clone — or, even worse, a hook in isolation? On macOS, ChromeOS and Linux, the horn is OK, we can add a preceding no-break space as specified in TUS. But on Windows thatʼs a non-starter. Itʼs impossible with the system features. And for the hook we might anyway prefer to see something like a whole ƒ (U+0192).

Dead keys defined in macOS *.keylayout

A key point to take away is that there seems to be no buffer, only terminators. A dead key without a terminator neither displays anything during the ongoing sequence nor outputs anything on failure. And the most important thing to know is the state machine. The empty <when/> element has three attributes: state, output, next. Output and next may both have a value. Those of state and next are user-defined strings, while the length of output is limited by maxout and may not exceed 20 characters. There are no transforms; itʼs all a matter of what state is on ("none" by default).
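The state machine can be sketched as follows in Python. This is a simplified model under stated assumptions, not Apple's implementation: the action names, the `run` helper and the fallback rule (emit the terminator of the current state, reset to "none", then process the key again) are my reading of the behavior described above.

```python
from typing import Dict, List, Optional, Tuple

# One rule per <when/> element: (state, output, next)
When = Tuple[str, Optional[str], Optional[str]]

# Hypothetical actions; real layouts would use IDs such as "L1P1E00".
ACTIONS: Dict[str, List[When]] = {
    "dead_circumflex": [("none", None, "circumflex")],
    "a": [("none", "a", None), ("circumflex", "\u00e2", None)],
    "x": [("none", "x", None)],
}
TERMINATORS: Dict[str, str] = {"circumflex": "^"}


def run(action_ids: List[str],
        actions: Dict[str, List[When]] = ACTIONS,
        terminators: Dict[str, str] = TERMINATORS) -> str:
    state = "none"
    out: List[str] = []
    for action_id in action_ids:
        rules = {s: (o, n) for s, o, n in actions[action_id]}
        if state not in rules:
            # No <when/> for the current state: emit its terminator, if any,
            # then handle the key as if no dead key had been pressed.
            out.append(terminators.get(state, ""))
            state = "none"
        output, next_state = rules[state]  # assumes a "none" rule always exists
        if output:
            out.append(output)
        state = next_state or "none"
    return "".join(out)
```

With the terminator defined, a failed sequence yields "^x"; with an empty terminator table, the same keypresses yield only "x", which matches the point that a dead key without a terminator outputs nothing on failure.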

It is good practice to use ISO/IEC 9995-1 key references and level/pile numbers to compose action names, so that they are easily sorted and retrieved, like this:

		<keyMap index="0">
		<!--no modifiers-->
			<!--row E-->
			<key code="10"  action="L1P1E00"/>
			<key code="18"  action="L1P1E01"/>
			<key code="19"  action="L1P1E02"/>
		<keyMap index="1">
			<!--row E-->
			<key code="10"  action="L2P1E00"/>
			<key code="18"  action="L2P1E01"/>
			<key code="19"  action="L2P1E02"/>
		<keyMap index="2">
			<!--row E-->
			<key code="10"  action="L1P2E00"/>
			<key code="18"  action="L1P2E01"/>
			<key code="19"  action="L1P2E02"/>
			<key code="20"  action="L1P2E03"/>
			<key code="29"  action="L1P2E10"/>
			<key code="27"  action="L1P2E11"/>
			<key code="24"  action="L1P2E12"/>
			<!--row D-->
			<key code="12"  action="L1P2D01"/>

Each dead key a group selector

From this we can see that a good way to represent what dead keys do, is to add supplemental graphic keymaps, most efficiently one per dead key if the four main shift states can be represented on each virtual keycap. Each key has a left pile and a right pile (this is with AltGr/Option), and two levels on each side: level 1 and level 2, rather than 0 and 1. (There are also other, less intuitive ways of labeling keycaps.)

The right pile should not be called a group, because a keymap group is the set of four keymaps (Base, Shift, Option, Shift+Option) that pertain to a dead key. (Leaving out the Shift+Option level is inefficient.)

This way we have the Circumflex group, the Grave group and so on, optionally with the word “accent” added.

That also allows mapping other characters, beyond those bearing the eponymous diacritic, e.g. the permille sign and the section sign, on percent and ampersand in the Acute accent group or whatever group is most easily reached. The same goes for other letters, as on the US-International keyboard, which already has CCedilla in the Acute dead list.

One main group selector on Spacebar

Latin letters like eth and thorn are relatively rare even in the script theyʼre used in, and a-with-e or o-with-e do not occur that often in the orthographies they are for, especially in the use cases that layouts like US-International, US-Extended and US for macOS have been designed to meet. This is why they are sufficiently accessible in a keymap group that becomes active after an AltGr/Option + Spacebar combo. It could be any other dead key, but this one is most intuitive, and more useful than U+00A0 on this position. The NO-BREAK SPACE, as a deceitful character (itʼs fixed-width in word processors, unstable in Word, too large for French punctuation, and better spelt &nbsp; on the internet [in most places]), is more appropriately mapped to Shift + AltGr/Option + Spacebar.

Mapping those special chars and all these rare dead keys in the AltGr/Option keymap (level 1 pile 2) is discouraged, because these key positions are far more useful for making frequently used ASCII symbols and punctuation easier to access, such as the parentheses (on D and F), the brackets (on J and K), the dollar sign on S, and so on (see the blue positions in a sample layout with French tooltips on dispoclavier.com).

Such dead-key-accessed groups have the supplemental advantage of being extensible. Once group 2 is full, or at least all its intuitive positions are used, a double keypress brings up group 3, and so on, while hitting the group selector and then a digit from 3 through 9, or 0, 1, or 2, takes us directly to the corresponding group, 3 through 12.
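The group numbering above can be made concrete with a short Python sketch. The `SELECTOR` token and the `resolve_group` helper are hypothetical names used only for illustration; the digit-to-group table follows the paragraph above (3 through 9 map to themselves, then 0, 1, 2 map to 10, 11, 12).

```python
from typing import Dict, List, Tuple

SELECTOR = "<group-selector>"  # hypothetical token for AltGr/Option + Spacebar

# Digits pressed right after a single selector keypress jump to a group.
DIGIT_TO_GROUP: Dict[str, int] = {str(d): d for d in range(3, 10)}
DIGIT_TO_GROUP.update({"0": 10, "1": 11, "2": 12})


def resolve_group(keys: List[str]) -> Tuple[int, List[str]]:
    """Return (active group, remaining keys) for a key sequence."""
    group = 1
    i = 0
    # Each selector keypress moves one group further: 1 -> 2 -> 3 ...
    while i < len(keys) and keys[i] == SELECTOR:
        group += 1
        i += 1
    # A single selector keypress followed by a digit selects a group directly.
    if group == 2 and i < len(keys) and keys[i] in DIGIT_TO_GROUP:
        return DIGIT_TO_GROUP[keys[i]], keys[i + 1:]
    return group, keys[i:]
```

Groups 2 through 12 make eleven groups in total. One consequence visible in this sketch is that the digit positions are not available for character mappings in group 2, since they are consumed as group numbers.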

A generic dead key for “transforms”

Another important feature on modern keyboard layouts is the general (or generic) dead key named Compose key. But the composition tree needs to be re-engineered, as it predates Unicode and cannot adequately support todayʼs Unicode repertoire.

A draft stem list goes as follows:

U+0020 ‹ › SPACE: ‹space› (spaces and format controls)
U+0021 ‹!› EXCLAMATION MARK: ‹dot below›
U+0022 ‹"› QUOTATION MARK: ‹double acute accent› (NOT diaeresis)
U+0023 ‹#› NUMBER SIGN: ‹turned›
U+0024 ‹$› DOLLAR SIGN: ‹currency symbol›
U+0025 ‹%› PERCENT SIGN: ‹inverted›
U+0026 ‹&› AMPERSAND: ‹curl›
U+0027 ‹'› APOSTROPHE: ‹acute accent› (two times: NOT double acute, but curly apostrophe U+2019)
U+0028 ‹(› LEFT PARENTHESIS: ‹inverted breve› (NOT breve)
U+0029 ‹)› RIGHT PARENTHESIS: ‹breve› (NOT inverted breve)
U+002A ‹*› ASTERISK: ‹ring above›
U+002B ‹+› PLUS SIGN: ‹middle dot›
U+002C ‹,› COMMA: ‹cedilla›
U+002D ‹-› HYPHEN-MINUS: ‹bar› (aka “stroke”)
U+002E ‹.› FULL STOP: ‹dot above›
U+002F ‹/› SOLIDUS: ‹stroke› (diagonal stroke)
U+003A ‹:› COLON: ‹diaeresis›
U+003B ‹;› SEMICOLON: ‹comma below› (NOT ogonek)
U+003C ‹<› LESS-THAN SIGN: ‹circumflex accent› (NOT hacek)
U+003D ‹=› EQUALS SIGN: ‹double bar›
U+003E ‹>› GREATER-THAN SIGN: ‹hacek›
U+003F ‹?› QUESTION MARK: ‹hook above›
U+0040 ‹@› COMMERCIAL AT: ‹circled›
U+005B ‹[› LEFT SQUARE BRACKET: ‹crook OR hook OR ogonek› (share same dead key; depending on base character)
U+005C ‹\› REVERSE SOLIDUS: ‹reversed›
U+005D ‹]› RIGHT SQUARE BRACKET: ‹horn›
U+005E ‹^› CIRCUMFLEX ACCENT: ‹superscript› (NOT circumflex accent)
U+005F ‹_› LOW LINE: ‹subscript› (NOT macron)
U+0060 ‹`› GRAVE ACCENT: ‹grave accent› (two times: ‹double grave›)
U+007B ‹{› LEFT CURLY BRACKET: ‹retroflex hook›
U+007C ‹|› VERTICAL LINE: ‹macron›
U+007D ‹}› RIGHT CURLY BRACKET: ‹palatal hook OR flourish OR swash tail› (share same dead key; depending on base character)
U+007E ‹~› TILDE: ‹tilde›

Mapping of the Compose key

As it is technically an ordinary dead key, the Compose key of the keyboard layout (as opposed to dedicated software such as WinCompose) does not require a whole key. It can be mapped to AltGr/Option + O, so as to be easily reached and to be on the same level as many symbols used as parameters, if these are mapped as suggested above.

comment:2 Changed 8 months ago by Marcel Schneider <charupdate@…>

Group selector not basically a modifier

Iʼm aware that ISO/IEC 9995 talks about the group selector as being the Shift+Option modifiers pressed together and then released. (“Option” is the proposed name of AltGr/Option for LDML, see ticket #10901.) That could be implemented when keyboard layouts were programmed in machine language and stored in a PROM in the physical keyboard. Today it can be implemented neither on Windows nor on macOS.

This is why current implementations of ISO/IEC 9995 use either the supplemental modifier specified in the standard, or a dead key as a group selector. See the proposed LDML extensions for proper representation of dead keys in comment 6 on ticket #10901.

It follows naturally that the <selector> element has <dkeyMap> elements as children, with the modifiers argument, and that the <dmap> elements have the iso argument. That is implementable on macOS, which has action IDs that may be distinct from key output. On Windows, we have only the dead character, and the result is defined using the character that would be output by the next key.

If LDML is to be platform-independent, it can either fully support macOS and then not be fully implementable on Windows, or it can be tailored for Windows and then not fully describe what a macOS keylayout can perform. Therefore it would be wise to extend the language even more, giving it a flexible syntax. For support of macOS, an example derived from #10901 looks like this:

<keyboard locale="fr-FR-...">
	<keyMap modifiers="none">
		<map iso="D10" to="p" />
		<map iso="D11" selector="circumflex" />
		<selector name="circumflex" default="ê">
			<dkeyMap modifiers="none">
				<dmap iso="D01" to="â" />
				<dmap iso="D02" to="ẑ" />
				<dmap iso="D03" to="ê" />
				<dmap iso="D11" selector="superscript" />
				<dmap iso="B03" to="ĉ" />
				<dmap iso="B09" to="·" /><!--U+00B7 MIDDLE DOT; 
						note that this is a new mapping of the period in the Base shift state 
						conforming to user demands, on B09 for consistency with Numbers level-->
				<dmap iso="A03" to="&#x0302;" />
			<dkeyMap modifiers="shift">
				<dmap iso="D01" to="Â" />
				<dmap iso="D02" to="Ẑ" />
				<dmap iso="D03" to="Ê" />
				<dmap iso="B03" to="Ĉ" />
			<dkeyMap modifiers="option"><!--this is proposed LDML for AltGr/Option-->
				<dmap iso="D02" to="‰" /><!--U+2030 PER MILLE SIGN-->
				<dmap iso="C01" to="⁂" /><!--U+2042 ASTERISM-->
			<dkeyMap modifiers="shift+option"><!--this is proposed LDML for Shift + AltGr/Option-->
				<dmap iso="A03" to="^" />
			<dkeyMap modifiers="Num"><!--Numbers modifier, can be mapped to LOption or LAlt-->
				<dmap iso="A03" to="&#x02C6;" />
		<selector name="superscript" default="^">
			<dkeyMap modifiers="none">
				<dmap iso="D11" selector="circumflex_below" />
		<selector name="circumflex_below" default="ḙ">

comment:3 Changed 8 months ago by Marcel Schneider <charupdate@…>

LDML keyboard syntax needs more flexibility

The <transforms> and <transform> elements can be used for part of the dead keys, on condition that the missing arguments are added.

Currently the specification allows a dead state to be entered either by parsing a list for the input sequences, or by hitting a dead key, but both are represented the same way. For the latter, that is a misleading simplification, and it is even practically unfeasible, given that no character can be deemed a dead character unless that is otherwise specified. Spacing diacritics cannot (they are used as such in programming or in linguistics), nor can combining diacritics, which are used as such for input in many locales. The ISO/IEC 9995-11 axiom that all spacing diacritics are dead characters is a methodological error, because many dead keys, such as those for crook, palatal hook, retroflex hook, curl, flourish and swash tail, cannot be properly associated with any of these characters. And it is a technical error, because one cannot live with such a keyboard when the specified IME is unavailable: the document gets messed up when combining diacritics are output on failure of transforms.

Hence, LDML should add real support for dead keys.

Use the selector argument also in the <transform> element

One way to account for dead keys could be to add the state argument in the <map> and <transform> elements. It could have two values: "live" and "dead". "live" would be the default, and then this argument could be omitted.

The value of the to argument would then be either a dead character (e.g. "^" for ‹superscript›, "ê" for ‹circumflex accent›, "ƒ" for ‹hook›), or a dead key name.


  • In the <map> element:
    <keyMap modifiers="..." >
    	<map iso="{key reference}" to="{string}" />
    	<map iso="{key reference}" to="{character|name}" state="dead" />
  • In the <transform> element (the deadkey argument is presented below):
    <transforms type="simple" deadkey="{character|name}" >
    	<transform from="{base character}" to="{string}" />
    	<transform from="{base character}" to="{character|name}" state="dead" />

But I strongly discourage such a syntax.

The selector argument as introduced in comment 6 in ticket #10901 allows for a more powerful and more streamlined syntax:

  • In the <map> element:
    <keyMap modifiers="..." >
    	<map iso="{key reference}" to="{string}" />
    	<map iso="{key reference}" selector="{character|name}" />
  • In the <transform> element (the deadkey argument is presented below):
    <transforms type="simple" deadkey="{character|name}" >
    	<transform from="{base character}" to="{string}" />
    	<transform from="{base character}" selector="{character|name}" />

This also supports chained dead keys, featured on all major platforms, as well as the iterative dead keys (and other continued dead keys) featured on macOS (see comment 3 in ticket #10851):

  • Chained dead keys:
    <transforms type="simple" deadkey="{character|name}" >
    	<transform from="{base character}" selector="{character|name}" />
  • Iterative dead keys:
    <transforms type="simple" deadkey="{character|name}" >
    	<transform from="{base character}" to="{string}" selector="{character|name}" />

Add the deadkey argument in the <transforms> element

Given that in a dead key sequence, the dead character cannot be specified by its sole character or code point (see above), it is to be moved from the from argument of the <transform> element into the parent <transforms> element. Hence, each dead key needs its own <transforms> element, containing a new deadkey argument, whose value is the name of the dead key or the dead character, depending on what is specified in the <map> element.


<transforms type="simple" deadkey="{character|name}" >
	<transform from="{base character}" to="{character}" />

Flexible representation of dead keys

Transforms are one way to describe dead keys. Another way is to use the <selectors> and <selector> elements suggested above in comment 2, and in comment 3 in ticket #10851.

It appears that for diacritic dead keys, transforms are an appropriate representation, with a better legibility than the way of using <selector> in comment 2.

For group selectors however, especially for those that are generic, not diacritic, the <selectors> and <selector> elements are more appropriate. As already stated, they can be implemented as-is on macOS, not on Windows. Itʼs up to layout developers to ensure cross-platform implementability by not specifying duplicate base characters for different output characters.

comment:4 Changed 8 months ago by Marcel Schneider <charupdate@…>

Edit: Correct spelling is: “diacritical dead keys”, “diacritical group selectors”.

comment:5 Changed 8 months ago by Marcel Schneider <charupdate@…>

Move transformPartial from <settings> to <transforms> and correct default

The transformPartial attribute is not something one could specify just as UTS #35-7 suggests. And its default value depends on what kind of transforms are provided.

  • For live transforms, the default is to show input as the user types, and then to replace according to the list of sequences to transform. This behavior is supported by Keyman. Automatically entering a “dead state” when a matching sequence is found is not default behavior, nor do Windows, macOS or Linux support anything of the kind on their own.
  • For dead transforms, the default is to show nothing. With Windows driver-based layouts, that canʼt even be changed. There is no point in having transformPartial="hide" in every LDML Windows layout definition. And on macOS it depends on every single dead state having or not having a terminator in the <terminators> element (at the end) of the keylayout.

This is why that transformPartial argument, if any, should be moved from <settings> to <transforms>, where it should be called display. And its default values should be adjusted: For dead transforms it is "hide" and it can be omitted.

Add the state argument in the <transforms> element

In current LDML, the <transforms> and <transform> elements are used for very different things, and we have seen how to adapt them to fit part of their current use, while for another part theyʼd better be replaced by another syntax.

Now we propose to clarify the use of <transforms> beyond what is done by adding the deadkey argument. That was not enough, given the actual confusion in LDML.

Every <transforms> element should have the state argument whose value is either "live" or "dead".

  • A <transforms type="{simple|multiple}" state="live" > element is supported by IMEs, able to systematically edit user input, e.g. from "ab" to "c", or from "abc" to "d".
  • A <transforms type="simple" state="dead" > element can be supported by any keyboard layout, IME-powered or system-based, Windows, macOS, Linux,... User input is not parsed for edits, and input is stable and fairly predictable; e.g. "^a" stays "^a", whereas a keypress on the ‹circumflex accent› dead key followed by a keypress on the [a] key is output as "â".
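The difference between the two states can be illustrated with a small Python sketch. It is a toy model, not Keyman or any system implementation; `apply_live`, `apply_dead` and the event tuples are hypothetical names, and failure handling for dead keys is simplified to outputting the base character.

```python
from typing import Dict, List, Optional, Tuple

LIVE: Dict[str, str] = {"^a": "\u00e2", "oe": "\u0153"}
DEAD: Dict[str, Dict[str, str]] = {"circumflex": {"a": "\u00e2"}}


def apply_live(typed: str, transforms: Dict[str, str]) -> str:
    """state="live": text is shown as typed, then matching sequences are edited."""
    text = ""
    for ch in typed:
        text += ch
        # Prefer the longest matching sequence at the end of the text.
        for src in sorted(transforms, key=len, reverse=True):
            if text.endswith(src):
                text = text[: -len(src)] + transforms[src]
                break
    return text


def apply_dead(events: List[Tuple[str, str]],
               transforms: Dict[str, Dict[str, str]]) -> str:
    """state="dead": only a dead key press, never a typed character, starts a sequence.

    Each event is ("char", c) for a live key or ("dead", id) for a dead key.
    Chained dead keys are not modelled.
    """
    out: List[str] = []
    pending: Optional[str] = None
    for kind, value in events:
        if kind == "dead":
            pending = value
        elif pending is not None:
            out.append(transforms[pending].get(value, value))
            pending = None
        else:
            out.append(value)
    return "".join(out)
```

In the live model every typed "oe" is edited, whether the user wants it or not; in the dead model a typed caret followed by a stays "^a", and only the dead key press produces the precomposed letter.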

The "dead" state could be implied from the presence of the deadkey argument proposed for addition in the previous comment. But that doesnʼt seem to make the difference clear enough.

Use id argument instead of deadkey argument

Now that the state argument must be present in all <transforms> elements, a <transforms> element grouping together all <transform> elements pertaining to a given dead key could be identified by a name argument, or better just id (given that its value is either the dead key name or the dead character). That is more straightforward than using the term "deadkey", which is somewhat ambiguous (a key, or rather a key position).

Hereʼs the last syntax snippet of the previous comment, corrected (display="{string}" is optional):

<transforms type="simple" state="dead" id="{character|name}" display="{string}" >
	<transform from="{base character}" to="{string}" />


<transforms type="simple" state="dead" id="circumflex" display="ê" >
	<transform from="a" to="â" />
<transforms type="simple" state="dead" id="superscript" display="^" >
	<transform from="a" to="ᵃ" />

comment:6 Changed 8 months ago by Marcel Schneider <charupdate@…>

Why we need a dead key group selector — aka: Why the use of legacy AltGr should be deprecated


A dead key group selector is a dead key acting as a generic group selector, like the one in the current implementation of the German multilingual standard keyboard layout. We propose to place it on AltGr + Spacebar, being aware that on Windows, with driver-based layouts, those groups cannot have mappings of more than one single UTF-16 code unit per key position.

Legacy AltGr is the modifier on the Right Alt key labeled 'AltGr', which on Windows is implemented as a combination of Ctrl and Alt (0x02 + 0x04 = 0x06) and is currently found on Windows keyboard layouts (and supported by MSKLC). The alternative is to map any other modifier on that key, such as 0x10 or 0x08, being aware that Caps Lock is ineffective in combination with that other modifier.
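The modifier arithmetic can be checked with a few lines of Python. The constant names follow the kbd.h naming as far as I recall it and should be treated as assumptions; `collides_with_ctrl_alt` is a hypothetical helper that only illustrates why 0x06 is indistinguishable from a Ctrl + Alt shortcut while 0x10 is not.

```python
# Modifier bits; names assumed to match kbd.h, values as quoted in the text.
KBDSHIFT = 0x01
KBDCTRL = 0x02
KBDALT = 0x04
KBDKANA = 0x08
KBDROYA = 0x10

LEGACY_ALTGR = KBDCTRL | KBDALT  # 0x02 + 0x04 = 0x06


def collides_with_ctrl_alt(modifier_bits: int) -> bool:
    """True if the shift state contains both Ctrl and Alt, like a Ctrl + Alt shortcut."""
    ctrl_alt = KBDCTRL | KBDALT
    return modifier_bits & ctrl_alt == ctrl_alt


print(hex(LEGACY_ALTGR))                      # 0x6
print(collides_with_ctrl_alt(LEGACY_ALTGR))   # True
print(collides_with_ctrl_alt(KBDROYA))        # False
```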

The point in using (or not using) the 0x06 AltGr modifier

A number of native Windows keyboard layouts, among which US International, have letters in the AltGr and Shift + AltGr shift states. On most of these layouts, such letters are mapped counter-intuitively, because there are often multiple variants of the same base character. Another example is the Yoruba keyboard layout (Windows), featuring 8 combining sequences with acute/grave and dot below, to streamline input of the locale while having also all three dead keys. On a Windows-driver layout, using these to access those characters in a straightforward way is technically not feasible. Even counter-intuitive mapping on live key positions is not, as in many other locales in Africa, e.g. in Togo, a huge number of combining sequences are to be supported (see the comprehensive list in this e-mail to Unicode Public).

MS Word disambiguates the 0x06 modifier on the AltGr key with the Ctrl + Alt key combination. An unknown number of other applications, and Windows desktop custom shortcuts, donʼt. The only reason not to replace it with another modifier to solve all problems at once is that AltGr is the only modifier (beside Shift) sensitive to Caps Lock. But that matters only when bicameral letters are mapped on those levels. And that precisely is no good solution as one map full is scarcely enough to match the requirements of multilingual input on a locale-tailored layout. Users donʼt like to switch back and forth between multiple Latin layouts rather than being able to input a full Latin repertoire on their usual layout.

So we can see that on locale-tailored layouts designed for additional input of multiple languages, mapping letters on the AltGr levels is a non-starter when the layout is intended for the general public and meant to become a national standard. It can be used, however, for specialized layouts and for technical users who donʼt take offense as keyboarding grows counter-intuitive, and who are not interested in getting better access to ASCII symbols and paired punctuation.

Recommended solution using 0x10 on RAlt and a generic dead key

The recommended use of the AltGr/Option level is a map of paired ASCII punctuation and ASCII symbols. This accommodates particularly those layouts that cannot map the brackets on keys D11 and D12. But it is interesting for all layouts given the mapping of parentheses and brackets on the home row, along with asterisk, low line, dollar sign. The complete letter key row goes:
with alphabetic mnemonics as of asterisk, grave, hat, low line, micro sign, but spatial mnemonics for () and []. Note, too, that the asterisk is not deemed to follow the A letter where this moves around, in order to maintain synergy with the slash (on B05).

For a complete map, please refer to the page linked below.

The recommended modifier on Windows is then no longer 0x06, but any other; 0x10 is most straightforward, given that 0x08 is sensitive to Kana Lock and is preferably used for the related purpose (Programmer toggle). On macOS, there is no change, as there is no unexpected interference of Option with application shortcuts.

The special letters in turn are then accessed on the letter keys via a dead key, which may be on AltGr/Option + Space or elsewhere. The German T2/T3 layout currently has it on AltGr/Option + C03. It should be said that this is not an initial design choice, but a fallback replacing the access method found in the ISO standard, which is not easily implemented on current systems (I even wonder whether it ever was). Cf. previous comments.

A sample map with this group selector on the spacebar is found on dispoclavier.com/doc/kbfrintu.

Representation of keymap groups in LDML

A straightforward representation of a layout map accessed by the generic group selector key may indeed use a <transforms> element and its <transform> children, provided that the syntax is completed as suggested above in comment 5. The rationale is that for a consistent and intuitive user experience, such a group (lowercase and uppercase) must be mapped according to the base map. Making the “common secondary group” (ISO/IEC 9995-3) a stable feature regardless of the actual base map is really a non-starter, given that some base letters are moved around across locales, and that any intuitive mapping of special letters uses mnemonic relationships with the base letters.

Please note that this point contradicts what is stated at the very end of comment 3, above. That paragraph is therefore superseded except if some group maps use spatial mnemonics rather than alphabetic mnemonics.

comment:7 Changed 8 months ago by Marcel Schneider <charupdate@…>

Cross-platform implementability of standard keyboards

It happens that keyboard layouts like those following ISO 9995, as found in Germany, can be implemented as-is on XKB, and are implemented there, but not on Windows or macOS with system resources (except as modified versions).

See https://bugs.freedesktop.org/show_bug.cgi?id=60991#c36

On Windows and macOS, they can be implemented for sure using Keyman.

Problem: Many countries want their standard keyboard layouts to ship with Windows and macOS. So there is no point in designing standards without making sure beforehand that Microsoft and Apple will implement them or at least provide the required framework. Thatʼs not how things worked when parts of ISO/IEC 9995 were published, however.

This is why we started making keyboard layouts that can be implemented on Windows and macOS, regardless of conformance to ISO/IEC 9995.

Now all weʼre waiting for is that Microsoft, Apple and CLDR recognize the use we make of the functionalities, and that LDML is extended to represent them.

comment:8 Changed 8 months ago by Marcel Schneider <charupdate@…>

Inappropriate representation of dead keys in LDML

Currently, when reading an XML keyboard layout definition file in CLDR, we are expected to check the <transforms> element to learn whether a given output is a dead character or not; somewhat less so when it occurs as both and the transform="no" argument is present in one instance.

That syntax is counter-intuitive, and that is putting things upside down. Reaching the declared goal of better legibility is therefore compromised, already for simple transforms, and even more when it comes to chained dead keys. And when the accurate representation of dead keys by precomposed characters is applied, the whole stuff will become fully illegible.

This is why we propose the vocabulary and syntax corrections suggested in this ticket.

comment:9 Changed 7 months ago by Marcel Schneider <charupdate@…>

Add the transforms="dead" argument in the <settings> element

To avoid updating the existing keyboard data in CLDR, the transforms="dead" argument is proposed for addition in the <settings> element. The default value of transforms in <settings> would be "default", which means that all characters listed in the first place in the <transform> elementsʼ from argument are automatically dead keys, without further tagging, while only duplicate mappings intended for live output have the transform="no" argument in the <map> element. Legacy data is then fully compatible with the new syntax.

When transforms="dead" is in <settings>, any dead key must have the selector argument instead of the to argument. The <transforms> element can occur in multiple instances, and contains the state="dead" and id arguments as proposed in comment 5. This way, only the DTD is to be updated, and new layouts can be added with the more straightforward and powerful syntax, while all legacy layout data may remain as-is.
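How a parser could treat both syntaxes is sketched below in Python. Everything in it is illustrative: the two XML samples are made-up minimal layouts, the `dead_keys` helper is hypothetical, and the new-style attributes (`transforms="dead"` in <settings>, `selector`, `state`, `id`, `display`) are the ones proposed in this ticket, not existing LDML.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Legacy syntax: "^" is dead because it leads a transform's from argument,
# unless the mapping carries transform="no".
LEGACY = """
<keyboard locale="fr-t-k0-example">
  <keyMap>
    <map iso="D11" to="^"/>
    <map iso="E06" to="^" transform="no"/>
    <map iso="D03" to="e"/>
  </keyMap>
  <transforms type="simple">
    <transform from="^e" to="ê"/>
  </transforms>
</keyboard>
"""

# Proposed syntax: dead keys are tagged explicitly with selector.
NEW = """
<keyboard locale="fr-t-k0-example">
  <settings transforms="dead"/>
  <keyMap>
    <map iso="D11" selector="circumflex"/>
    <map iso="E06" to="^"/>
    <map iso="D03" to="e"/>
  </keyMap>
  <transforms type="simple" state="dead" id="circumflex" display="ê">
    <transform from="e" to="ê"/>
  </transforms>
</keyboard>
"""


def dead_keys(xml_text: str) -> Dict[str, str]:
    """Map ISO key position -> dead character or dead key name."""
    root = ET.fromstring(xml_text)
    settings = root.find("settings")
    mode = settings.get("transforms", "default") if settings is not None else "default"
    if mode == "dead":
        # New syntax: a key is dead if and only if it has a selector.
        return {m.get("iso"): m.get("selector")
                for m in root.iter("map") if m.get("selector")}
    # Legacy default: the first character of each from argument is dead.
    leading = {t.get("from")[0] for t in root.iter("transform") if t.get("from")}
    return {m.get("iso"): m.get("to") for m in root.iter("map")
            if m.get("to") in leading and m.get("transform") != "no"}
```

Because the mode is read from <settings> and defaults to the legacy rule, existing layout files would be interpreted exactly as before, while a converter could normalize both forms to the same table of dead key positions.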

The point in extending dead key syntax

The current syntax seems to be designed for simple keyboard layouts with few to no dead keys and only the most common transforms. However, especially with respect to Latin keyboards, not accessing all dead keys on the current layouts has become a non-starter, due to globalization and the need to respect everybodyʼs orthography. Without complete Latin layouts on a per-locale basis, adding an umlaut, a hacek or a dot below needs switching back and forth between different locales, possibly with base letters moving around. Alternatively, time-consuming long-presses may or may not be implementable on a given platform on a layout driver basis.

The legacy dead key scheme using spacing diacritics is a non-starter, too, as it does not cater for letters without a decomposition mapping (except bar and stroke). To input a letter with crook, we cannot predict what squiggle will be seen during the transform. And even such diacritics as cedilla and comma below are confusable in many font families and/or font sizes. This is why we suggest that dead characters be precomposed letters throughout. Additionally the ^ ASCII circumflex is needed as a dead character for the ‹superscript› dead key, for a proper and usable fallback output, and cannot be associated with ‹circumflex accent› any longer. Using the modifier letter circumflex accent instead is no solution, given the confusability, particularly for the user.


Summing it up: Dead keys need a new syntax, less invented and closer to actual implementations, conforming to goal 2 (faithfulness).

Updating legacy data

Existing legacy data may nevertheless be updated at some point, given that the layouts already in CLDR are fewer than those that will be added once Linux (full XKB) and Keyman are admitted.

Comparison not compromised

The dual dead key syntax may be considered a concern wrt comparison of layouts.

This action however is typically performed using parsers or converters (to tabular format). These tools can be programmed to produce equivalent output from both syntaxes.

comment:10 Changed 7 months ago by kristi

  • Owner changed from anybody to mark
  • Status changed from new to design

comment:11 Changed 7 months ago by mark

  • Milestone changed from UNSCH to upcoming

There was a lot of feedback on this PRI. The keyboard group has made some modifications based on feedback, but decided to leave other features for consideration for a future version.

