Another take on the English apostrophe in Unicode

Marcel Schneider charupdate at
Fri Jun 12 09:57:06 CDT 2015

On Thu, Jun 4, 2015 at 2:38 PM Markus Scherer  wrote:

> Confusion between apostrophe and quoting -- 
> blame the scribe who came up with the ambiguous use, 
> not the people who gave it a number.

There’s a lot of confusion in writing, especially since the days when this job was done on typewriters, from which computer keyboards are derived, while the cause of the narrow character sets shifted from mechanics to code pages. This is all over, thanks to Unicode and its principle defined in TUS §1.3:

> “The Unicode Standard does not define glyph images. That is, the standard defines how characters are interpreted, not how glyphs are rendered.”

Unfortunately, the new precision and differentiation have sometimes been refused, by sticking with legacy practice and for the sake of backwards compatibility.

The use of a paired quotation mark (U+2019) as the English apostrophe, against the UTC’s initially successful attempt to disambiguate the two by recommending U+02BC (same glyph) for use as apostrophe, is a leading example of how the hard labor of ordering and clarification, aiming at what in ancient Greek is called ‘Kosmos’, can at any time be thrown back into chaos by short views and doubtful considerations. There was a discussion on this mailing list in July 1999, before the release of version 3.0.0 of the Standard: “Apostrophes, quotation marks, keyboards and typography”. By then the demand for simplification had already been addressed with the corrections published as version 2.1:

> Couldn't Unicode follow Microsoft and just remove the
> recommendation that U+02BC be the recommended apostrophe character and
> instead give U+2019 the dual meaning that it de-facto has already today?
[The quoted UTR#8 is now located at:]

(The shift, as viewed at NamesList level, is now highlighted at


On Thu, Jun 4, 2015 at 2:38 PM Markus Scherer  wrote further:

> If anything, Unicode might have made a mistake in  
> encoding two of these that look identical.  

> How are normal users supposed to 
> find both U+2019 and U+02BC on their keyboards,  
> and how are they supposed to deal with incorrect usage? 

I never believed it could have been a mistake, since we know that Unicode encodes semantics, not glyphs. Were there no modifier letters at all, Unicode would have had to introduce an apostrophe character, because an apostrophe is not at all the same as a quotation mark and does not work the same way either. By handling text, not theories, Ted Clancy at Mozilla clearly shows us that conflating the apostrophe with a close-quote brings up counterproductive complications that severely impact the productivity of users.
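To make "does not work the same way" concrete, here is a minimal sketch in Python, assuming only the standard `re` and `unicodedata` modules. The sample word and the helper name `words` are mine, chosen for illustration; this is not taken from T. Clancy’s post. It shows that a plain `\w+` word match keeps a word together when the apostrophe is U+02BC (a letter) and cuts it in two when it is U+2019 (punctuation).

```python
import re
import unicodedata


def words(text):
    """Return the runs of Unicode word characters found in text."""
    return re.findall(r"\w+", text)


# The same English word, spelled with each of the two candidates.
with_letter = "don\u02BCt"  # U+02BC MODIFIER LETTER APOSTROPHE
with_quote = "don\u2019t"   # U+2019 RIGHT SINGLE QUOTATION MARK

# General category Lm is a letter; Pf is closing (final) punctuation.
print(unicodedata.category("\u02BC"), words(with_letter))  # Lm, one word
print(unicodedata.category("\u2019"), words(with_quote))   # Pf, two pieces
```

Other word-breaking rules exist and may special-case U+2019 between letters, so this only illustrates the default behavior of one common tool.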

What, now, about “normal users”? To get to the bottom of the issue, consider that wishing to keep one and the same keyboard layout for life, while changing to a new smartphone every year or two, needs some explanation. I guess it is because keyboards don't display anything by themselves except keycap labels, so you're never quite sure about them.

We should consider, too, that before being a matter of finding the characters on the keyboard, it is a matter of using them. How are we supposed to choose the right one out of four apostrophes/quotes (U+0027, U+02BC, U+2019, U+2018) while many of us seem not to know, or not to care, where to place it? But supposing we do, it would indeed be much more useful to tell the machine whether we want to type an apostrophe or a quotation mark, and for that, the existing key is enough (see T. Clancy’s blog). Is managing nested quotes already implemented in word processing? I have never heard that it is. Definitely, here’s a point where the simplification wished for in favor of a widespread word processing software considerably worsened the working conditions of all demanding people. The gap between word processing and desktop publishing is the smaller.
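For reference, the four candidates can be told apart by their character properties even where the glyphs coincide. This is a small sketch assuming Python’s standard `unicodedata` module; the names `CANDIDATES` and `describe` are hypothetical, made up for this listing.

```python
import unicodedata

# The four apostrophe/quote candidates discussed above.
CANDIDATES = ["\u0027", "\u02BC", "\u2019", "\u2018"]


def describe(ch):
    """Return (code point, general category, character name) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


for ch in CANDIDATES:
    print(*describe(ch))
```

Only U+02BC comes out with a letter category (Lm); the other three are punctuation (Po, Pf and Pi respectively), which is why software treats them differently regardless of how they look.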

Adding characters to your preferred keyboard layout on Windows is very easy using the Microsoft Keyboard Layout Creator, which has an end-user UI. As the compiled drivers are not even tied to a Windows version (from NT 4 upwards), you can deploy them in your company and share them among your friends without precautions. That is what users are supposed to do. If they don’t, Microsoft is not supposed to force it upon them.

By contrast, if you want a Kana toggle to toggle the apostrophe key between U+0027 and U+02BC (and the quotation mark between U+0022 and a dead key for all quotation marks), you must use the Windows Driver Kit (along with some other resources) plus the MSKLC. If you wish to see it working, you may download an experimental keyboard layout on the unfinished webpage It exemplifies also the Third level solution and the Compose key solution. 

I hope that helps. 

Marcel Schneider 