Code charts and code points
michel at suignard.com
Fri Oct 24 11:01:31 CDT 2014
I know for a fact (because I did it and just verified) that the font used for those codes uses the real UCS code points. The conversion happens in the PDF embedding magic. I could look into it, but I have no easy way to debug the Adobe Distiller path here. Apparently, when you get off the beaten path with new characters, the preservation of code points in copy-and-paste operations is not bulletproof.
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Jukka K. Korpela
Sent: Friday, October 24, 2014 4:51 AM
To: unicode at unicode.org
Subject: Re: Code charts and code points
2014-10-24 11:17, "Martin J. Dürst" wrote:
> The code charts are published as PDFs. In general, text in PDFs can be
> copypasted elsewhere. Is there something in place that makes sure that
> "wrong" Unicode encodings for glyphs published in code charts don't
> leak elsewhere?
It seems that there isn’t. Whether this is serious is a different issue.
I tested with the arbitrarily chosen Ornamental Dingbats block, with the chart http://www.unicode.org/charts/PDF/Unicode-7.0/U70-1F780.pdf
Opening it in Adobe Reader XI on Win 7, I was able to select the characters with the mouse and copy and paste them to a text editor, BabelPad. It shows most of them as just boxes, identified with the correct Unicode numbers; this is the expected behavior when the editor has no suitable font at its disposal. But instead of U+1F67C VERY HEAVY SOLIDUS and U+1F67D VERY HEAVY REVERSE SOLIDUS, it shows “/” and “\”, identified as U+002F SOLIDUS and U+005C REVERSE SOLIDUS.
So apparently the font designer had placed the glyphs as assigned to SOLIDUS and REVERSE SOLIDUS, which is understandable. But this means that when the characters in the code charts are copied and pasted, or otherwise accessed at the character level, they are wrong characters.
I think it is imaginable that someone wants to copy a block of characters from the code charts, as a handy way of getting them for inspection, e.g. for testing how some particular software renders them using some particular font(s). I would expect some confusion then if some of the characters you got were the wrong ones (wrong code points).
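One way to catch this kind of leak after pasting is to check every pasted character against the block it was supposed to come from. The following is only a rough sketch in Python, not something used in the test above: the function name find_out_of_block is made up for illustration, and the range U+1F650..U+1F67F is assumed to be the Ornamental Dingbats block.

```python
import unicodedata

def find_out_of_block(text, first, last):
    """Return (index, 'U+XXXX', name) for every non-whitespace character
    in text whose code point lies outside the range first..last."""
    strays = []
    for index, ch in enumerate(text):
        if ch.isspace():
            continue  # spaces and line breaks from the chart layout are not of interest
        cp = ord(ch)
        if not first <= cp <= last:
            # The fallback string covers code points the local Unicode database does not name.
            strays.append((index, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return strays

# Hypothetical pasted text: two characters from the block, followed by the
# two ASCII characters that were reported to leak in.
pasted = "\U0001F650\U0001F651/\\"

# Assumed block range for Ornamental Dingbats: U+1F650..U+1F67F.
for index, code, name in find_out_of_block(pasted, 0x1F650, 0x1F67F):
    print(index, code, name)
```

A check like this flags the substituted SOLIDUS and REVERSE SOLIDUS, but it cannot tell which chart characters they replaced; for that you still have to compare against the chart itself.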