Re: Encoding italic (was: A last missing link) from Asmus Freytag via Unicode on 2019-01-18 (Unicode Mail List Archive)

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Fri, 18 Jan 2019 11:18:10 -0800

I would fully agree, and I think Mark puts it really well in the message below why some of the proposals brandished here are no longer plain text but "not-so-plain" text.

I think we are better served with a solution that provides some form of "light" rich text, for basic emphasis in short messages. The proper way for this would be some form of MarkDown standard shared across vendors, and perhaps implemented in a way that users don't necessarily need to type anything special, but that, if exported to "true" plain text, it turns into the source format for the "light" rich text.

This is an effort that's out of scope for Unicode to implement, or, I should say, if the Consortium were to take it on, it would be a separate technical standard from The Unicode Standard.

A./

PS: I really hate the creeping expansion of pseudo-encoding via VS characters. The only worse thing is adding novel control functions.

On 1/18/2019 7:51 AM, Mark E. Shoulson via Unicode wrote:

On 1/16/19 6:23 AM, Victor Gaultney via Unicode wrote:

Encoding 'begin italic' and 'end italic' would introduce difficulties when partial strings are moved, etc. But that's no different than with current punctuation. If you select the second half of a string that includes an end quote character you end up with a mismatched pair, with the same problems of interpretation as selecting the second half of a string including an 'end italic' character. Apps have to deal with it, and do, as in code editors.

It kinda IS different. If you paste in half a string, you get a mismatched or unmatched paren or quote or something. A typo, but a transient one. It looks bad where it is, but everything else is unaffected. It's no worse than hitting an extra key by mistake. If you paste in a "begin italic" and miss the "end italic", though, then *all* your text from that point on is affected! (Or maybe "all until a newline" or some other stopgap ending, but that's just damage-control, not damage-prevention.) Suddenly, letters and symbols five words/lines/paragraphs/pages later look different, and the pagination is all altered (by far more than a single extra punctuation mark would alter it, since italic fonts generally are narrower than roman). It's a disaster.
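To make the difference concrete, here is a rough sketch in Python. It assumes two hypothetical stateful markers (private-use code points standing in for the proposed 'begin italic' / 'end italic' characters, which are not encoded anywhere) and a toy renderer. The names `BEGIN_ITALIC`, `END_ITALIC`, `italic_flags` and `italic_tail` are made up for illustration only.

```python
# Hypothetical markers: private-use code points stand in for the proposed
# (never encoded) 'begin italic' / 'end italic' characters.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"


def italic_flags(text):
    """Return (char, is_italic) pairs for every visible character."""
    flags = []
    italic = False
    for ch in text:
        if ch == BEGIN_ITALIC:
            italic = True       # state persists until an END_ITALIC shows up
        elif ch == END_ITALIC:
            italic = False
        else:
            flags.append((ch, italic))
    return flags


def italic_tail(document, pasted, at):
    """Paste `pasted` into a marker-free `document` at index `at` and count
    how many of the ORIGINAL characters after that point now render italic."""
    merged = document[:at] + pasted + document[at:]
    visible_pasted = sum(
        1 for ch in pasted if ch not in (BEGIN_ITALIC, END_ITALIC)
    )
    tail = italic_flags(merged)[at + visible_pasted:]
    return sum(1 for _, is_italic in tail if is_italic)


doc = "This is important text."
# An unmatched quote changes nothing about the text that follows it.
print(italic_tail(doc, '"very ', 8))
# An unmatched 'begin italic' restyles every character after the paste.
print(italic_tail(doc, BEGIN_ITALIC + "very ", 8))
```

The stray quote stays a local typo, while the stray marker changes the rendering of the whole remainder of the document, however long that is.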

No. This kind of statefulness really is beyond what Unicode is designed to cope with. Bidi controls are (almost?) the sole exception, and even they cause their share of headaches. Encoding separate _text_ italics/bold is IMO also a disastrous idea, but I'm not putting out reasons for that now. The only really feasible suggestion I've heard is using a VS in some fashion. (Maybe let it affect whole words instead of individual characters? Makes for fewer noisy VSs, but introduces a whole other host of limitations (how to italicize part of a word, how to italicize non-letters...) and is also just damage-control, though stronger.)

Apps (and font makers) can also choose how to deal with presenting strings of text that are marked as italic. They can choose to present visual symbols to indicate begin/end, such as /this/. Or they can present it using the italic variant of the font, if available.

At which point, you have invented markdown. Instead of making Unicode declare it, just push for vendors everywhere to recognize /such notation/ as italics (OK, I know, you want dedicated characters for it which can't be confused for anything else.)
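As a minimal sketch of what "recognizing /such notation/" might involve for a vendor, assuming a regex heuristic of my own invention (not any existing standard or any vendor's actual rule), with the hypothetical names `SLASH_EMPHASIS` and `render_slash_emphasis`:

```python
import re

# Assumed heuristic, not a standard: a slash opens emphasis only when it is
# not glued to a word character or another slash on its left, and closes
# only when it is not glued to one on its right. The emphasized run may not
# start or end with whitespace and may not span lines.
SLASH_EMPHASIS = re.compile(r"(?<![\w/])/(?=\S)([^/\n]+?)(?<=\S)/(?![\w/])")


def render_slash_emphasis(text):
    """Rewrite /emphasized runs/ as <i>...</i>, leaving other slashes alone."""
    return SLASH_EMPHASIS.sub(r"<i>\1</i>", text)


print(render_slash_emphasis("recognize /such notation/ as italics"))
print(render_slash_emphasis("either and/or, or a path like /usr/bin/python"))
```

The guards around the slashes are exactly the "can't be confused for anything else" problem in miniature: an ordinary character needs heuristics to tell emphasis apart from "and/or" or a file path, which is the trade-off against dedicated characters.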

- Those who develop plain text apps (social media in particular) don't have to build in a whole markup/markdown layer into their apps

With the complexity of writing a social media app, a markup layer is really the least of the concerns when it comes to simplifying.

- Misuse of math chars for pseudo-italic would likely disappear

- The text runs between markers remain intact, so they need no special treatment in searching, selecting, etc.

- It finally, and conclusively, would end the decades of the mess in HTML that surrounds <em> and <i>.

Adding _another_ solution to something will *never* "conclusively end" anything. On a good day, you can hope it will swamp the others, but they'll remain at least in legacy. More likely, it will just add one more way to be confused and another side to the mess. (People have pointed out here about the difficulties of distinguishing or not-distinguishing between HTML-level <i> and putative plain-text italics. And yes, that is an issue, and one that already exists with styling that can change case and such. As with anything, the question is not whether there are going to be problems, but how those problems weigh against potential benefits. That's an open question.)

My main point in suggesting that Unicode needs these characters is that italic has been used to indicate specific meaning - this text is somehow special - for over 400 years, and that content should be preserved in plain text.

There is something to this: people have been *emphasizing* text in some fashion or another for ages. There is room to call this plain text.

~mark

Received on Fri Jan 18 2019 - 13:18:22 CST
