Re: Too narrowly defined: DIVISION SIGN & COLON from Hans Aberg on 2012-07-12 (Unicode Mail List Archive)

From: Hans Aberg <haberg-1_at_telia.com>
Date: Fri, 13 Jul 2012 00:11:27 +0200

On 12 Jul 2012, at 23:20, Julian Bradfield wrote:

[If you do not send an email directly to me, I may overlook it, due to my filtering system.]

> Hans wrote:
>> On 12 Jul 2012, at 15:54, Julian Bradfield wrote:
> ..
>>> Not to mention the symbols I've used from time to time, because
>>
>> You tell me, because I posted a request for missing characters in different forums. Perhaps you invented it after the standardization was made?
>
> Why on earth would I care about whether my pet symbol (a mu-nu
> ligature, which I started using to stand for "mu or nu as appropriate"
> when I ran out of other plausible letters for it) is in Unicode? It
> would be crazy to put it there, and of precious little benefit to me,
> since I don't wish to write web pages about this stuff.

Well, this list is about Unicode, so the issue is off-topic then.

>>>> them. In math, you can always invent your own characters and styles,
>>> people do.
>> You and others knowing about those characters must make proposals if you want to see them as a part of Unicode.
>
> But wanting to do so would be crazy. My mu-nu ligature is, as far as I
> know, used only by me (and co-authors who let me do the typesetting),
> and so if Unicode has any sanity left, it would not encode it. My
> colleagues in the Edinburgh PEPA group did try to get their pet symbol
> encoded (a bowtie where the two triangles overlap somewhat rather than
> just touching), but were refused; although that symbol now appears in
> hundreds of papers by dozens of authors from all over the world. (I
> think they wanted it so they could put it on web pages, which they
> have lots of.)

Perhaps they should give it another try, now that there is wider support for its usage.

> Putting a symbol into Unicode imposes a huge burden on thousands of
> people. Everybody who thinks it important to be able to display all
> Unicode characters (or even all non-Han characters) has to make sure
> that their font has it, or that the distribution they package has it,
> or that all the software in the world knows how to find a font that
> has it. Such effort is entirely inappropriate for symbols used ad hoc
> by a small community, who are communicating in any case via either
> fully typeset documents or by TeX pseudocode - or, on occasion, with
> real TeX and a suitable font definition.

For that, there is the private use area. But it is up to them whether they find it useful.
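
As a rough illustration of what the private use area gives you, and of the clash concern raised in this thread, here is a minimal Python sketch. It assumes only the standard library's `unicodedata` module; the helper names `is_private_use` and `clashes`, and the two authors' assignment tables, are hypothetical and not taken from any real registry.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


# Hypothetical private assignments made independently by two authors.
# The BMP private use area spans U+E000..U+F8FF.
author_a = {0xE000: "mu-nu ligature"}
author_b = {0xE000: "overlapping bowtie"}


def clashes(a: dict, b: dict) -> list:
    """Code points that both tables assign, but to different meanings."""
    return sorted(cp for cp in a.keys() & b.keys() if a[cp] != b[cp])


print(is_private_use("\uE000"))  # a private use code point
print(is_private_use("A"))       # an ordinary letter
print([f"U+{cp:04X}" for cp in clashes(author_a, author_b)])
```

Nothing in the plain text itself records which table a private use code point was meant to follow, so any agreement has to live outside the text.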

>>> You mean "private use". Crazy thing to do, because then you have to
>>> worry about whether your PUA code point clashes with some other
>>> author's PUA code point.
>>
>> There is some system for avoiding that. Perhaps someone else here can inform.
>
> There are many such systems - I don't need help or advice on this
> matter. But none of them is appropriate for a symbol that perhaps you
> want only for a few papers.

Perhaps you should address that issue to the consortium, if you deem it important to you.

>>>> UTF-8 only is simplest for the programmer that has to implement it.
>>> Some of us are more concerned with users than programmers.
>> Well, if the programmers don't implement, you are left out in the cold.
>
> I'm not - if I care enough, I'll do it myself. Although most of my
> work has actually been implementing utf-8 - as I said, the legacy
> encodings are usually already done.

The support for various encodings in LaTeX2 was a team effort, and required so much work that they nearly lost focus on the typesetting issues.

>>> Neither working mathematicians nor publishers nor
>>> typesetters like dealing with constantly changing extensions and
>>> variations on TeX - one of the biggest selling points of TeX is
>>> stability. (Defeated somewhat by the instability of LaTeX and its
>>> thousands of packages, but that's another story.)
>>> If I need to write complex - or even bidi - scripts routinely, I'd
>>> probably be forced into one of them; but the typical mathematician
>>> doesn't.
>>
>> I do not see your point here.
>
> The point is that you don't use unstable rapidly changing systems for
> anything that has an expected life of more than a year or two; and if
> you're planning for somebody else to use it, you try to give them
> something that runs on systems at least ten years older than yours.

There are different strategies with respect to how up-to-date the software one uses should be.

>> No. TeX cannot handle UTF-8, and I recall LaTeX's capability to emulate that was limited.
>
> Somewhat limited, but good enough for every purpose I've so far needed
> (maths, phonetics; and European, Indic, Chinese, Hebrew languages in
> small snippets rather than entire documents). The main annoyance is
> that combining character support is clunky, and that TeX really
> doesn't support bidi properly - as I said - though it's remarkable
> what hacking can be done.

It was after doing such hacking for a decade or two that the other systems were developed.

>>>>> you need to encode also letters that are semantically distinctively
>>>>> roman upright.
>>>>
>>>> It has already been encoded as mathematical style, see the "Mathematical Alphanumeric Symbols" here:
>>>> http://www.unicode.org/charts/
>>>
>>> *You* look. The plain upright style is unified with the BMP characters.
>>
>> Yes, that is why the Unicode paradigm departs from the TeX one.
>
> This is as bad as Naena Guru... Unicode characters are
> fontless. They are plain text. The Unicode standard even has a
> nice little picture (Figure 2-2) showing how roman A, squashed A, bold
> italic A, script A, fancy A, sans-serif A, brush-stroke A, fancy
> script A, and versal capital A are all just LATIN LETTER A.
>
> Now, in response to the desire of some mathematicians (maybe) to
> write webpages without having to use clunky HTML markup (which is even
> worse to use than TeX's), Unicode saw fit to encode characters such as
> MATHEMATICAL BOLD ITALIC CAPITAL A.
> This is not a logical problem: that character is distinguished from
> LATIN LETTER A by the fact that its acceptable glyph variants cover a
> much narrower range than those of A.
>
> However, if you now say that MATHEMATICAL ROMAN CAPITAL A, which by
> definition must be a seriffed upright non-bold roman letter, is the
> same character as LATIN LETTER A, you must vanish in a puff of logic,
> for the same character cannot both be a fontless A and also an A that
> must be displayed in a very restricted range of glyphs.

That is a logical inconsistency - the Unicode standard is full of them. If you so want, you can propose to add the upright mathematical semantic styles as well.
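
The asymmetry under discussion can be checked with a minimal Python sketch, assuming the standard library's `unicodedata` module and whatever Unicode version it bundles. The helper `has_character` is hypothetical, and "MATHEMATICAL ROMAN CAPITAL A" is the name used in this thread for a character that is not separately encoded.

```python
import unicodedata


def has_character(name: str) -> bool:
    """True if the bundled Unicode database has a character with this name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


# The bold italic style has its own code point in the
# Mathematical Alphanumeric Symbols block.
bold_italic_a = unicodedata.lookup("MATHEMATICAL BOLD ITALIC CAPITAL A")
print(f"U+{ord(bold_italic_a):04X}", unicodedata.name(bold_italic_a))

# Compatibility normalization folds it back to the ordinary letter.
print(unicodedata.normalize("NFKC", bold_italic_a) == "A")

# The plain upright style has no separate character of its own.
print(has_character("MATHEMATICAL ROMAN CAPITAL A"))
print(unicodedata.name("A"))
```

So the styled letter is a distinct character with a compatibility mapping to the ordinary one, while the upright style has only the ordinary letter to stand for it.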

> Unless, that is, you have higher level markup that tells you when A
> means A, and when it means \mathrm{A}. But if you have such higher
> level markup, you don't need all the other variants anyway.
> TeX provides such markup, by means of math mode. So TeX users can
> choose to treat A as \mathrm{A} without inconsistency. However, they
> can also choose to interpret the higher-level markup as saying "treat A
> as itself", in which case TeX can do what it likes (in particular,
> set in italic), also without inconsistency.
> Thus there is no incompatibility between Unicode and TeX.
>
> Similarly in MathML.
>
> However, in plain text, you are screwed. There is no way to
> distinguish between the generic A, and the A that must be roman,
> except by human intelligence.

If you use the mathematical semantic styles, and want to have the upright styles as well on the character level, as the standard is now, you have to use the BMP for that. That forces the unicode-math math-style=literal style, which is different from what original TeX does.

>> You have yourself noted that the BMP characters must be used for upright for consistent Unicode use, incompatible with TeX which sets them as italic.
>
> Which shows that Unicode is inconsistent, not that TeX is flawed.

TeX is right for 7-bit ASCII: what it originally was written for, and from which it was patched up.

>> It is because there are currently no convenient input methods, also mentioned before in this thread.
>
> There will never be convenient input methods for thousands of
> symbols. (I've spent some time designing convenient input methods for
> the range of characters I use frequently, and I still can't always
> remember them.)

A problem is that it is very time consuming to do. This is a point where Unicode is lacking.

Hans
Received on Thu Jul 12 2012 - 17:14:41 CDT
