Re: Tildes on vowels

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Tue Aug 13 2002 - 05:08:00 EDT

Previous message: Marco Cimarosti: "RE: Eleventh hour check on XML 1.1 names"
Maybe in reply to: Andrew C. West: "Re: Tildes on vowels"
Next in thread: Philipp Reichmuth: "Re: Tildes on vowels"
Reply: Philipp Reichmuth: "Re: Tildes on vowels"
Reply: James Kass: "Re: Tildes on vowels"

Tex Texin kindly responded to my question.

Firstly, thank you for responding. I have added some comments between items
below.

>William, hi
>
>Although the specific proposal may not have been discussed, there has
>been much discussion, which generalizes well and then can be
>specifically applied to this example. There has also been discussion of
>general principles which apply directly.
>
>1) The need for such rendering mechanisms in plain text interchange has
>not been shown.

Well, what are objective criteria for showing it? To my mind Stefan
suggested the idea and if one person finds the idea useful then need has
been shown. This is for the Private Use Area after all, and the Private Use
Area is a facility provided so that it can be used.

>
>2) Superscript, subscript, combining above, and other forms of
>identifying placement of characters, are better left to markup or other
>rendering systems and file formats (and not for a vehicle intended for
>plain text.)

Why? This call for markup seems to be a deeply held belief that is treated
as if it were a law of nature. Some people somewhere decided to think in
terms of layers, and that is up to them. The fact of the matter is that
using individual Private Use Area characters for tasks which could
otherwise be performed by a sequence of characters starting with a <
character, used to mean ENTER MARKUP BUBBLE rather than its specified
meaning in the Unicode standard, is perfectly reasonable. Using Private Use
Area characters does not redefine the meaning of a character from the
Unicode standard, whereas using < to mean ENTER MARKUP BUBBLE does. Once <
means ENTER MARKUP BUBBLE, an end user who wants < as LESS-THAN SIGN has to
use a special mechanism to display that standard character on a page. For
example, with HTML, if one wishes to put a < character into a display one
needs to code it up (as &lt;) so as to avoid the browser thinking that the <
is the entry point to a markup bubble.
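
As a small illustration of that special mechanism, here is a minimal sketch.
It assumes Python and its standard html module, which is my choice for the
illustration and not something anyone in this thread has proposed. It shows
a LESS-THAN SIGN being coded up for HTML display and then recovered.

```python
import html

# A line of plain text in which < carries its Unicode meaning, LESS-THAN SIGN.
plain = "if a < b then stop"

# To show that line inside an HTML page, the < has to be coded up as &lt;
# so that the browser does not treat it as the start of markup.
escaped = html.escape(plain)

# Reading the page back as plain text requires the reverse step.
recovered = html.unescape(escaped)

if __name__ == "__main__":
    print(escaped)
    print(recovered)
```

The round trip loses nothing, yet every program that writes or reads the
file has to know about the escaping step.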

I am not knocking markup, I am simply saying that there is a choice of ways
to do things and that sometimes a direct Private Use Area encoding is a good
choice.

>
>3) Some of these systems are also established and standardized (either
>de jure or de facto), so creating new methods in code points is
>unnecessary, and given the proposed misuse of the PUA (see next point)
>is in conflict with the goals and architecture of Unicode.

There is no misuse of the Private Use Area in what is being suggested. You
might think it not a good approach, but labelling it as misuse is unfair.

What, precisely, does de facto standardized mean?

>4) The PUA is for private use, and creating general purpose mechanisms
>and attempting to assign "standard" values for these mechanisms in the
>PUA,

Well, standard only amongst those end users who choose to use them, on the
particular occasions upon which they choose to use them for the particular
purpose of transcribing into a computer file some particular character from
a manuscript as a combining above or superscript.

Certainly there are some regular Unicode characters for doing this for some
combining above letters, in the range U+0363 through U+036F, of which I
only became aware as a result of reading this thread. Yet I am suggesting
that if a researcher finds in a manuscript a combining above situation or a
superscript character which is not encoded in regular Unicode, then
Stefan's suggested characters might be very useful, particularly if they
are in a part of the Private Use Area not used for anything else and James
chooses to encode them in Code2000 rather than leave those two code points
unassigned in Code2000.
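
For anyone who wants to see which letters that range covers, here is a
minimal sketch, assuming Python and its standard unicodedata module. The
function names are hypothetical and chosen only for this illustration, and
the check looks at this one range and nowhere else.

```python
import unicodedata

def combining_latin_letters():
    """Map each base letter to its combining-above code point in U+0363..U+036F."""
    letters = {}
    for cp in range(0x0363, 0x0370):
        # Names look like "COMBINING LATIN SMALL LETTER A"; keep the last word.
        name = unicodedata.name(chr(cp))
        letters[name.split()[-1].lower()] = cp
    return letters

COVERED = combining_latin_letters()

def has_regular_combining_form(letter):
    """True if this range offers a combining-above form of the letter."""
    return letter in COVERED

if __name__ == "__main__":
    for letter, cp in COVERED.items():
        print(f"U+{cp:04X} combining {letter}")
    print("z covered by this range:", has_regular_combining_form("z"))
```

A transcriber could run such a check first and fall back on a Private Use
Area convention only for the letters that the range does not provide.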

I can suggest two code points if people think that that would be helpful,
indeed, if such a general method is thought to be widely useful I can
suggest two codes from the courtyard codes code space as that might make the
code points more useful.

>a) conflicts with the purpose of the PUA, (since this is not a private
>use and prevents others from privately using these codes for other
>purposes) and

Well, it is a private use and prevents no one from using those code points
for any other purpose whatsoever. However, people who transcribe
manuscripts might be helped by knowing that there is the opportunity for an
agreement to use two code points for particular meanings in a particular
circumstance that they might encounter. If they use those two code points,
anyone coming across them in a manuscript transcription file might have a
good chance of realizing what is meant, and one or more fonts might have
the two arrow symbols coded at those two points in the Private Use Area.
Indeed, if two code points are suggested in this list then maybe people
might start using them.

>b) conflicts with trying to make the new mechanism general purpose,
>since users of those particular PUA codes have already given them
>another purpose and cannot then use the general mechanism.

Well, people can easily use a PUA code for one purpose in one context and
for a different purpose in another context.

The issue here is that if someone wishes to transcribe a document then,
firstly, Unicode does not aim to be able to encode every possible character
that might appear in every possible document and, secondly, the formal
encoding process takes a long time. Suppose that someone transcribing a
manuscript comes across an unusual combining above combination, and that
the two letters, say e and z, are each regular Unicode characters on their
own. Then simply encoding the sequence as e U+F386 z, and viewing the text
with a font which has U+F386 and U+F387 encoded as the arrow characters
suggested earlier in this thread (particular glyph suggestions open for
discussion as part of the above consultation process), would provide a
readily available method of getting acceptable results: the computer file
contains the correct characters and there is some indication of the meaning
on screen, although naturally the display would not be showing e with a z
above as such or, for U+F387, a superscript z.
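
The encoding just described can be sketched in a few lines. This is only an
illustration, assuming Python and its standard unicodedata module. The
constant and function names are hypothetical, and U+F386 and U+F387 carry
these meanings only by the private agreement discussed in this thread.

```python
import unicodedata

# Private agreement only: Unicode assigns no meaning to these code points.
COMBINE_ABOVE = "\uF386"  # the next letter is written above the previous one
SUPERSCRIPT = "\uF387"    # the next letter is written as a superscript

def transcribe_above(base, above):
    """Encode 'base with above written over it' as base, U+F386, above."""
    return base + COMBINE_ABOVE + above

sequence = transcribe_above("e", "z")

# General categories: the two letters are ordinary lowercase letters (Ll)
# and the operator between them is a private use character (Co).
categories = [unicodedata.category(ch) for ch in sequence]

# Normalization leaves the sequence alone, so the file keeps what was typed.
unchanged = unicodedata.normalize("NFC", sequence) == sequence

if __name__ == "__main__":
    print([f"U+{ord(ch):04X}" for ch in sequence])
    print(categories, unchanged)
```

The file therefore holds e, the private use operator and z as three
separate characters, and what appears on screen for the middle one depends
entirely on the font.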

>
>5) Having private rules for normalization would not work well in public
>global interchange. How could one perform a search on the web, if each
>page possibly had its own mapping from one set of characters to others?
>You would never know when a string match occurred.

That is true, yet I was not suggesting that. I am suggesting that within a
specialised area of activity, namely transcribing documents and sharing the
transcriptions with others who are aware of the technique being used, such
a Private Use Area usage could be of value.

>In short, the proposals do not solve existing problems(1,2,3), conflict
>with the current architecture (4,5), have problems themselves (5) and so
>are not enticing.

Well, perhaps this needs to be reconsidered in the light of the above
comments.

>There has been much said about these principles already.

Yes. However, I feel that it is important to remember that Unicode is
intended to be used by end users to get the results that those end users
need. The standardization process is very important, yet such facilities as
the Private Use Area are available for use and I feel that where the Private
Use Area can be used to solve problems it is fine to use it.

Indeed, in relation to the declared aims of this mailing list, I feel that
discussion of Private Use Area uses in this list is directly on-topic.

>There is a saying about when you have a hammer, everything looks like a
>nail. The ability to attach code points with specific functions is also
>a powerful tool. And although most problems can have solutions utilizing
>this form, many believe markup and other mechanisms are just as powerful
>or more so, and may also be more appropriate for certain situations.

I had not heard that saying before.

Markup may be more powerful for certain situations. If one needs to open a
markup bubble then that is fine. However, opening a markup bubble simply
because that is what is widely done, without thinking for oneself as to
what is the best solution in a specific set of circumstances, is not the
answer.

Why open up a markup bubble if that means that a piece of specially written
software to process the file is much more complicated than if the same
encoding had been done with a few Private Use Area code point allocations?
It just seems like using a Gigahammer to hit a nanonail if I may continue
your metaphor in a knight's move direction! :-)
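
To give a feel for how little processing the code point approach needs,
here is a minimal sketch, assuming Python and its standard html module. It
makes one pass over a transcription and turns the two private use operators
into HTML for readers who prefer markup. The function name, the choice of
the sup element and the CSS class name "above" are all hypothetical and
serve only as illustration.

```python
import html

# Private agreement only, as discussed in this thread.
COMBINE_ABOVE = "\uF386"
SUPERSCRIPT = "\uF387"

def to_html(text):
    """Convert a transcription using the two operators into an HTML fragment."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch in (COMBINE_ABOVE, SUPERSCRIPT):
            # The operator applies to the single character that follows it.
            following = next(chars, "")
            if not following:
                continue  # dangling operator at end of text: drop it
            if ch == SUPERSCRIPT:
                out.append("<sup>" + html.escape(following) + "</sup>")
            else:
                out.append('<span class="above">' + html.escape(following) + "</span>")
        else:
            # Everything else is plain text, so a literal < must be escaped.
            out.append(html.escape(ch))
    return "".join(out)

if __name__ == "__main__":
    print(to_html("e\uF387z < x"))
    print(to_html("e\uF386z"))
```

Going in this direction is a simple loop. Going the other way means parsing
the markup, which is where the extra complication comes in.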

>I think this is what Stefan is referring to. We need to use the
>scientific principles of induction and deduction to make the
>transposition from the specifics of one case to a general form that is
>usefully applied to the specifics of other cases. Otherwise, we are
>doomed to repeatedly state what are now accepted as basic principles on
>this list, over and over, in response to proposals for PUA usage.

Certainly use scientific principles and induction, yet also be careful not
to use fallacious reasoning due to out of scope generalizations. For
example, there is the Fallacy of the Undistributed Middle which is easy to
fall into!

There are various web links to the Fallacy of the Undistributed Middle at
http://www.yahoo.com if anyone is interested to follow this up. I first
found out about it from a book entitled Teach Yourself Logic which is one of
those Teach Yourself books which used to be very widely available in England
years ago. The Teach Yourself range is now much smaller than it used to be
and the last time that I checked Teach Yourself Logic was out of print.
However, if one goes into a second hand bookshop and asks, one might be
lucky.

It is important that ideas not be swept away by generalized dismissals when
the previously stated principles do not, in fact, cover the specific case.

>Consider using or defining markup for some of these solutions, instead
>of the PUA. By analogy, include some other tools in your repertoire, so
>that everything does not look like a code point ready to be hammered.
>

[snipped]

>
>I hope that helps. I hope the message does not read as being harsh.

Not at all.

> I intended to just be explanatory.

Yes.

> My attempt to be concise and specific
>gives this a more pointed tone than I intend, I suspect, but please
>believe it is not intended.

I am happy that that is your intention. I did not read it otherwise. I
very much believe that people can have an academic debate without
personalities being an issue. When people raise personality issues, they
are just using power to get a win, without answering the underlying
questions which still continue to exist, even if their asking has been made
.... er, taboo! :-)

William Overington

13 August 2002

