Re: Matching opening and closing characters: How?

From: verdy_p (verdy_p@wanadoo.fr)
Date: Sat Aug 08 2009 - 00:30:31 CDT


    In a programming language, having to cope with escape characters just to fit quotes inside a quoted string is a bad thing. We should be able to use any convenient pair of quotes by matching them correctly, probably independently of their left or right semantics, because left/right usage is language-dependent.

    If you are programming in C, C++, Java, C#, JavaScript (or a shell scripting language), you currently have no choice, because this is not part of the current standards (there is no standard proposal to extend the set of character-quoting or string-quoting delimiters).

    I really think that C, C++, Java, C#, JavaScript compilers/interpreters (and others) should also accept all the alternate single quote characters as equivalent to the ASCII one, so that you can use any of them in pairs instead of ASCII single quotes. However, if the opening quote is left-oriented, it must be matched by the closing quote in the associated right-oriented version, and vice versa. This would work for character quotations like ‹e›, or ›e‹, or ‘e’, or ’e‘ (the opening quote could also use a low version and still match a high version at the end). But it would not work for ‹e‹, or ›e›, or ’e’, or ‘e‘.
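
    The pairing rule above can be sketched in a few lines. This is only an illustration under my own assumptions: the table CLOSERS, the function parse_quoted, and the choice of which quote characters to include are hypothetical and not taken from any existing compiler or standard.

```python
# Hypothetical pairing table: each opening quote maps to the closers it accepts.
CLOSERS = {
    "'": {"'"},                      # ASCII apostrophe pairs with itself
    "\u2018": {"\u2019"},            # ‘ closes with ’
    "\u2019": {"\u2018"},            # ’ closes with ‘
    "\u201A": {"\u2018", "\u2019"},  # low ‚ closes with a high ‘ or ’
    "\u2039": {"\u203A"},            # ‹ closes with ›
    "\u203A": {"\u2039"},            # › closes with ‹
}


def parse_quoted(src: str) -> str:
    """Return the content of a quoted literal, or raise ValueError."""
    if len(src) < 2 or src[0] not in CLOSERS:
        raise ValueError("literal does not start with a known quote")
    accepted = CLOSERS[src[0]]
    if src[-1] not in accepted:
        raise ValueError("closing quote does not pair with opening quote")
    body = src[1:-1]
    # Only the accepted closers would still need escaping inside the body.
    if any(ch in accepted for ch in body):
        raise ValueError("unescaped closing quote inside the literal")
    return body


print(parse_quoted("\u2039don't\u203A"))  # ASCII apostrophe needs no escape
```

    With such a table, an ASCII apostrophe inside ‹…› is ordinary content, while a mismatched pair such as ‹e‹ is rejected.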

    Similar things could be done with the double quotation marks used for strings. In languages that make no distinction between character constants and string constants, the two separate sets of pairs could be merged.

    The main problem with the default ASCII quotation marks recognized by programming languages is that they are the same characters found on most users' keyboards, so these frequently used quotation marks and apostrophes (or apostrophe-like letters), typed by application users or by translators, need to be escaped (when it is not easy to replace them automatically and reliably with better quotation marks or apostrophes).

    Escaping mechanisms rapidly become ugly and cause unnecessary complications for programmers and script authors, and they are frequently the cause of unexpected bugs when the syntax becomes too complex (think of the double escapes needed to escape the escaping characters themselves). Anything that helps avoid these escapes as much as possible will be helpful (and there should also be some alternate escaping mechanisms instead of just the usual backslash, to make the remaining multiple escapes more readable). The main problem is not that programmers are lazy, but that they can too easily make errors. The truth is that the language designers (and compiler authors) are the ones who are too lazy to make the programming language more user-friendly for programmers.
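
    To make the double-escape problem concrete, here is a small sketch in Python. Python is my own choice for illustration and is not one of the languages named above; its raw strings and triple-quoted strings are one existing example of an alternate mechanism that removes a layer of escaping.

```python
import re

# Matching one literal backslash with a regular expression:
# in a regular string, each backslash is escaped once for the string
# literal and once more for the regex engine, giving four backslashes.
escaped = "\\\\"
# In a raw string the string-literal layer is switched off, so only the
# regex-level escape remains.
raw = r"\\"

# Both spellings denote the same two-character pattern.
assert escaped == raw
assert re.fullmatch(raw, "\\") is not None

# Quotes inside quotes: backslash escape versus an alternate delimiter.
plain = 'He said "it\'s fine"'
triple = '''He said "it's fine"'''
assert plain == triple
```

    Both pairs denote exactly the same strings; only the source text differs in how easy it is to read and to get right.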

    Most language compilers are still not prepared to support sources encoded with Unicode characters, and still don't support Unicode identifiers. But the real need is not in the identifiers used in the programming language (hidden from most application users); it is in the string constants and in the text data or resources that will be entered and displayed in the applications, including texts that the program users did not enter themselves but that they will expect to see in the GUI of their applications, with correct orthography and typography for their locales.

    I also think that some future version of the XML standard should let XML files be created and interpreted directly (by XML parsers) using quotation characters other than <> and "" or '' for element tags and attribute values. These alternate quotation characters could be declared in the leading XML declaration line, or autodetected from the existing XML declaration line and from the first element or comment line that occurs after it.

    > Message of 08/08/09 04:20
    > From: "Robert Abel"
    > To: "Unicode General"
    > Cc:
    > Subject: Re: Matching opening and closing characters: How?
    >
    > For example, in German, the initial quote is a low left one and the final a right high one, while in English, it's a high left and a high right quote. So high right would pair up with both low left and high left - which would be fully sufficient. Maybe it's enough to have a fully relaxed rule: any quote character will do instead of ".
    > But that is essentially not "right". You can have „…“ or ‚…‘ or »…« or ›…‹ for the most part. These are all known German styles of quotation marks. However, sometimes the "English" variants are also used, namely “…” and ‘…’. Also notice that, e.g., in French, where »…« originated, usage is reversed, like «…».
    > Also, what about the Japanese counterparts for instance? Those would be 「…」 and 『…』 -- these even have vertical variants:
    >
    > ﹃  ﹁ ﹇ ︷ ⏠
    > ︰ ︰ ︰ ︰ ︰
    > ﹄ ﹂ ﹈ ︸ ⏡
    >
    > It basically goes on and on... I think it's best to heed Kenneth's advice: less is more in this case.
    >
    > Regards,
    >
    > Robert Abel


