Re: Titlecasing words starting with numeric glyphs and period as word separator

From: Mark Davis ☕ (mark@macchiato.com)
Date: Tue Feb 22 2011 - 01:56:17 CST

    The default Unicode rules cannot cover all languages or circumstances
    properly. It is worth bringing any proposals for (and/or problem cases
    with) the default rules to the Unicode Technical Committee, but bear in
    mind that those default rules will never be able to cover all languages
    well. Acronyms, hyphenations, and contractions present particular
    problems: there are notes on some of them in
    http://www.unicode.org/reports/tr29/.
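
    As a rough illustration of the two cases in the quoted message below,
    here is a minimal Python sketch of default-style titlecasing. It is not
    the reference algorithm: the WORD regex is only an ASCII-oriented
    approximation of UAX #29 word boundaries, and the names WORD,
    titlecase_word, and default_titlecase are made up for this example.

```python
import re

# Assumption: a crude stand-in for UAX #29 word segmentation. A word is a run
# of letters/digits, optionally joined by a single '.' or apostrophe.
WORD = re.compile(r"\w+(?:[.'\u2019]\w+)*")


def titlecase_word(word: str) -> str:
    """Titlecase the first cased character, lowercase everything after it."""
    for i, ch in enumerate(word):
        if ch.lower() != ch.upper():  # treat ch as a cased character
            return word[:i] + ch.title() + word[i + 1:].lower()
    return word  # no cased character: leave the word unchanged


def default_titlecase(text: str) -> str:
    """Apply titlecase_word to every approximate word in the text."""
    return WORD.sub(lambda m: titlecase_word(m.group(0)), text)


print(default_titlecase("99ers"))  # 99Ers
print(default_titlecase("a.m."))   # A.m.
```

    Under these assumptions "99ers" is a single word whose first cased
    character is "e", and "a.m" is a single word followed by a separate
    final period, which is how the two unwanted results arise.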

    You can have discussions here or on the forum at
    http://unicode.org/forum/, but to get on the agenda of the next UTC
    meeting (May), make sure that a proposal is filed, by a member or by you,
    at http://www.unicode.org/reporting.html.

    > "word separating rules optimized for titlecasing" could be slightly
    > different from general word separating rules

    Language-specific rules, such as those for titlecasing, fall under the
    CLDR technical committee <http://cldr.unicode.org/>. Tickets were filed
    some time ago for adding structure and data for language-specific
    titlecasing, but the work has not yet reached a high enough relative
    priority for the committee to take it on. Having such "word separating
    rules optimized for titlecasing" was the direction the committee was
    thinking of. I put it on the agenda for the next CLDR meeting (that
    committee meets weekly by phone), and you can file a ticket with
    additional information and/or example problem cases that you'd like to
    see handled: http://unicode.org/cldr/trac/newticket
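
    To make the idea of titlecasing-specific word rules concrete, here is a
    purely hypothetical Python sketch. It is not a CLDR rule set or anything
    the committee has adopted. It assumes two tailorings taken from the
    subject line of this thread: a period always separates words, and a word
    that starts with a digit is left alone. The names TITLE_WORD,
    tailored_titlecase_word, and tailored_titlecase are invented for the
    example.

```python
import re

# Assumption: for titlecasing only, a word is a run of letters/digits that may
# be joined by an apostrophe, but never by a period.
TITLE_WORD = re.compile(r"\w+(?:['\u2019]\w+)*")


def tailored_titlecase_word(word: str) -> str:
    """Titlecase a word only if it starts with a cased character."""
    first = word[0]
    if first.lower() == first.upper():  # digit or other uncased start
        return word
    return first.title() + word[1:].lower()


def tailored_titlecase(text: str) -> str:
    """Apply the hypothetical tailored rules to every word in the text."""
    return TITLE_WORD.sub(lambda m: tailored_titlecase_word(m.group(0)), text)


print(tailored_titlecase("99ers"))  # 99ers
print(tailored_titlecase("a.m."))   # A.M.
```

    The trade-off is visible immediately: the same rule that yields "A.M."
    would also turn "example.com" into "Example.Com", which is one reason
    real problem cases are worth collecting in a ticket.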

    Mark

    *— Il meglio è l’inimico del bene —*

    On Mon, Feb 21, 2011 at 23:15, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

    > Hello,
    >
    > There's a discussion going on in the W3C CSS mailing list[1] about the
    > specification of the text-transform property[2], specifically how the
    > "capitalize" value titlecases the specified span of text.
    >
    > During the discussion, two cases were presented:
    >
    > 1. Words starting with numeric glyphs (e.g., "99ers") can become
    > "99Ers" if we follow the rules defined in 5.18 Case Mappings. Has this
    > been discussed here? Is it up to implementations to decide which words to
    > titlecase, or should this be fixed in the Unicode spec?
    >
    > 2. We're thinking of using UAX #29 to separate words and then applying
    > Titlecase_Mapping to every word. But doing so turns "a.m." into "A.m.",
    > which contradicts general publication rules[3]. I understand that both
    > word separation and titlecasing are ambiguous and cannot be perfect, and
    > that we must make compromises. But since Unicode defines these two rules
    > separately, I guess it's possible that "word separating rules optimized
    > for titlecasing" could be slightly different from general word
    > separating rules. I haven't thought much about counter-cases against
    > doing so, but I wonder if anyone on this ML has ideas, including whether
    > we should do it or not, or whether we should cover other cases as well.
    >
    > Any feedback is greatly appreciated.
    >
    >
    > Regards,
    > Koji
    >
    > [1] http://lists.w3.org/Archives/Public/www-style/2011Feb/0621.html
    > [2] http://dev.w3.org/csswg/css3-text/#text-transform
    > [3]
    > http://www.businesswritingblog.com/business_writing/2009/06/what-is-the-correct-time-am-pm-am-pm-am-pm-.html
    >
    >
    >



    This archive was generated by hypermail 2.1.5 : Tue Feb 22 2011 - 01:59:31 CST