From: Mark Davis ☕ (mark@macchiato.com)
Date: Tue Feb 22 2011 - 01:56:17 CST
The default Unicode rules cannot cover all languages or circumstances
properly. It is worth bringing proposals (and/or problem cases) concerning
the default rules to the Unicode Technical Committee, but bear in mind that
those rules will never be able to cover all languages well. Acronyms,
hyphenations, and contractions present particular problems; there are
notes on some of them in http://www.unicode.org/reports/tr29/.
You can have discussions here or on the forum at http://unicode.org/forum/,
but to get on the agenda for the next UTC meeting (May), make sure that a
proposal is filed, by a member or by you, at
http://www.unicode.org/reporting.html.
> "word separating rules optimized for titlecasing" could be slightly
> different from general word separating rules
Language-specific rules, such as those for titlecasing, fall under the CLDR
technical committee <http://cldr.unicode.org/>. Tickets for adding structure
and data for language-specific titlecasing were filed some time ago, but the
work hasn't reached a high enough relative priority for the committee to
take it up. Having such "word separating rules optimized for
titlecasing" was the direction the committee was thinking of. I put it on
the agenda for the next CLDR meeting (that committee meets weekly by phone),
and you can file a ticket with additional information and/or example problem
cases that you'd like to see handled: http://unicode.org/cldr/trac/newticket
Mark
*— Il meglio è l’inimico del bene —*
On Mon, Feb 21, 2011 at 23:15, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
> Hello,
>
> There's a discussion going on in the W3C CSS mailing list[1] about the
> specification of the text-transform property[2], specifically how the
> "capitalize" value titlecases the specified span of text.
>
> During the discussion, two cases were presented:
>
> 1. Titlecasing words that start with digits (e.g., "99ers") can produce
> "99Ers" if we follow the rules defined in 5.18 Case Mappings. Has this been
> discussed here? Is it up to implementations to decide which words to
> titlecase, or should this be fixed in the Unicode spec?
>
> 2. We're thinking of using UAX #29 to separate words and then applying
> Titlecase_Mapping to every word. But doing so turns "a.m." into "A.m.",
> which contradicts general publication rules[3]. I understand that both
> word separation and titlecasing are ambiguous and cannot be perfect, and
> that we must make compromises. But since Unicode defines these two sets of
> rules separately, I guess it's possible that "word separating rules
> optimized for titlecasing" could be slightly different from general word
> separating rules. I haven't thought much about arguments against doing so,
> but I wonder if anyone on this list has ideas, including whether we
> should do this at all and whether there are other cases we should include.
>
> Any feedback is greatly appreciated.
>
>
> Regards,
> Koji
>
> [1] http://lists.w3.org/Archives/Public/www-style/2011Feb/0621.html
> [2] http://dev.w3.org/csswg/css3-text/#text-transform
> [3]
> http://www.businesswritingblog.com/business_writing/2009/06/what-is-the-correct-time-am-pm-am-pm-am-pm-.html
>
>
>
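For illustration, the two cases in the quoted message can be reproduced
with a small Python sketch. It is not the UAX #29 algorithm and not any
CLDR rule. Words are split on whitespace only (an assumption that happens
to give the same results as described above for these inputs), and both the
KEEP_AS_IS list and the rule "leave words that do not start with a cased
letter" are hypothetical tailorings, invented here only to show what
titlecasing-specific rules might look like.

```python
import re

# Hypothetical exception list; not taken from Unicode or CLDR data.
KEEP_AS_IS = {"a.m.", "p.m."}


def first_cased_index(word):
    """Return the index of the first cased character in word, or -1."""
    for i, ch in enumerate(word):
        if ch.islower() or ch.isupper() or ch.istitle():
            return i
    return -1


def titlecase_word(word):
    """Titlecase the first cased character and lowercase what follows it."""
    i = first_cased_index(word)
    if i < 0:
        return word  # no cased character, nothing to change
    return word[:i] + word[i].title() + word[i + 1:].lower()


def titlecase_default(text):
    """Apply titlecase_word to every whitespace-delimited word.

    Whitespace splitting is a stand-in for real word segmentation.
    """
    return re.sub(r"\S+", lambda m: titlecase_word(m.group()), text)


def titlecase_tailored(text):
    """Same as titlecase_default, plus two hypothetical tailorings."""
    def convert(match):
        word = match.group()
        if word.lower() in KEEP_AS_IS:
            return word  # listed abbreviation: leave untouched
        if first_cased_index(word) != 0:
            return word  # does not start with a cased letter (e.g. "99ers")
        return titlecase_word(word)

    return re.sub(r"\S+", convert, text)


sample = "meet the 99ers at 9 a.m."
print(titlecase_default(sample))   # Meet The 99Ers At 9 A.m.
print(titlecase_tailored(sample))  # Meet The 99ers At 9 a.m.
```

The first function shows how both problem cases arise from "titlecase the
first cased letter of each word"; the second shows that a small amount of
tailoring removes them for these inputs, at the cost of rules and data that
would differ by language.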