RE: Titlecasing words starting with numeric glyphs and period as word separator

From: CE Whitehead (cewcathar@hotmail.com)
Date: Tue Mar 01 2011 - 17:53:09 CST


    Hi.

    From: Shawn.Steele@microsoft.com
    To: cewcathar@hotmail.com; kojiishi@gluesoft.co.jp; unicode@unicode.org
    Subject: RE: Titlecasing words starting with numeric glyphs and period as word separator
    Date: Tue, 1 Mar 2011 23:01:54 +0000

    > Title casing is very language-specific, and, as noted below, what your English teacher expects likely
    > isn’t what your average programmer thinks of when they think of title casing.

     Yes and no. I do think that ultimately statistics will be used to determine which sort of title casing is most common on the web, and that applications will try to follow the precedent set there . . . and in terms of grammar, what's out there sometimes does conform to current usage rules -- as seems to be the case for titles beginning with words formed from numbers, such as "49ers"; again see:
    http://www.google.com/#sclient=psy&hl=en&q=49ers&aq=0&aqi=g5&aql=f&oq=49ers&pbx=1&bav=on.1,or.&fp=42ea6e12edc6080
    (And I suppose you know that grammar and punctuation rules are to some degree ultimately reformulated to conform to usage. Also, one clarification: I'm not currently teaching English . . . I have taught it in the past.)
    Best,
     
    --C. E. Whitehead
    cewcathar@hotmail.com

    > - Shawn
     
     
    > http://blogs.msdn.com/shawnste
    > Selfhost a custom locale from \\scratch2\scratch\shawnste\customlocaledrop\install.bat
    > (Selfhost 7929)
     

    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of CE Whitehead
    Sent: Tuesday, March 01, 2011 1:33 PM
    To: kojiishi@gluesoft.co.jp; unicode@unicode.org
    Subject: Titlecasing words starting with numeric glyphs and period as word separator
     
    Hi, Koji:
     
    First I would say "99ers" not "99Ers" -- I cannot imagine any case at all for "99Ers"
    (see http://www.google.com/#sclient=psy&hl=en&q=49ers&aq=0&aqi=g5&aql=f&oq=49ers&pbx=1&bav=on.1,or.&fp=42ea6e12edc6080 
    for online examples with 49ers;
    but feel free to submit a question about this to the Chicago Manual of Style:
    http://www.chicagomanualofstyle.org/QA_submit.html).

    For your rules for text transformation in CSS (http://dev.w3.org/csswg/css3-text/#text-transform)
    I would limit the rules set for titlecasing; that is, I might specify that nouns, adjectives, adverbs, and pronouns should be capitalized in English titles, but would not specify other, more "fuzzy" rules.
     
    The only rule needed for title casing A.M./a.m., AM/am and P.M./p.m., PM/pm that I can surmise is that both the "a" and "m" or "p" and "m" need to match (that is, if you title case the "a" you have to title case the "m" -- so I would not be happy with P.m.).
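The matching rule above can be checked mechanically. What follows is only a minimal Python sketch and not part of the original message: the function name `meridiem_case_is_consistent` and the regular expression are hypothetical, and the sketch assumes the marker has already been isolated as a single token.

```python
import re

# Hypothetical pattern: "a" or "p", an optional period, "m", an optional period.
_MERIDIEM = re.compile(r"^([AaPp])(\.?)([Mm])(\.?)$")


def meridiem_case_is_consistent(token: str) -> bool:
    """Return True if token is an a.m./p.m. marker whose two letters share the same case.

    Tokens that are not a.m./p.m. markers at all also return False.
    """
    match = _MERIDIEM.match(token)
    if match is None:
        return False
    first_letter, _, second_letter, _ = match.groups()
    return first_letter.isupper() == second_letter.isupper()
```

Under these assumptions "P.M.", "p.m.", "AM" and "pm" pass, while "P.m." is rejected.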
     
    Also, as far as I know, there should be no fixed rule about title casing English prepositions that are fewer than four letters long and are not the first word in an English title ("of" or "Of," "to" or "To," "in" or "In," "on" or "On"), and perhaps no rule even for English conjunctions of fewer than four letters ("and" or "And," "or" or "Or," "but" or "But"), although in the case of conjunctions I prefer lower case unless the conjunction is the first word in a title. Also, I would never capitalize "of" in a title unless it were the first word.
     
    (My way for title-casing the title in English of a book, article, journal is:
    First letter of first word = capital first letter
    Nouns, Adjectives, Adverbs, Pronouns = capital first letter
    Prepositions and Conjunctions of 4 letters or more in length = capital first letter
    This, That, These, Those (determiners) = capital first letter
    Conjunctions of less than 4 letters in length = lower case
    Prepositions of less than 4 letters in length = lower case
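
The list above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not a definitive implementation: the word lists `SHORT_PREPOSITIONS` and `SHORT_CONJUNCTIONS` and the function names are hypothetical and deliberately incomplete, and because the sketch has no part-of-speech information it simply capitalizes every word that is not in those lists.

```python
# Hypothetical, incomplete word lists (assumption): short prepositions and
# conjunctions that stay lower case unless they are the first word.
SHORT_PREPOSITIONS = {"of", "to", "in", "on", "at", "by", "for"}
SHORT_CONJUNCTIONS = {"and", "or", "but", "nor"}
LOWERCASE_WORDS = SHORT_PREPOSITIONS | SHORT_CONJUNCTIONS


def capitalize_first(word: str) -> str:
    """Uppercase the first character only if it is a letter; leave the rest untouched."""
    if word and word[0].isalpha():
        return word[0].upper() + word[1:]
    return word  # e.g. "49ers" starts with a digit and is returned unchanged


def title_case(title: str) -> str:
    """Apply the listed rules to a whitespace-separated English title."""
    result = []
    for index, word in enumerate(title.split()):
        if index > 0 and word.lower() in LOWERCASE_WORDS:
            # Short preposition or conjunction that is not the first word.
            result.append(word.lower())
        else:
            # First word, or any word not in the short-word lists.
            result.append(capitalize_first(word))
    return " ".join(result)
```

With these lists, "of mice and men" becomes "Of Mice and Men", "notes from underground" becomes "Notes From Underground" (a four-letter preposition is capitalized), and "49ers in california" becomes "49ers in California".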
     
    And as the Chicago Manual of Style says, in a title, one of the above following a hyphen gets its first letter treated just as if it followed white space:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles22.html
     
    I don't know what to do with "etc" in a title but would probably capitalize it:
    http://www.chicagomanualofstyle.org/CMS_FAQ/Capitalization/Capitalization11.html
     
    However, what I would do with prepositions and conjunctions of 4 letters or more deviates slightly from the rules I read in the Chicago Manual of Style's info pages:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles04.html
    but see also:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles12.html);
    The Purdue Owl agrees with me -- that the short prepositions and conjunctions should not be capitalized in English:
    http://owl.english.purdue.edu/engagement/index.php?category_id=2&sub_category_id=1&article_id=42
     
    For the complete list of questions already asked about titles at Chicago Manual of Style, go to:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles_questions01.html
     
     
    I wonder if, in some cases, a "fuzzy logic" approach might be what is needed for titles (if it could be done without using much bandwidth).
     
    In any case, I would let the browser and application developers conduct statistical analysis for things like English prepositions and "etc" in titles;
    also if you'd like brief info. on fuzzy logic, see:
    Kumar and Garg. "Intelligent Learning of Fuzzy Logic Controllers Via Neural Network and Genetic Algorithm." Duke University.
    http://www.duke.edu/~manish/UL_029.pdf (This is a pretty brief reference.)

    Best,
     
    --C. E. Whitehead
    cewcathar@hotmail.com
     
     
    From: Koji Ishii (kojiishi@gluesoft.co.jp)
    Date: Tue Feb 22 2011 - 01:15:46 CST

    > Hello,
    > There's a discussion going on in W3C CSS mailing list[1] about specifications of the text-transform
    > property[2], specifically how the "capitalize" value that titlecase specified span of text.
    > During the discussion, two cases were presented:
    > 1. Titlecasing words starting with numeric glyphs (e.g., "99ers") can be "99Ers" if we follow the rules
    > defined in 5.18 Case Mappings. Is this discussed here and it's up to implementations to define which
    > words to apply titlecasing, or should this be fixed in Unicode spec?
    > 2. We're thinking to use UAX #24 to separate words and then apply Titlecase_Mapping to every word.
    > But doing so makes "a.m." to be "A.m." and it contradicts with the general publication rules[3]. While
    > I understand both separating words and titlecasing are ambiguous, cannot be perfect, and we must
    > make compromises. But since Unicode defines these two rules separately, I guess there's a possibility
    > that "word separating rules optimized for titlecasing" could be slightly different from general word
    > separating rules. I haven't thought much about counter-cases for not doing so, but I wonder if anyone
    > in this ML could have idea including whether we should do it or not, or we should include more other
    > cases.
    > Any feedback is greatly appreciated.
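
The two cases in the quoted message can be reproduced with a naive titlecasing routine. The Python below is only an illustrative sketch: the function names are hypothetical, and plain whitespace splitting is an assumption standing in for real word segmentation, so it is not the Unicode algorithm itself. It titlecases the first cased character of each word and lowercases everything after it, which is the behaviour the message describes.

```python
def naive_titlecase_word(word: str) -> str:
    """Titlecase the first cased character of the word and lowercase everything after it."""
    for i, ch in enumerate(word):
        if ch.lower() != ch.upper():  # crude test for "this character has case"
            return word[:i] + ch.title() + word[i + 1:].lower()
    return word  # no cased character at all, e.g. "9"


def naive_titlecase(text: str) -> str:
    # Assumption: whitespace splitting stands in for proper word segmentation.
    return " ".join(naive_titlecase_word(w) for w in text.split())
```

With this sketch, "99ers" becomes "99Ers" and "a.m." becomes "A.m.", the two results the message objects to.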
     
    I just note that sometimes inside English titles prepositions begin with a capital letter and sometimes not; thus, for some parts of speech in titles, "fuzzy logic" might work better than rules.
     
    I think you can have PM/AM or pm/am or P.M./A.M. or p.m./a.m. too.
     
    Thus restrict rules to nouns, verbs, and adjectives for English, plus longer prepositions and relativizers;
    for other languages the rules are different. So I am just saying: limit title casing rules to cases where there is no variation, and leave the rest to developers to implement, maybe using fuzzy logic.
     
    Best,
     
    --C. E. Whitehead
    cewcathar@hotmail.com

     
    > Regards,
    > Koji

                                                   



    This archive was generated by hypermail 2.1.5 : Tue Mar 01 2011 - 17:57:06 CST