Question on script-name assignment

From: Tom Emerson (tree@basistech.com)
Date: Fri Nov 09 2001 - 09:22:23 EST

Previous message: Marco Cimarosti: "RE: What constitutes "character"?"
Next in thread: Marco Cimarosti: "RE: Question on script-name assignment"
Reply: Marco Cimarosti: "RE: Question on script-name assignment"

One gotcha, which I run into every six months or so, is forgetting that
the punctuation characters in the Basic Latin block are classified as
Latin script. This trips me up because most of my text processing work
involves CJK, so I'll write something like this to filter Latin
characters (in Rosette notation):

  if (UnicodeCharacter::GetScriptSystem(c) == ss_Latin) {
      // blah blah blah
  }

while what I really wanted to say is:

  if (UnicodeCharacter::GetScriptSystem(c) == ss_Latin &&
        !AnyPunctuation(c)) {
      // blah blah blah
  }
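To make the gotcha concrete outside Rosette, here is a minimal,
self-contained C++ sketch. Everything in it is an assumption:
ScriptSystem, GetScriptSystem, IsAsciiPunct and IsLatinNonPunct are
hypothetical stand-ins, not the Rosette API, and the toy classifier only
reproduces the behaviour described in this message (ASCII letters and
punctuation report Latin, U+3001 reports no script).

```cpp
#include <cassert>

// Hypothetical stand-in for Rosette's script-system enum.
enum ScriptSystem { ss_Undefined, ss_Latin, ss_CJK };

bool IsAsciiLetter(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z');
}

// Same set as std::ispunct in the "C" locale: printable ASCII that is
// neither a letter, a digit, nor a space (so symbols like '$' count too).
bool IsAsciiPunct(char32_t c) {
    return c >= 0x21 && c <= 0x7E && !IsAsciiLetter(c) &&
           !(c >= U'0' && c <= U'9');
}

// Toy classifier mimicking the behaviour described above (hypothetical).
ScriptSystem GetScriptSystem(char32_t c) {
    if (IsAsciiLetter(c) || IsAsciiPunct(c)) return ss_Latin;  // U+002C lands here
    if (c >= 0x4E00 && c <= 0x9FFF) return ss_CJK;             // CJK Unified Ideographs
    return ss_Undefined;                                       // U+3001 lands here
}

// The filter I actually want: Latin script, minus the punctuation.
bool IsLatinNonPunct(char32_t c) {
    return GetScriptSystem(c) == ss_Latin && !IsAsciiPunct(c);
}
```

The naive test (GetScriptSystem(c) == ss_Latin) accepts ',' while the
filtered one rejects it; neither accepts U+3001, which is exactly the
asymmetry I'm asking about.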

This is confusing because ideographic punctuation is not considered
CJKScript. For example, U+3001 (IDEOGRAPHIC COMMA) has an undefined
script, but U+002C (COMMA) is Latin script.

So my question is this: why (for I assume there is a Good Reason(tm)
for it) is Latin punctuation classified as Latin script, while CJK
punctuation is not classified as CJKScript?

I use U+002C when writing in Cyrillic and in Han'gul, two script
systems I think we can all agree are not Latin.
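One way to see why a Latin U+002C is awkward for Cyrillic or Han'gul
text is to ask which scripts a string uses. The sketch below is
hypothetical throughout (Script, Classify and ScriptsUsed are made-up
names, and the block ranges are deliberately partial). It treats ASCII
punctuation, digits and space as script-neutral; that is a design choice
for illustration, not a statement of how Rosette classifies these
characters.

```cpp
#include <cassert>
#include <set>
#include <string>

enum Script { kNeutral, kLatin, kCyrillic, kHangul, kOther };

// Toy classifier (hypothetical): ASCII letters are Latin, the rest of
// ASCII (including U+002C) is neutral, and two non-Latin blocks are
// recognised by range.
Script Classify(char32_t c) {
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return kLatin;
    if (c <= 0x7F) return kNeutral;                    // punctuation, digits, space
    if (c >= 0x0400 && c <= 0x04FF) return kCyrillic;  // Cyrillic block
    if (c >= 0xAC00 && c <= 0xD7A3) return kHangul;    // Hangul Syllables block
    return kOther;
}

// Returns the set of scripts used in s, skipping neutral characters.
std::set<Script> ScriptsUsed(const std::u32string& s) {
    std::set<Script> scripts;
    for (char32_t c : s) {
        Script sc = Classify(c);
        if (sc != kNeutral) scripts.insert(sc);
    }
    return scripts;
}
```

With this choice, "да, нет" reports only Cyrillic. If the comma were
instead classified as Latin, as described above, the same string would
look like mixed Cyrillic and Latin text.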

Thanks.

-tree

-- 
Tom Emerson                                          Basis Technology Corp.
Sr. Computational Linguist                         http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


This archive was generated by hypermail 2.1.2 : Fri Nov 09 2001 - 10:09:58 EST