RE: Other Question, Problem, or Feedback

From: Dean Harding (dean.harding@dload.com.au)
Date: Mon Jun 12 2006 - 21:49:23 CDT

Next message: J Andrew Lipscomb: "Re: unicode Digest V6 #126"

Previous message: Richard Wordingham: "Re: triple diacritic (sch with ligature tie in a German dialect writing document)"
In reply to: Richard Wordingham: "Re: Other Question, Problem, or Feedback"

> > 1.Is it true that there are many ways of encoding the same character in
> > UTF-16?
>
> No. There is exactly one way of encoding each character in UTF-16. See
> TUS 4.0 Section 2.5 'Encoding Forms', especially p29.

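I think the question may be referring to canonically equivalent sequences,
which are what the Unicode normalization forms deal with. For example,
"e with an acute accent" could be the single code point <U+00E9> or it could
be the sequence <U+0065, U+0301>. Each of those has exactly one UTF-16
encoding, but the two encodings differ.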
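
To make the two spellings concrete, here is a minimal sketch in Python
(chosen only for illustration; the thread itself does not use it). The helper
utf16_units is a hypothetical name introduced here, not a library function.

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_units(precomposed)])  # one code unit
print([hex(u) for u in utf16_units(decomposed)])   # two code units

# A plain comparison sees two different strings.
print(precomposed == decomposed)

# After normalizing both to the same form, they compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

The strings render identically, yet a code-unit comparison treats them as
different until both are brought to the same normalization form.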

This CAN be a problem for regular expressions, unless they're designed with
it in mind. The simplest solution is to normalize both the pattern and the
input strings to the same form before doing matching (for example, .NET
provides the String.Normalize [http://msdn2.microsoft.com/en-us/ebza6ck1.aspx]
method).
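
As a rough sketch of the normalize-before-matching approach, here is the same
idea in Python, using unicodedata.normalize in place of .NET's
String.Normalize. The helper names nfc and search_normalized, the choice of
NFC, and the sample strings are assumptions made for illustration.

```python
import re
import unicodedata


def nfc(s):
    """Return s in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", s)


def search_normalized(pattern, text, flags=0):
    """Search after bringing both pattern and text to NFC.

    Match offsets refer to the normalized text, not the original input.
    """
    return re.search(nfc(pattern), nfc(text), flags)


pattern = "caf\u00e9"            # pattern written with precomposed U+00E9
text = "un cafe\u0301 noir"      # text uses U+0065 followed by U+0301

# Without normalization the pattern does not match the decomposed text.
print(re.search(pattern, text))

# With both sides normalized, the match is found.
match = search_normalized(pattern, text)
print(match.group(0) if match else None)
```

Normalizing the pattern as well as the text matters, since either side may
arrive in decomposed form. This only helps with literal text in the pattern:
a character class or quantifier applied to a base letter plus a combining
mark would still need a regex engine designed with canonical equivalence in
mind.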

Dean.


This archive was generated by hypermail 2.1.5 : Mon Jun 12 2006 - 21:57:04 CDT