Janusz S. Bien
jsbien at mimuw.edu.pl
Fri Sep 16 10:57:44 CDT 2016
Quote - Eric Muller <eric.muller at efele.net> (Fri, 16 Sep 2016,
> On 9/16/2016 8:30 AM, Janusz S. Bien wrote:
>> Quote - Eric Muller <eric.muller at efele.net> (Fri, 16 Sep
>> 2016, 17:03:54):
>>> On 9/16/2016 6:52 AM, Janusz S. Bień wrote:
>>>> (when working on a corpus of historical Polish we
>>>> noticed some cases where standard Unicode equivalence was not
>>> I'm very interested to know more about those cases.
>> For our search engine we were unable to use compatibility
>> equivalence "out of the box" for splitting the ligature because it
>> also converted long s to short s while we wanted to preserve the
> I am interested in the problems with *canonical* equivalence. I
> thought that you were talking about those before.
I apologize for the confusion; that was my fault. I tend to answer too
quickly and not precisely enough :-(
On the other hand, I'm not sure canonical equivalence is always what I
want and expect, but I don't have specific examples at hand.
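
To make the compatibility issue quoted above concrete, here is a minimal Python sketch. It is only an illustration, not the code of our search engine: the function name split_ligatures and the choice of U+FB05 (LATIN SMALL LIGATURE LONG S T) as the example are assumptions. It shows that NFKC folds long s into s, and that expanding only one level of the <compat> mapping for characters named as ligatures keeps the long s.

```python
import unicodedata

LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S


def split_ligatures(text: str) -> str:
    """Expand <compat> ligatures by one level, leaving long s untouched.

    Hypothetical helper: it relies on the Unicode character name
    containing "LIGATURE" and on a one-level <compat> decomposition,
    so the long s inside U+FB05 is not folded further into "s".
    """
    out = []
    for ch in text:
        fields = unicodedata.decomposition(ch).split()
        is_ligature = "LIGATURE" in unicodedata.name(ch, "")
        if ch != LONG_S and is_ligature and fields[:1] == ["<compat>"]:
            # fields[1:] are hexadecimal code points of the expansion
            out.append("".join(chr(int(cp, 16)) for cp in fields[1:]))
        else:
            out.append(ch)
    return "".join(out)


# Full compatibility normalization is recursive and loses the long s.
nfkc_result = unicodedata.normalize("NFKC", "\ufb05")

# One-level expansion splits the ligature but keeps the long s.
custom_result = split_ligatures("\ufb05")
```

Here nfkc_result is the two letters "st", while custom_result is long s followed by "t". A real application would probably need an explicit list of the ligatures to split rather than a test on character names.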
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/