Re: Encoding of personal names in official databases

From: Henning Brunzel (hbrunzel@meta-systems.de)
Date: Tue Mar 30 1999 - 07:06:28 EST


Trond Trosterud wrote:
>
> Within the next month, I am going to write a memo to the Norwegian dept. of
> justice to comment upon the planned revision of the Norwegian laws for
> personal names. The goal of the revision is to allow naming practices other
> than the Norwegian one, due to a culturally more heterogeneous population.
>
> My input will deal with the encoding of the names.
>
> Today, the official Norwegian population registry is encoded in ASCII,
> extended with the Norwegian letters ÆØÅæøå at the ASCII positions [\]{|}
> (I guess the same solution is in use in Denmark, Sweden and Finland as
> well, but with äö for æø).
>
> My suggestion will be that they abandon their 7-bit systems and move to...
>
> and here I need your advice.
>
> In Norway, Sámi citizens use Sámi names, but the diacritics (ACUTE ACCENT,
> CARON, HOOK, STROKE) are just stripped off in the registry. We have large
> numbers of Finns and Swedes, and their äö are replaced with æø. Immigrants
> from other countries bring their letters (and alphabets) with them. A
> natural answer to this is of course: Use the UCS. But the databases are
> huge: Every single citizen is included.
>
> Does anyone on this list have experience with similar cases? What is being
> done around the world? Do other countries use 7-bit solutions as well? Are
> there plans to migrate to 8 bits? 16 bits?
>
> Since we need both the Sámi names and the names of new immigrants, 8 bits
> really are not enough. If we then use some UCS format, which one shall we
> use (16-bit, UTF-8, ...), in order to save space and have databases with
> fast retrieval?

This depends mainly on the number of immigrants with exotic names. By
exotic I mean names using letters at code points U+0800 and above, which
take three bytes each in UTF-8 but only two in UCS-2/UTF-16. If there
were really many of them, UCS-2/UTF-16 might be your choice. But I
think that you will mainly have to handle Norwegian names, so my
suggestion is: use UTF-8.

But: Don't trust me - I am just a Unicode hobbyist.
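
To make the size argument concrete, here is a rough sketch in Python that
compares the storage needed for a few names in UTF-8 and in UTF-16. The
sample names and the helper function names are made up for illustration
only; a real decision should be based on measurements against the actual
registry data.

```python
# Hypothetical sample names, chosen only to cover different code point ranges.
SAMPLE_NAMES = [
    "Kari Nordmann",    # plain ASCII
    "Bjørn Åsen",       # Norwegian letters (below U+0800)
    "Máret Ánne Sárá",  # Sámi-style diacritics (below U+0800)
    "Иван Петров",      # Cyrillic (below U+0800)
    "王小明",            # CJK ideographs (U+0800 and above)
]


def utf8_bytes_per_char(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def encoded_sizes(name):
    """Return (utf8_bytes, utf16_bytes) for a name, without a byte-order mark."""
    return len(name.encode("utf-8")), len(name.encode("utf-16-le"))


def total_sizes(names):
    """Sum the UTF-8 and UTF-16 sizes over a list of names."""
    sizes = [encoded_sizes(n) for n in names]
    return sum(s[0] for s in sizes), sum(s[1] for s in sizes)


utf8_total, utf16_total = total_sizes(SAMPLE_NAMES)
print("UTF-8 total bytes: ", utf8_total)
print("UTF-16 total bytes:", utf16_total)
```

Running total_sizes() over a representative extract of the registry would
show which encoding is smaller for the real mix of names. Letters below
U+0800 never cost more in UTF-8 than in UTF-16, so only names written in
scripts at U+0800 and above tip the balance the other way.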

> Greetings,
>
> -------------------------------------------------------------------
> Trond Trosterud t +47 7764 4763
> Lingvistisk institutt, Det humanistiske fakultet h +47 7767 3639
> N-9037 Universitetet i Tromsø, Noreg f +47 7764 4239
> Trond.Trosterud@hum.uit.no http://www2.isl.uit.no/trond/index.html
> Test string-please ignore:á?~¹s¼¿-Á,?¸Sº¾-â¡¥³?^-Â?¢²??-æøåäö-ÆØÅÄÖ
> -------------------------------------------------------------------


