Korean line breaking rules : Unicode 3.0 (p. 124)

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Wed Mar 01 2000 - 22:29:45 EST


On Sun, 13 Feb 2000, Kenneth Whistler wrote:

> Lest anyone feel unduly constrained, let me note that now that
> the editorial committee has closed the book, so to speak, on Unicode 3.0,
> all of you who are about to open the book for the first time should
> feel free to unleash your commentary on the text.

   I've just received my copy of the Unicode 3.0 book, so here goes
my first commentary.

   On page 124 (section 5.15, Locating Text Element Boundaries),
the third paragraph has the following near the end:

U3.0> In particular, word, line, and sentence boundaries will need to
U3.0> be customized according to locale and user preference. In Korean,
U3.0> for example, lines may be broken either at spaces (as in Latin text) or
U3.0> on ideographic boundaries (as in Chinese).

  First of all, it's a great mystery to me how on earth this
strange notion that Korean has *two* different line breaking rules (as
opposed to one) crept into the thinking of non-Korean experts on Korean
and finally made it into the Unicode 3.0 book and the Unicode TR on
line breaking.

  None of the dozens of Korean books on my bookshelves that I've just
gone through breaks lines *exclusively* at spaces. All of them
break lines freely at *syllables*. The only places I can think of where
lines are broken *exclusively* at spaces (for Korean text) are web
browsers like Netscape and MS IE, which are completely *broken* as far
as Korean line breaking is concerned, and possibly earlier
implementations of Korean LaTeX.
One may add to the list Korean text formatted by a non-localized version
of 'fmt' (in Unix) as another example. To work around the problem caused
by these broken web browsers, some Korean web authors apply a simple
filter to their HTML files that inserts <wbr> between every pair of
Korean syllables. To see what I mean, you may want to take a look at
<http://photon.hgs.yale.edu/~jungshik/lb.html> and
<http://photon.hgs.yale.edu/~jungshik/lbscreenshot.jpg>.
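
  For the curious, here is a rough sketch in Python of the kind of <wbr>
filter described above. It is only an illustration, not the script any
particular web author uses: the name insert_wbr is made up, and I am
assuming that "Korean syllable" means a precomposed Hangul syllable
(U+AC00 through U+D7A3) and that the input is plain text content. A real
filter would have to skip over tags and attribute values.

```python
import re

# Assumption: a "Korean syllable" is a precomposed Hangul syllable,
# i.e. a code point in the range U+AC00..U+D7A3.
# The pattern matches the empty position between two such syllables.
_BETWEEN_SYLLABLES = re.compile(r"(?<=[\uAC00-\uD7A3])(?=[\uAC00-\uD7A3])")


def insert_wbr(text):
    """Insert <wbr> between every pair of adjacent Hangul syllables.

    Hypothetical helper for illustration only. It treats its input as
    plain text and does not try to parse HTML markup.
    """
    return _BETWEEN_SYLLABLES.sub("<wbr>", text)


if __name__ == "__main__":
    # Spaces already allow a break, so nothing is inserted around them.
    print(insert_wbr("한국어 줄바꿈"))
```

Text without adjacent Hangul syllables (English words, for instance)
passes through unchanged.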

  Let me emphasize that a line can be broken at any syllable boundary
in Korean text (apart from some obvious exceptions that also apply to
English text, e.g. punctuation marks like '!' and '?' cannot begin a line).
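
  To make the rule concrete, here is a minimal sketch in Python that lists
the positions where a line may be broken. It is my own illustration under
stated assumptions, not the algorithm of any Unicode TR: the function
names are invented, "syllable" again means a precomposed Hangul syllable
(U+AC00 through U+D7A3), and the set of marks that may not begin a line
is just a small sample.

```python
from typing import List

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3

# Assumption: a small sample of marks that must not begin a line.
NO_LINE_START = set("!?.,:;)]}")


def is_hangul_syllable(ch: str) -> bool:
    """True if ch is a precomposed Hangul syllable."""
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST


def break_opportunities(text: str) -> List[int]:
    """Return every index i such that a new line may start at text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if nxt in NO_LINE_START or nxt == " ":
            # Punctuation and spaces never begin a line.
            continue
        if prev == " ":
            # Break after a space, as in Latin text.
            breaks.append(i)
        elif is_hangul_syllable(prev) and is_hangul_syllable(nxt):
            # Break between any two adjacent Korean syllables.
            breaks.append(i)
    return breaks


if __name__ == "__main__":
    print(break_opportunities("한국어!"))  # [1, 2]
```

Note that there is a single rule here, not two: spaces and syllable
boundaries are both break opportunities in the same text.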

  Secondly, even in Latin scripts (well, at least in English) lines can
be broken not only at spaces but also at syllable boundaries, with a
hyphen. The only difference between Korean and English line breaking
is that Korean doesn't need a hyphen when a line is broken at a
syllable boundary, because in Korean, syllables form a visual unit one
level higher than alphabetic/phonetic letters (consonants and vowels).

  Thirdly, the expression 'ideographic boundaries' is not appropriate
at all for describing Korean line breaking rules. 'Syllabic boundaries'
or 'syllables' would be more appropriate.

  Given these, I'd like to suggest that the last sentence (the one that
begins with 'In Korean, for example...') be removed in a future edition,
because Korean is NOT a good example of a case where there can be
multiple line breaking rules depending on user preference.

    Jungshik Shin


