RE: Korean line breaking rules : Unicode 3.0 (p. 124)

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Wed Mar 22 2000 - 22:26:46 EST


On Tue, 14 Mar 2000, Kenneth Whistler wrote:

> Erland Sommarskog suggested:
>
> >
> > Rick McGowan <rmcgowan@apple.com> writes:
> > > I think that unfortunately both Hoon Kim and Jungshik Shin I think have
> > > *entirely* mis-interpreted the text. The text says:
> > >
>
> kenw inserts here the correct, exact citation of the text on p. 124:
>
> "In Korean, for example, lines may be broken either at spaces (as in Latin
> text) or on ideograph boundaries (as in Chinese)."
>
> > >
> > > The word "or" on the second line would never be interpreted as an "exclusive
> > > or", it is an "inclusive or". In "C Language" syntax, it means "A|B"; it
> > > does not mean "A^B".
> > >
> > > In that light, some of their previous comments should probably be re-examined.

> This is quite possibly the source of the misinterpretation, and should
> be taken under advisement by the editors to clarify the next edition.

  I guess the best course of action is to take the example about Korean
line breaking out of the next edition completely (and come up with a much
more suitable example), for the reason I gave in my message dated March 2nd,
which I'm going to repeat below.

U3.0> In particular, word, line, and sentence boundaries will need to
U3.0> be customized according to locale and user preference. In Korean,
U3.0> for example, lines may be broken either at spaces (as in Latin
U3.0> text) or on ideographic boundaries (as in Chinese).

  The "misinterpretation" comes from the sentence about 'user preference'
that precedes the Korean line breaking example. If Korean is taken as an
example of different sets of rules applied depending on 'user preference',
that implies there are two different SETS of rules for Korean line breaking
which can be offered to end users to choose from. That is what I'm
disputing here. (It may well not have been the intention of the author,
but the implication is there! Widely used web browsers use a set of rules
which doesn't include rule 1 below, leading to very unsatisfactory
rendering of Korean text.) There's only one SET of line breaking rules
for Korean text, and it is:

  1) Lines can be broken at any syllable boundary. (Please don't use
    "ideographic boundaries" when talking about Korean text alone. By far
    the majority of printed materials in Korean do NOT contain a single
    ideogram.) This most important rule is NOT implemented by the two
    leading web browsers on the market. (See again
       <http://photon.hgs.yale.edu/~jungshik/lb.html> )

  2) Lines can be broken at spaces (this is arguably included in rule 1).

  3) Do not end lines with certain punctuation marks:
       opening single/double quotation marks, opening
       brace/bracket/parenthesis....

  4) Do not begin lines with certain punctuation marks:
       closing single/double quotation marks, closing
       brace/bracket/parenthesis, question mark, exclamation mark,
       semicolon, colon, period, comma....
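
  To make the four rules concrete, here is a rough sketch in Python. It is
only an illustration, not what any browser or the Unicode Standard actually
does. The two punctuation sets are my guess at the lists above (which end in
"...." and so are not exhaustive), the function names are made up, and I am
assuming a "syllable boundary" means any position next to a precomposed
Hangul syllable (U+AC00..U+D7A3).

```python
# Illustrative sets only: the rules above list examples and trail off with
# "....", so these are assumptions, not complete lists.
NO_LINE_END = set("\u2018\u201c([{")            # rule 3: opening marks
NO_LINE_START = set("\u2019\u201d)]}?!;:.,")    # rule 4: closing marks etc.


def is_hangul_syllable(ch):
    """True for precomposed Hangul syllables (U+AC00..U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def break_opportunities(text):
    """Return every index i where a line may be broken before text[i]."""
    result = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev in NO_LINE_END:      # rule 3: this mark may not end a line
            continue
        if cur in NO_LINE_START:     # rule 4: this mark may not begin a line
            continue
        if cur == " ":               # break after a space, never before it
            continue
        # rule 2 (after a space) or rule 1 (next to a Hangul syllable)
        if prev == " " or is_hangul_syllable(prev) or is_hangul_syllable(cur):
            result.append(i)
    return result


if __name__ == "__main__":
    print(break_opportunities("\ud55c\uae00 test"))
```

  Note that there is a single function with no 'user preference' switch:
rules 1 to 4 are applied together as one set.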

 There being only one SET of rules, there's little room for
user preference, and Korean line breaking is by no means a good example of
different sets of line breaking rules being adopted per user preference.
That is why I want the Korean line breaking example to be taken out
completely in the next edition. As for the 'locale' part, Korean
line breaking is as good or bad an example as any other line breaking
rules, and there's no reason to pick the Korean example solely on that basis.

    Jungshik Shin


