From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jul 23 2007 - 22:51:57 CDT
Asmus Freytag wrote:
> Rest assured, the WJ would be quite incorrect. The fact that you keep
> repeating this indicates that you did not read the standard or any of my
> other posts.
Rest assured that I read the standard and did not find any rationale about
the use or semantics of WJ compared to ZWNJ. WJ was introduced only much
later, to replace the deprecated word-joining use of ZWNBSP, which now
serves as a BOM.
I absolutely don't care about the linguistic definitions of "syllables"
because these cannot be handled at the encoding or local orthographic level
without the help of some language-specific dictionary. These linguistic
syllables are NOT a property of the script with which these languages are
written, and Unicode does not encode languages, so it cannot treat them.
However, it is up to Unicode to define how a script can be encoded to
specify essential things like the prohibition or preference of ligatures, or
the prohibition or suggestion of "syllable" breaks.
Yes, English lacks a correct word for "syllable breaks", i.e. the
fact that some places in a word can be used to split it across lines,
possibly also adding some visible mark when this occurs. What I really mean
by "syllable break" in EVERYTHING I have written so far is what is meant by
the much more precise French term "césure".
You tried to use the term "word breaking", but this term seems wrong too for
this usage: for me, word breaking means splitting a text into separate
words, not finding possible breaks within a word.
All your misunderstanding of what I meant (suggesting that I wanted to
redefine things, which I do not) is caused by a misunderstanding of the
English expression "syllable break". Read it as the French term "césure",
which is much better than "syllable break" (even though no "césure" can
occur in the middle of a linguistic syllable in French).
And yes, I know that a césure is *preferably* avoided in some places (but
absolutely NOT forbidden there) for stylistic reasons: in French it is
preferable not to insert a césure after the prefixes "con-", "cul-",... or
in the middle of "coha-bite", because the result would be read as
offensive.
I say "preferably" because there are frequent cases where this use is
wanted by authors, notably in poetry and song lyrics (where the
césure is made audible by the rhythm or the melody), but also in the most
vernacular usage. Look at the French article about "césure" in Wikipédia;
you'll find some external references about these funny césures used
purposely in songs. The best-known cases in France are those of
Serge Gainsbourg, who was known for his excellent mastery of
correct French (even though his language was perceived as crude and
shocking in the 1960s). I'm sure that such authors also exist in other
cultures, and that playing with overly strict, commonly accepted
language rules is wanted in every culture that does not want to
restrict the language to formal uses only.
So even if a language is preferably not rendered with these generally
undesired césures, or preferably does not leave a short syllable of
just one or two letters alone for typographic reasons, such breaks are
NOT incorrect in the language itself, where preferences of style
are left to the author's choice.
Now let's get back to Unicode and text encodings: how can an author specify
simultaneously in the text where ligatures can or cannot occur, and where
césures can or cannot occur? And if one occurs, how must the césure be
presented? Standard hyphenation with a hyphen mark at the end of the first
line, as implied by SHY, is not the only option, and even Latin-written
languages have other requirements about how a césure should be presented.
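
To make the question concrete, here is a minimal sketch in Python of what an
author can mark in plain text today with the existing format characters. The
code points and their names come from the standard; the helper functions
annotate() and strip_controls() and the sample word are only illustrative
assumptions. The sketch places the two kinds of marks at different positions
and does not settle how they should interact at the same position, nor how a
césure should be displayed.

```python
import unicodedata

# Code points defined by Unicode; the helper names below are hypothetical.
SHY = "\u00AD"   # SOFT HYPHEN: a césure is allowed here
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: do not join/ligate the neighbours
ZWJ = "\u200D"   # ZERO WIDTH JOINER: request a joined/ligated form
WJ = "\u2060"    # WORD JOINER: no line break at this position

CONTROLS = {SHY, ZWNJ, ZWJ, WJ}


def annotate(word, marks):
    """Insert control characters into `word`.

    `marks` maps an index i (the position just before word[i]) to the
    control string to insert there.
    """
    out = []
    for i, ch in enumerate(word):
        out.append(marks.get(i, ""))
        out.append(ch)
    return "".join(out)


def strip_controls(text):
    """Recover the plain letters, e.g. for searching or comparison."""
    return "".join(ch for ch in text if ch not in CONTROLS)


if __name__ == "__main__":
    # Allow a césure "af-fiche" and ask that "f" + "i" not be ligated.
    marked = annotate("affiche", {2: SHY, 3: ZWNJ})
    for ch in marked:
        if ch in CONTROLS:
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(strip_controls(marked) == "affiche")
```

Whether a renderer honours these marks, and what it draws at the break, is
left to the rendering engine and font; that presentation side is not
something the characters above can express.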