From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jan 03 2006 - 19:43:30 CST
> I would like the group's opinion on my proposal to add the U+05BE HEBREW
> PUNCTUATION MAQAF character to the Dash category. MAQAF is a Hebrew
> character similar to the HYPHEN, both in functionality and form. To give
> an English approximation of its function, it connects words together,
> whether to make a term out of two words (e.g. Tel־Aviv, Home־Owner) or
> to connect words which are joined when both are written in Hebrew (e.g.
> InHebrew vs. In־ENGLISH, assuming ENGLISH was written in Latin letters).
>
> I'm not sure why HEBREW PUNCTUATION MAQAF was introduced into Unicode
> in the first place, as it seems to be equivalent to the HYPHEN.
Its existence in Windows code page 1255 (0xCE = U+05BE HEBREW
PUNCTUATION MAQAF) is both a necessary and a sufficient reason for it
to have been encoded separately in Unicode.
> Perhaps it's
> due to the fact that it appears in traditional Hebrew texts,
with a shape distinct from the Latin hyphen, which is another reason for
separate encoding. Overunifying punctuation marks whose appearance
differs consistently from one script to another is a potential
problem for rendering and font choice.
> whereas
> other Hebrew punctuation (such as COMMA and PERIOD) was borrowed
> from Latin in modern times. In modern Hebrew texts, MAQAF is often
> replaced by HYPHEN-MINUS or HYPHEN, since there is no MAQAF key
> on the Hebrew-Israeli keyboard.
>
> By adding MAQAF to the Dash category, aside from putting it where it
> belongs (in my opinion), we'll make the character folding rule
>
> pD -> HYPHEN-MINUS
>
> apply to it.
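Not automatically: the pD folding rule keys on the General Category
value Pd, not on the binary Dash property. Extending it to such
characters might be a good idea in general, though.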
> This would be beneficial, as the Hebrew-Israeli keyboard
> doesn't have a key for MAQAF and therefore users cannot easily search for it.
>
> Comments?
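The folding rule quoted above keys on the General Category value Pd. A minimal Python sketch, assuming a fold that simply maps every gc=Pd character to HYPHEN-MINUS and relying on Python's bundled unicodedata tables; fold_dashes and folded_contains are hypothetical helper names, not part of any standard API:

```python
import unicodedata

HYPHEN_MINUS = "\u002D"


def fold_dashes(text: str) -> str:
    """Replace every character whose General Category is Pd with HYPHEN-MINUS.

    Only gc=Pd is consulted; the binary Dash property is not exposed by
    unicodedata, so Dash=True characters with gc=Po or gc=Sm (for example
    U+2212 MINUS SIGN) pass through unchanged.
    """
    return "".join(
        HYPHEN_MINUS if unicodedata.category(ch) == "Pd" else ch
        for ch in text
    )


def folded_contains(haystack: str, needle: str) -> bool:
    """Substring search that applies the same dash folding to both sides."""
    return fold_dashes(needle) in fold_dashes(haystack)


# U+2010 HYPHEN is gc=Pd, so a query typed with HYPHEN-MINUS matches it.
print(folded_contains("Tel\u2010Aviv", "Tel-Aviv"))  # True
# MAQAF is folded only if this Python's UCD data lists it as gc=Pd.
print(unicodedata.unidata_version, unicodedata.category("\u05BE"))
```

Whether MAQAF itself gets folded by this sketch depends entirely on the General Category recorded in the UCD version that Python ships, which is exactly the property value under discussion.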
There are two distinct potential changes you need to address.
Change #1:
General Category: gc=Po (current) --> gc=Pd
Relevant data file: UnicodeData.txt
Change #2:
Binary property Dash: False (current) --> True
Relevant data file: PropList.txt
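A rough Python sketch of where each of the two properties lives, assuming the semicolon-delimited formats of the two files. The sample lines are abbreviated illustrations rather than a copy of any particular release (MAQAF's gc=Po is the current value cited above), the parser ignores the <..., First>/<..., Last> range convention of UnicodeData.txt, and gc_from_unicodedata and dash_from_proplist are hypothetical helpers:

```python
# Illustrative lines in the UnicodeData.txt format (fields separated by ';',
# field 0 = code point, field 1 = name, field 2 = General Category).
UNICODEDATA_SAMPLE = [
    "002D;HYPHEN-MINUS;Pd;0;ES;;;;;N;;;;;",
    "05BE;HEBREW PUNCTUATION MAQAF;Po;0;R;;;;;N;;;;;",
    "2010;HYPHEN;Pd;0;ON;;;;;N;;;;;",
]

# Illustrative lines in the PropList.txt format:
# code point or range ; property name # comment
PROPLIST_SAMPLE = [
    "0020          ; White_Space # Zs       SPACE",
    "002D          ; Dash # Pd       HYPHEN-MINUS",
    "2010..2015    ; Dash # Pd   [6] HYPHEN..HORIZONTAL BAR",
    "2212          ; Dash # Sm       MINUS SIGN",
]


def gc_from_unicodedata(lines, cp):
    """Return field 2 (General Category) for code point cp, or None.

    Simplification: ignores the <..., First>/<..., Last> range entries.
    """
    for line in lines:
        if not line.strip():
            continue
        fields = line.split(";")
        if int(fields[0], 16) == cp:
            return fields[2]
    return None


def dash_from_proplist(lines, cp):
    """Return True if cp falls inside any 'Dash' entry of PropList.txt lines."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, prop = (part.strip() for part in data.split(";"))
        if prop != "Dash":
            continue
        first, _, last = cps.partition("..")  # single code point or range
        if int(first, 16) <= cp <= int(last or first, 16):
            return True
    return False


# MAQAF as described above: gc=Po, Dash=False.
print(gc_from_unicodedata(UNICODEDATA_SAMPLE, 0x05BE))  # Po
print(dash_from_proplist(PROPLIST_SAMPLE, 0x05BE))      # False
```

Change #1 would alter the third field of the MAQAF line in UnicodeData.txt; change #2 would add a "05BE ; Dash" line to PropList.txt.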
I believe that all gc=Pd characters also have Dash=True, but the
converse is not the case: there are Dash characters with gc=Po or
gc=Sm (U+2212 MINUS SIGN, for example, is gc=Sm). So it would be
possible to change the Dash property for MAQAF without changing its
General Category -- and, in fact, I suspect the UTC would be a little
easier to persuade to make that change.
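The relationship between the two properties (every gc=Pd character is Dash=True, but not the reverse) can be checked mechanically. A hedged sketch over a small hand-picked subset, assuming the values stated in this message; GC, DASH, and the helper functions are illustrative names only:

```python
# Hand-picked subset, using the values stated in this message (MAQAF:
# gc=Po, Dash=False). Not a complete extract of any UCD release.
GC = {
    0x002D: "Pd",  # HYPHEN-MINUS
    0x2010: "Pd",  # HYPHEN
    0x2013: "Pd",  # EN DASH
    0x2212: "Sm",  # MINUS SIGN
    0x05BE: "Po",  # HEBREW PUNCTUATION MAQAF
}
DASH = {0x002D, 0x2010, 0x2013, 0x2212}
MAQAF = 0x05BE


def pd_implies_dash(gc, dash):
    """True if every gc=Pd code point in gc is also in the Dash set."""
    return all(cp in dash for cp, cat in gc.items() if cat == "Pd")


def dash_but_not_pd(gc, dash):
    """Dash=True code points whose General Category is something other than Pd."""
    return sorted(cp for cp in dash if gc.get(cp) != "Pd")


print(pd_implies_dash(GC, DASH))                      # True
print([hex(cp) for cp in dash_but_not_pd(GC, DASH)])  # ['0x2212']

# Change #2 only: Dash=True, gc stays Po, so the invariant still holds.
print(pd_implies_dash(GC, DASH | {MAQAF}))            # True
# Change #1 only: gc=Pd, Dash stays False, so the invariant breaks.
print(pd_implies_dash({**GC, MAQAF: "Pd"}, DASH))     # False
```

Under these assumptions, setting Dash=True for MAQAF on its own keeps the subset relationship intact, while changing its General Category to Pd without also setting Dash=True would break it.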
> How do I go about submitting such a proposal?
At this point, the most straightforward way to submit such a
proposal is to use the online contact form:
http://www.unicode.org/reporting.html
selecting the category "Public Review Issue" and noting that this is
feedback for the Unicode 5.0.0 beta review:
http://www.unicode.org/versions/beta.html
State the issue succinctly and your proposal clearly, and make
reference to the exact properties I have cited above. That will
make it a lot easier for the UTC to consider and decide upon
the issue.
--Ken