L2/12-283

Source: Mark Davis
Subject: Handling fake Gershayim and Geresh in Hebrew words (UAX #29)
Date: 2012/07/29

Proposed Change

Create a PRI for the following proposed change to UAX #29 in 6.2.1.

Accommodate the use of " and ' in default Hebrew word break. The changes would consist of the following:

1. Create new Word_Break property values Hebrew_Letter (HLetter), Single_Quote (SQuote), and Double_Quote (DQuote).

2. Add rules:

HLetter × SQuote
HLetter × DQuote HLetter
HLetter (SQuote | DQuote) × HLetter

3. Change every other rule as follows:

ALetter to be (ALetter | HLetter)
Mid_Num_Let to be (Mid_Num_Let | SQuote)
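
The rules above can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the UAX #29 algorithm: HLetter is approximated by the range U+05D0..U+05EA, ALetter by any other alphabetic character, and only the letter and quote rules are modeled (no numerals, Extend/Format handling, or other Mid_Num_Let members). The names classify, no_break, and segments are hypothetical.

```python
HLETTER, ALETTER, SQUOTE, DQUOTE, OTHER = "HLetter", "ALetter", "SQuote", "DQuote", "Other"


def classify(ch):
    """Assign a simplified class (assumption: not the real property data)."""
    if "\u05d0" <= ch <= "\u05ea":   # Hebrew alef..tav only
        return HLETTER
    if ch == "'":
        return SQUOTE
    if ch == '"':
        return DQUOTE
    if ch.isalpha():
        return ALETTER
    return OTHER


def is_letter(cls):
    # Step 3 of the proposal: ALetter becomes (ALetter | HLetter).
    return cls in (ALETTER, HLETTER)


def no_break(cls, i):
    """Return True if no break is allowed between positions i-1 and i."""
    prev, cur = cls[i - 1], cls[i]
    nxt = cls[i + 1] if i + 1 < len(cls) else None
    prev2 = cls[i - 2] if i >= 2 else None

    # Existing letter rules, with SQuote treated as part of Mid_Num_Let.
    if is_letter(prev) and is_letter(cur):
        return True
    if is_letter(prev) and cur == SQUOTE and nxt is not None and is_letter(nxt):
        return True
    if prev2 is not None and is_letter(prev2) and prev == SQUOTE and is_letter(cur):
        return True

    # New rules from step 2.
    if prev == HLETTER and cur == SQUOTE:                      # HLetter x SQuote
        return True
    if prev == HLETTER and cur == DQUOTE and nxt == HLETTER:   # HLetter x DQuote HLetter
        return True
    if prev2 == HLETTER and prev in (SQUOTE, DQUOTE) and cur == HLETTER:
        return True                                            # HLetter (SQuote | DQuote) x HLetter
    return False


def segments(text):
    """Split text at every position where no rule forbids a break."""
    if not text:
        return []
    cls = [classify(c) for c in text]
    out, start = [], 0
    for i in range(1, len(text)):
        if not no_break(cls, i):
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out
```

Under these assumptions, segments("\u05d0\"\u05d1") and segments("\u05d0'") each return a single segment, while a Latin letter followed by a trailing ' or an embedded " is still split as before.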

Background

When writing Hebrew, it is common practice to use ASCII " and ' instead of the correct characters, U+05F4 HEBREW PUNCTUATION GERSHAYIM and U+05F3 HEBREW PUNCTUATION GERESH. While the ASCII characters behave correctly in the default Unicode line break, they don't behave correctly in the default Unicode word break. The problem arises when Hebrew text is embedded in text of another language, so that the other language's word break is being used.

There are pros and cons to this change. It is a very language-specific change, and we certainly don't want to push all language-specific changes down to root. On the other hand, apart from some minor additional complexity, it shouldn't hurt any other locale, because the script makes the Hebrew cases unambiguous. So we'd like a PRI item to gather feedback on whether the change is warranted.

The problem arises in these two cases:

א"ב
א'

The following case already works correctly and needs no change:

א'ב

The Geresh-equivalent (') can occur medially and finally, while the Gershayim-equivalent (") can occur only medially.
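
The positional constraint just stated can be expressed as a small check. The sketch below is illustrative only: it assumes Hebrew letters are the range U+05D0..U+05EA, and the function names are hypothetical rather than part of any specification.

```python
def is_hebrew_letter(ch):
    # Assumption: only alef..tav count as Hebrew letters here.
    return "\u05d0" <= ch <= "\u05ea"


def quote_position(text, i):
    """Return 'medial', 'final', or None for the character at index i."""
    before = i > 0 and is_hebrew_letter(text[i - 1])
    after = i + 1 < len(text) and is_hebrew_letter(text[i + 1])
    if before and after:
        return "medial"
    if before:
        return "final"
    return None


def joins_hebrew_word(text, i):
    """True if the quote at index i should stay attached to the Hebrew word."""
    pos = quote_position(text, i)
    if text[i] == "'":        # Geresh-equivalent: medial or final
        return pos in ("medial", "final")
    if text[i] == '"':        # Gershayim-equivalent: medial only
        return pos == "medial"
    return False
```

For example, joins_hebrew_word("\u05d0'", 1) is True, while joins_hebrew_word("\u05d0\"", 1) is False because a trailing " is not treated as part of the word.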