L2/06-157

Comments on Public Review Issues
(January 30, 2006 - May 12, 2006)

The sections below contain comments received on the open Public Review Issues as of May 12, 2006, since the previous cumulative document was issued prior to UTC #106 (February 2006).

Contents:

75 Proposed Update UTR #25, Unicode Support for Mathematics
81 Proposed Update to UAX #34: Unicode Named Character Sequences
83 Changing Glyph for U+047C/U+047D Cyrillic Omega with Titlo
84 Proposed Update to UAX #29: Text Boundaries
85 Proposed Update to UAX #31: Identifier and Pattern Syntax
86 Proposed Update to UAX #15: Unicode Normalization Forms
87 Proposed Update to UAX #24: Script Names
88 Proposed Update to UAX #14: Line Breaking Properties
89 Proposed Update to UTR #23: Unicode Character Property Model
90 Unicode 5.0 Beta 2
91 Proposed Update to UAX #9: The Bidirectional Algorithm
92 Proposed Draft UTS #40: BOCU-1 MIME-Compatible Unicode Compression
93 Representation of Malayalam /au/ Vowel in Traditional and Reformed Orthography
94 Proposed Update to UTS #10: Unicode Collation Algorithm
Other Items: DUTR #30 Character Foldings


75 Proposed Update UTR #25, Unicode Support for Mathematics

No feedback was received via the reporting form this period.

81 Proposed Update to UAX #34: Unicode Named Character Sequences

Date/Time: Wed Apr 12 11:59:14 CST 2006
Contact: Antoine10646@Leca-Marti.org
Name: Antoine Leca
Report Type: Public Review Issue
Opt Subject: missing NamedSequence

I am reading NamedSequencesProv-5.0.0d2.txt, dated 2006-02-16.

I am expecting to find some more entries there:

DEVANAGARI CONJUNCT KSSA;0915 094D 0937
DEVANAGARI CONJUNCT JNYA;091C 094D 091E
DEVANAGARI CONJUNCT SHRA;0936 094D 0930

; need to "invent" some name here
BENGALI LETTER ??? A;0985 09CD 09AF 09BE
BENGALI LETTER ??? E;098F 09CD 09AF 09BE
ORIYA LETTER ??? A;0B05 0B4D 0B2F 0B3E
ORIYA LETTER ??? E;0B0F 0B4D 0B2F 0B3E

MALAYALAM VOWEL SIGN SAMVRTHOKARAM;0D41 0D4D
;  (also SHORT U, also SAMVRUTHOKARAM)

MYANMAR LETTER AI;1021 1032
MYANMAR LETTER UI;1021 102F 102D
MYANMAR VOWEL SIGN O;1031 102C
MYANMAR VOWEL SIGN AU;1031 102C 1039 200C
MYANMAR VOWEL SIGN UI;1031 102F 102D
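
The entries above follow a NAME;code points layout. As a minimal sketch (the function name parse_named_sequence is hypothetical, and the field layout is assumed from the entries listed in this comment rather than taken from the data file's own specification), one such line could be read like this:

```python
def parse_named_sequence(line):
    """Parse one 'NAME;XXXX XXXX ...' entry into (name, string).

    Assumption: blank lines and lines starting with '#' or ';' are
    treated as comments and yield None.
    """
    line = line.strip()
    if not line or line[0] in "#;":
        return None
    name, _, codepoints = line.partition(";")
    if not codepoints.strip():
        # No ';' separator, or nothing after it.
        raise ValueError("expected NAME;code points, got: %r" % line)
    # Each code point is written as a hexadecimal number.
    sequence = "".join(chr(int(cp, 16)) for cp in codepoints.split())
    return name.strip(), sequence


entry = parse_named_sequence("DEVANAGARI CONJUNCT KSSA;0915 094D 0937")
print(entry[0], [hex(ord(ch)) for ch in entry[1]])
```

A line that uses any other separator than ";" is rejected by this sketch rather than guessed at.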

Several other combinations in Myanmar are dependent on the decisions about N3043.

It would be worthwhile to extract the substance from PRI#37, and to add some significant sequences based on the decisions taken after this review, too. Like

DEVANAGARI SECONDARY RA SIGN;200D 094D 0930 etc.

Antoine

83 Changing Glyph for U+047C/U+047D Cyrillic Omega with Titlo

No feedback was received via the reporting form this period.

84 Proposed Update to UAX #29: Text Boundaries

No feedback was received via the reporting form this period.

85 Proposed Update to UAX #31: Identifier and Pattern Syntax

No feedback was received via the reporting form this period.

86 Proposed Update to UAX #15: Unicode Normalization Forms

Date/Time: Tue May 2 09:45:53 CST 2006
Contact: fsasaki@w3.org
Name: Felix Sasaki
Report Type: Public Review Issue
Opt Subject: Comment on Issue "86 Proposed Update to UAX #15: Unicode Normalization Forms"

These comments on Issue "86 Proposed Update to UAX #15: Unicode Normalization Forms" are sent on behalf of the W3C i18n core working group. The comments were made by Martin Duerst, who is a member of the working group, and approved by the working group. Regards, Felix Sasaki.

Below the comments from Martin:

I think this addition is in general a good one. It came up from a discussion started by John Klensin, who is trying to define "Network UTF-8" for the IETF. This will most probably include NFC and CRLF as line endings. But as explained below, NFC in general can create denial of service attacks (buffer overflow/running out of memory) in very special and practically irrelevant cases.

In terms of details, I have the following comments:

- The intro says "where people would like to make use". This should be reworded to make clear that this is a technical requirement in some (actually fairly common) situations.

- D5, Bounded Reordering Format, should clearly say whether this includes normalization or not, i.e. whether this has to be in NFC (or NFKC, see below) or whether the only condition is that there are no more than 30 non-starters in sequence. Later text indicates that this does not include any actual normalization. This is helpful in terms of definitions (because otherwise we would need several such definitions, one each for each NF), but is probably not what users of the document want: People won't implement the procedure described in D6 without also doing normalization at the same time.

- The solution presented here is to insert CGJ. Given that non-bounded forms are not really supposed to exist (the main reason I can think of for them to exist is a DOS attack, the second reason is some test data for testing input and output of the stuff we are defining here), I think that inserting CGJ is overkill. Just dropping superfluous non-starters would be way enough. The main concern I have with inserting CGJs is not even so much the complexity of the insertion algorithm (which is not much more complex than removing stuff), but the fact that some people might want to try and detect sequences where there is a CGJ after every 30 or so non-starters, and remove the CGJs again. Defining that we drop superfluous non-starters sends a clear signal that combining marks are not an infinite extensibility point. Also, years ago, we introduced some complexity into NFC that we (in particular Mark) later regretted; I'd prefer to try this time to keep things as simple as possible.

- I'm not totally happy with the fact that different canonically equivalent strings will end up differently when first passed through the Bounded Reordering Process and then normalized. The reason why this happens is that the count to 30 happens on the decomposition, but the CGJs get inserted between (potentially) unnormalized characters. I think that implementation is way easier (and way easier to describe) this way, at least for processing separate from actual normalization (which I don't think will occur much in practice). But this is an additional argument for why I think simpler would be better.
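
The two alternatives discussed in the comments above (inserting CGJ versus dropping superfluous non-starters) can be sketched side by side. This is a simplified illustration, not the procedure of the proposed update: the function name bound_nonstarters is hypothetical, the limit of 30 is taken from the comment, and the count is made on the characters as given (assumed to be already decomposed), not on their decompositions.

```python
import unicodedata

CGJ = "\u034f"   # COMBINING GRAPHEME JOINER, combining class 0
MAX_RUN = 30     # limit mentioned in the comment


def bound_nonstarters(text, max_run=MAX_RUN, drop=False):
    """Limit runs of non-starters (combining class != 0) to max_run.

    drop=False: insert CGJ before the non-starter that would exceed
                the limit, then start a new run.
    drop=True:  discard every non-starter beyond the limit.
    """
    out = []
    run = 0
    for ch in text:
        if unicodedata.combining(ch) == 0:
            run = 0              # a starter ends the current run
            out.append(ch)
            continue
        if run == max_run:
            if drop:
                continue         # superfluous non-starter is dropped
            out.append(CGJ)      # break the run with a starter
            run = 0
        out.append(ch)
        run += 1
    return "".join(out)


sample = "a" + "\u0301" * 35
print(len(bound_nonstarters(sample)), len(bound_nonstarters(sample, drop=True)))
```

The drop variant never lengthens the text and leaves nothing behind that a later process could try to detect and strip, which is the point made in the comment about keeping things simple.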

Regards, Martin.

Date/Time: Wed May 10 08:34:51 CDT 2006
Contact:
Name: SADAHIRO Tomoyuki
Report Type: Public Review Issue
Opt Subject: PRI#86: Forbidding Characters

Is that alternative approach regarded as conformant?

Conformant implementation must pass the conformance test (UAX15-C3).

The conformance test includes the corrected decomposition. Thus there is no room to allow the alternative approach being conformant.

I do not consider the possibility of more than one result from one input to be a good idea. (The forbidding must be one result too.)

If it is not intended that the alternative approach would be conformant, such a matter should not be mentioned in the section on Versioning and Stability.

87 Proposed Update to UAX #24: Script Names

No feedback was received via the reporting form this period.

88 Proposed Update to UAX #14: Line Breaking Properties

Date/Time: Mon Feb 6 13:16:04 CST 2006
Contact: kent.karlsson14@comhem.se
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: UAX 14, Line breaking properties

I do not like the change to use SA for some characters previously categorised as CM. They should stay CM. Even though line break may be allowed between (certain) SA characters, it is never allowed between SA and CM.

In principle, AL and SA are the same. Note that AL+CM sequences may need to be hyphenated, in the same way (and using exactly the same mechanism, but other data) as SA+CM sequences may need to be line broken/hyphenated.

Exactly what is tailorable? The line break behaviour of the line break classes, or the line break property assignments for characters (which is much more fine-grained), as well as their behaviour? In the latter case, can one add new line break classes? And can one tailor a character to have one of the non-tailorable line break classes?

89 Proposed Update to UTR #23: Unicode Character Property Model

No feedback was received via the reporting form this period.

90 Unicode 5.0 Beta 2

Date/Time: Tue Feb 7 04:03:36 CST 2006
Contact: andrewcwest@gmail.com
Name: Andrew West
Report Type: Other Question, Problem, or Feedback
Opt Subject: Review of 5.0 Code Charts

LAO

0E97 LAO LETTER THO TAM = tho nok

alias should be :

= tho thung

PHAGS-PA

A865 PHAGS-PA LETTER GGA * Mongolian, Uighur * created by reversal of A862

This letter is given in 13th century tables of the original 41 Phags-pa letters, but it is not attested in any text for any language. It was probably used for some other language for which no Phags-pa texts have survived (my guess is that it was intended to represent the letter 'ayn in Persian, but no Persian Phags-pa texts are known).

My original annotation strategy was to simply leave the language unmarked, but as the code chart states that "Phags-pa letters are used for Mongolian, Chinese, Uighur, Tibetan, and Sanskrit unless annotated with a more restricted list of languages", leaving the language unmarked would imply that A865 was used for all five of these languages, when in fact it is used for none of them. I suggest replacing the erroneous "Mongolian, Uighur" note with something like:

* not used in Mongolian, Chinese, Uighur, Tibetan or Sanskrit

or

* language usage unknown

A870 PHAGS-PA LETTER ASPIRATED FA * Chinese x (phags-pa letter ha - A85C)

cross-reference should be :

x (phags-pa letter fa - A864)

Date/Time: Wed Mar 15 20:06:44 CST 2006
Contact: typhlosion@gmail.com
Name: Benjamin Scarborough
Report Type: Other Question, Problem, or Feedback
Opt Subject: Proposed Additional Name Aliases (BETA FEEDBACK)

After reviewing NamesList-5.0.0d4.txt, as well as other sources on name errata, I propose the following additions to NameAliases.txt:

0195;LATIN SMALL LETTER HWAIR
01A2;LATIN CAPITAL LETTER GHA
01A3;LATIN SMALL LETTER GHA
0C4D;TELUGU SIGN HALANT
0CCD;KANNADA SIGN HALANT
0CDE;KANNADA LETTER LLLA
0D4D;MALAYALAM SIGN CHANDRAKKALA
178E;KHMER LETTER NNA
179E;KHMER LETTER SSA
17C8;KHMER SIGN YUKALEAKPINTU
17C9;KHMER SIGN MUUSEKATOAN
17CA;KHMER SIGN TREISAP
17CB;KHMER SIGN BANTAK
17D6;KHMER SIGN CAMNOC PII KUUH
17D9;KHMER SIGN PHNEK MOAN
17DA;KHMER SIGN KOOMOOT
17DC;KHMER SIGN AVAKRAHA SANNYA
3021;SUZHOU NUMERAL ONE
3022;SUZHOU NUMERAL TWO
3023;SUZHOU NUMERAL THREE
3024;SUZHOU NUMERAL FOUR
3025;SUZHOU NUMERAL FIVE
3026;SUZHOU NUMERAL SIX
3027;SUZHOU NUMERAL SEVEN
3028;SUZHOU NUMERAL EIGHT
3029;SUZHOU NUMERAL NINE
3038;SUZHOU NUMERAL TEN
3039;SUZHOU NUMERAL TWENTY
303A;SUZHOU NUMERAL THIRTY
A015;YI SYLLABLE ITERATION MARK

Date/Time: Thu Apr 13 01:26:52 CST 2006
Contact: verdy_p@wanadoo.fr
Name: Philippe Verdy
Report Type: Public Review Issue
Opt Subject: Unicode 5.0 decompositions for Balinese vowel signs with tedung

I note that the Unicode 5.0 BETA charts (d1) (as well as the current draft for the main UCD file) do not indicate any canonical decomposition for composite vowel signs, like they exist in Devanagari, especially for those that have a right-hand part. Won't that cause difficulties in implementations?

Shouldn't they be given canonical equivalents (also reflected in their Balinese names)? This would also avoid two possible confusable encodings (for IDN or other similar apps), given that they will be rendered the same and may be understood identically by people, including when composing texts on keyboards where these complex signs may be entered in a decomposed way instead of a single keystroke for the composite. This would probably simplify the design of Balinese keyboards for those long vowels, given that it would avoid requiring more separate positions only for them, and they could be entered equivalently in a decomposed way using only "short" vowels. An advanced keyboard or an editor may recompose them on the fly using simply the canonical decompositions.

This concerns the following five vowel signs:

* U+1B3B = <U+1B3A ; U+1B35>
  BALINESE VOWEL SIGN RA REPA TEDUNG (vocalic rr) =
  BALINESE VOWEL SIGN RA REPA (vocalic r) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B3D = <U+1B3C ; U+1B35>
  BALINESE VOWEL SIGN LA LENGA TEDUNG (vocalic ll) =
  BALINESE VOWEL SIGN LA LENGA (vocalic l) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B40 = <U+1B3E ; U+1B35>
  BALINESE VOWEL SIGN TALING TEDUNG (o) =
  BALINESE VOWEL SIGN TALING (e) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B41 = <U+1B3F ; U+1B35>
  BALINESE VOWEL SIGN TALING REPA TEDUNG (au) =
  BALINESE VOWEL SIGN TALING REPA (ai) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B43 = <U+1B42 ; U+1B35>
  BALINESE VOWEL SIGN TALING PEPET TEDUNG (oe) =
  BALINESE VOWEL SIGN TALING PEPET (ae) +
  BALINESE VOWEL SIGN TEDUNG (aa)
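
As a hedged illustration of the proposal above, the listed pairs can be written as a table and compared with whatever canonical decompositions the local Unicode data records. The names PROPOSED, canonical_decomposition and compare_with_runtime are hypothetical, and the result depends on the Unicode version shipped with the Python runtime, so no particular outcome is asserted here.

```python
import unicodedata

# Proposed decompositions, keyed by the composite vowel sign.
# U+1B3F is used for TALING REPA, following the character names in the list.
PROPOSED = {
    0x1B3B: (0x1B3A, 0x1B35),  # RA REPA TEDUNG
    0x1B3D: (0x1B3C, 0x1B35),  # LA LENGA TEDUNG
    0x1B40: (0x1B3E, 0x1B35),  # TALING TEDUNG
    0x1B41: (0x1B3F, 0x1B35),  # TALING REPA TEDUNG
    0x1B43: (0x1B42, 0x1B35),  # TALING PEPET TEDUNG
}


def canonical_decomposition(cp):
    """Return the canonical decomposition known to this runtime as a
    tuple of code points, or () if there is none."""
    field = unicodedata.decomposition(chr(cp))
    if not field or field.startswith("<"):
        return ()  # no mapping, or a compatibility mapping such as <compat>
    return tuple(int(part, 16) for part in field.split())


def compare_with_runtime(proposed=PROPOSED):
    """Map each composite to True if the runtime data matches the proposal."""
    return {cp: canonical_decomposition(cp) == parts
            for cp, parts in proposed.items()}


for cp, matches in sorted(compare_with_runtime().items()):
    print("U+%04X" % cp, unicodedata.unidata_version, matches)
```

An editor that recomposes on the fly, as suggested in the comment, would in effect apply the inverse of such a table.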

Philippe.

Date/Time: Thu Apr 13 01:43:33 CST 2006
Contact: verdy_p@wanadoo.fr
Name: Philippe Verdy
Report Type: Error Report
Opt Subject: Unicode 5.0 decompositions for Balinese independent vowels with tedung

Similar question for independent vowels with tedung:

* U+1B06 = <U+1B05 ; U+1B35>
  BALINESE LETTER AKARA TEDUNG (aa) =
  BALINESE LETTER AKARA (a) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B08 = <U+1B07 ; U+1B35>
  BALINESE LETTER IKARA TEDUNG (ii) =
  BALINESE LETTER IKARA (i) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B0A = <U+1B09 ; U+1B35>
  BALINESE LETTER UKARA TEDUNG (uu) =
  BALINESE LETTER UKARA (u) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B0C = <U+1B0B ; U+1B35>
  BALINESE LETTER RA REPA TEDUNG (vocalic rr) =
  BALINESE LETTER RA REPA (vocalic r) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B0E = <U+1B0D ; U+1B35>
  BALINESE LETTER LA LENGA TEDUNG (vocalic ll) =
  BALINESE LETTER LA LENGA (vocalic l) +
  BALINESE VOWEL SIGN TEDUNG (aa)
* U+1B12 = <U+1B11 ; U+1B35>
  BALINESE LETTER OKARA TEDUNG (au) =
  BALINESE LETTER OKARA (o) +
  BALINESE VOWEL SIGN TEDUNG (aa)

Note the variant glyph for the tedung on U+1B0E (vocalic ll), attached at a lower point, as shown in the beta chart. Is the chart correct, or is the normal tedung glyph also possible for what could be only a recommended and common, but optional, ligature?

Date/Time: Fri Apr 14 15:23:55 CST 2006
Contact: samuel.thibault@labri.fr
Name: Samuel Thibault
Report Type: Error Report
Opt Subject: U+2800 (braille pattern blank) should be a separator and space

Hi,

In the Unicode Standard, U+2800 is a "symbol, other" character. However, as discussed on the Unicode mailing list [1,2,3], it should rather be a "separator, space" character.

Regards, Samuel

[1]: http://www.unicode.org/mail-arch/unicode-ml/y2006-m03/0283.html

[2]: http://www.unicode.org/mail-arch/unicode-ml/y2006-m03/0284.html

[3]: http://www.unicode.org/mail-arch/unicode-ml/y2006-m03/0285.html

Date/Time: Sun Apr 23 12:50:20 CST 2006
Contact: karl-pentzlin@acssoft.de
Name: Karl Pentzlin
Report Type: Public Review Issue
Opt Subject: Typo in Unicode Charts.pdf 5.0 Beta from 2006-01-31

The file Unicode Charts.pdf 5.0 Beta from 2006-01-31 shows on page 438, for U+01B3 LATIN CAPITAL LETTER Y WITH HOOK, a glyph with a right hook (changed from Unicode 4.0, where the reference glyph had a left hook). This is followed by the text "a glyph variant with the hook on the right also occurs". I presume that here the word "right" is to be replaced by "left" (to denote the alternative possibility deviating from the actual reference glyph).

91 Proposed Update to UAX #9: The Bidirectional Algorithm

Date/Time: Sun Feb 26 08:39:57 CST 2006
Contact: matial@il.ibm.com
Name: Matitiahu Allouche
Report Type: Technical Report or Tech Note issues
Opt Subject: Comments about UAX#9 - tr9-16

1) It seems to me that the date 2005-02-24 is way too old. Maybe the year was meant to be 2006?

2) The revision number is 16, but I have on my computer a version 16 dated 2005-12-07. I suggest 17 for a change. Don't forget to change the revision number also in the "Modifications" section.

3) In section 3.4, "these rules acts on a per-line basis, and is applied": please match the number of subject and verbs.

Date/Time: Tue May 9 16:20:21 CDT 2006
Contact: matial@il.ibm.com
Name: Matitiahu Allouche
Report Type: Technical Report or Tech Note issues
Opt Subject: UAX#9 corrigendum

Remove the duplicate in "the the" within the first paragraph of section 4 "Bidirectional Conformance". Earth shattering, isn't it?

92 Proposed Draft UTS #40: BOCU-1 MIME-Compatible Unicode Compression

Date/Time: Mon Mar 20 01:44:44 CST 2006
Contact: dewell@adelphia.net
Name: Doug Ewell
Report Type: Technical Report or Tech Note issues
Opt Subject: Re: Proposed Draft UTS #40: BOCU-1

With regard to the proposal to elevate the BOCU-1 specification to a Unicode Technical Standard, I would like UTC to clarify that it is *not* necessary for an implementer or vendor to procure a license of any type from IBM in order to implement a BOCU-1 encoder and/or decoder conformantly and without violating any intellectual property constraints.

In my opinion, it would not be suitable for UTC to classify BOCU-1 as a Unicode Technical Standard, on the same level as SCSU (UTS #6) and on a higher level than UTF-EBCDIC (UTR #16), if it imposes any type of proprietary licensing restrictions. The UTC should note that no license is required to implement SCSU or UTF-EBCDIC, nor for that matter any of the standard Unicode encoding schemes, and that if such a license were required for BOCU-1, it would dramatically hinder the widespread adoption of BOCU-1 regardless of any technical superiority over other methods of compressing Unicode data.

-- Doug Ewell Fullerton, California, USA
http://users.adelphia.net/~dewell/
Author, UTN #14, "A Survey of Unicode Compression"

93 Representation of Malayalam /au/ Vowel in Traditional and Reformed Orthography

Date/Time: Tue Apr 25 00:45:09 CST 2006
Contact: jameskass@att.net
Name: James Kass
Report Type: Public Review Issue
Opt Subject: pr-93

Quoting from Option B of pr-93:

> Any such text would require a distinct font 
> to display a one-part sign  for the modern 
> orthographic appearance. Thus the same spelling 
> would be used for both orthographies, requiring 
> a change in font to display text in one style or 
> the other.

At least one of the advanced font technologies, OpenType, offers the promise of being able to represent both traditional and reformed Malayalam with a single font when a higher level protocol is used and the text runs are tagged appropriately.

Date/Time: Tue Apr 25 02:49:26 CST 2006
Contact: jonathan_kew@sil.org
Name: Jonathan Kew
Report Type: Public Review Issue
Opt Subject: Feedback on PRI #93

To me, it seems clear that Option A is the appropriate recommendation. The reform of Malayalam orthography involves a change to the conventions used in mapping between the syllables and phonemes of the spoken language, and the written signs of the script. That's what orthographic conventions are all about! If they are reformed, we must expect changes in the encoded representation of a given fragment of language.

Orthographic reform may involve changes in how the written signs are used; it may involve adding or removing signs from the repertoire used; or in extreme cases, it could result in a new script. It is not realistic to demand that the "spelling" of words (in the sense of the exact sequence of coded characters) will remain unchanged during orthographic reform.

From a different point of view, that of "letters" as perceived by the users of an orthography, it is possible that "spelling" is unchanged; this appears to be the case with the reform of Malayalam AU, as users consider Traditional AU and Reformed AU to be "the same letter". But Unicode does not directly encode "letters of the alphabet", as understood by the users of a particular writing system for a particular language; it encodes the written signs that make up scripts, providing a repertoire that may be used in many different ways according to the conventions of particular orthographies. And at the level of the written signs that Unicode encodes, there is a clear distinction between the old and new ways of writing AU in Malayalam.

(Otherwise, one might similarly argue that "traditional" and "modern" English orthographies should be able to use the same "spelling" -- sequence of coded characters -- despite their different conventions for rendering the letter "s" in non-final position. The old "long s" is clearly at some level "the same letter" as the modern "s". But it is a different written sign, and is encoded separately, and so this reform in English orthography leads to a change in the encoded representation of the words, even though users may consider that the "spelling" has not changed.)

A further consideration is the canonical equivalence of U+0D4C with <U+0D46, U+0D57>, as mentioned in the background document. It seems to me that for practical purposes, this clinches the matter. If we recommend that "modern" fonts display U+0D4C as a single-part vowel identical to U+0D57 (option B), then we must also demand that in a sequence <cons, U+0D46, U+0D57>, they hide the U+0D46 completely. This would be a bizarre situation for users.

In addition, under Option B, users of Reformed fonts would be unable to distinguish between U+0D4C and U+0D57; yet these are not (and can never be) canonically equivalent. This is a recipe for inconsistently-encoded data.

Option A, then, is the preferable approach.
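
The canonical equivalence referred to above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name canonically_equivalent is hypothetical, and it compares strings by their NFD forms.

```python
import unicodedata

VOWEL_SIGN_AU = "\u0d4c"   # MALAYALAM VOWEL SIGN AU
VOWEL_SIGN_E = "\u0d46"    # MALAYALAM VOWEL SIGN E
AU_LENGTH_MARK = "\u0d57"  # MALAYALAM AU LENGTH MARK


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+0D4C is equivalent to the two-character sequence...
print(canonically_equivalent(VOWEL_SIGN_AU, VOWEL_SIGN_E + AU_LENGTH_MARK))
# ...but not to U+0D57 on its own.
print(canonically_equivalent(VOWEL_SIGN_AU, AU_LENGTH_MARK))
```

Any process that normalizes text may therefore turn U+0D4C into <U+0D46, U+0D57> or back, while U+0D57 alone stays distinct.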

Date/Time: Wed Apr 26 04:59:50 CST 2006
Contact: naa.ganesan@gmail.com
Name: Naga Ganesan
Report Type: Public Review Issue
Opt Subject: Representation of Malayalam /au/ Vowel in Traditional and Reformed Orthography

Option B.

There's no spelling change. au vowel sign can be represented both ways in Malayalam.

Don't we already do this by changing the font between old conjuncts Malayalam (research purposes) and official reformed orthography?

N. Ganesan

Date/Time: Mon May 8 06:22:52 CDT 2006
Contact: jeroen@bohol.ph
Name: Jeroen Hellingman
Report Type: Public Review Issue
Opt Subject: Public Review Issue #93: Representation of Malayalam /au/ Vowel in Traditional and Modern Orthography

As developer of Malayalam-TeX, I feel qualified to provide some input, but am not able to resolve the issue provided.

My personal preference is to go for option B, but this is complicated by the canonical decomposition of vowel sign au (vs-au), which will result in the modern vs-au reverting to a traditional vs-au if a user applies canonical decomposition, unless all rendering engines are also made aware of this distinction and start rendering this sequence as a modern vs-au as well.

Option A may require updates to existing Malayalam texts in Unicode to effectuate the modern orthography. I advise to look at current practice in Malayalam, and follow current practice. To use traditional orthography, a different font will be required anyway.

Note that the distinction between traditional and reformed Malayalam script is considerable. There is probably no Unicode implementation of traditional Malayalam script at all. A large number of conjuncts have been regularized (including, for example, following ra and the vowel signs -u, -uu, and -ri), and a number of rules involving compounds with initial ra have been abolished. Furthermore, the current (4.1) Malayalam Unicode code range lacks a number of characters used in traditional Malayalam, i.e., the vowel signs for vocalic l, vocalic rr, and vocalic ll. Of these characters, I can provide examples of usage from Gundert's Malayalam dictionary (admittedly extremely rare usage).

To make this decision, I think it is most important to do what is best for current users and implementations, which is probably B, and deal with specialist issues for traditional script later.

Jeroen Hellingman

Date/Time: Mon May 8 07:19:35 CDT 2006
Contact: bibincv_mec@yahoo.com
Name: Bibin
Report Type: Public Review Issue
Opt Subject: Malayalam vowel sign 'au'

Malayalam vowel sign 'au' is the symbol shown in 0D57 (Unicode). Even in books printed in old orthography only this is used nowadays. I doubt whether there is any need for switching of spelling between new and old orthographies.

Date/Time: Mon May 8 07:25:26 CDT 2006
Contact: praveen_v556@yahoo.com
Name: Praveen.V
Report Type: Public Review Issue
Opt Subject: Malayalam Vowel "AU"

I will go for option "A". But my thinking is that the symbol in 0D57 can be used for old and new orthographies alike.

Date/Time: Mon May 8 07:34:05 CDT 2006
Contact: nishavngpl@yahoo.co.in
Name:
Report Type: Public Review Issue
Opt Subject: Malayalam Vowel Sign 'au'

I think the most suitable code for the Malayalam vowel is 0D57, so I am choosing option 'A'. The symbol shown in 0D57 can be used for old and new orthographies alike.

Date/Time: Mon May 8 07:34:40 CDT 2006
Contact: roopak_sudhakar@yahoo.com
Name: Roopak
Report Type: Public Review Issue
Opt Subject: Malayalam vowel sign 'au'

Malayalam vowel sign 'au' is the symbol shown in 0D57 and can be used for old and new orthographies alike. I will recommend option A. I think the glyph shown in the option A is the correct one. Under this option, both orthographies can be represented in the same font, but the spelling of /au/ would change when switching between orthographies.

Date/Time: Mon May 8 07:50:48 CDT 2006
Contact: rakesh_revathy@yahoo.com
Name: Rakesh G.S
Report Type: Public Review Issue
Opt Subject: Malayalam vowel sign "au"

I will go for option 'A'. I think the symbol 0D57 can be used for old and new orthographies alike.

Date/Time: Mon May 8 07:52:19 CDT 2006
Contact: ullasb007@yahoo.com
Name: Ullas
Report Type: Public Review Issue
Opt Subject: Malayalam Vowel Sign "AU"

The glyph shown in 0D57 is the correct one for Malayalam vowel sign 'AU'. It is true that the two-part symbol was used earlier. But the left part disappeared as a part of the evolution of the script.

Date/Time: Mon May 8 07:52:36 CDT 2006
Contact: abdulshamnar@yahoo.co.in
Name: shamnar
Report Type: Public Review Issue
Opt Subject: Malayalam vowel sign "au"

I think the most suitable one is 0D57. So i will go for it.

Date/Time: Mon May 8 07:53:52 CDT 2006
Contact: balasankar.p@gmail.com
Name: balasankar
Report Type: Error Report
Opt Subject:

The vowel /au/ in Malayalam is represented differently in the traditional and reformed orthographies.

I think option A is more suitable since in the new orthography we use this option.

Date/Time: Mon May 8 07:54:23 CDT 2006
Contact: hi_liji_ck@hotmail.com
Name: Liji
Report Type: Error Report
Opt Subject: malayalam vowel au

I recommend Option A, since under this option both orthographies can be represented in the same font.

Date/Time: Mon May 8 08:04:57 CDT 2006
Contact: shibuonline@gmail.com
Name: Shibu.T
Report Type: Error Report
Opt Subject: Regarding Malayalam vowel-au

The modern orthography would use U+0D57. So, I recommend option-A.

Date/Time: Tue May 9 05:36:43 CDT 2006
Contact: jose_stephen@cdactvm.in
Name: Jose
Report Type: Public Review Issue
Opt Subject: Representation of Malayalam /au/ Vowel in Traditional and Modern Orthography

I will choose Option B, because I think it will be easy to implement since there is no confusion of spelling, and moreover this problem of old and new orthography can be solved in the font itself.

Date/Time: Thu May 11 01:50:59 CDT 2006
Contact: drsasha2002@yahoo.com
Name: university of kerala
Report Type: Public Review Issue
Opt Subject: Representation of Malayalam /au/ Vowel in Traditional and Reformed Orthography

I recommend the second option, as it will help to use the same spelling for both orthographies.

Date/Time: Sun May 14 16:04:18 CDT 2006
Contact: kent.karlsson@streamserve.com
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: 93 Representation of Malayalam /au/ Vowel in Traditional and Reformed Orthography

The only reasonable option is option A (as I have detailed at length on the Unicode list).

94 Proposed Update to UTS #10: Unicode Collation Algorithm

Date/Time: Sat Apr 29 14:21:53 CST 2006
Contact: mark.davis@icu-project.org
Name: Mark Davis
Report Type: Error Report
Opt Subject: Clarify UTS#10

Someone reported this on the email list, that

S2.1.1 If there are any combining marks following S, process each combining mark C.

should be

S2.1.1 If there are non-starters following S, process each non-starter C.

in order to pass the conformance test. This needs to be investigated.

Date/Time: Wed May 10 16:59:38 CDT 2006
Contact: richard.wordingham@ntlworld.com
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Unicode Collation Algorithm

The fault previously in the algorithm for conversion to NFC is also present in the algorithm for forming contractions in collation.

As currently worded, let us consider a sequence A B C where B is a combining mark with positive combining class and C is a combining mark with zero combining class. Then if there is no contraction for the sequence AB, consideration is to be given to forming a contraction AC, for nothing between A and C is of combining class 0 or of the same combining class as C.

The example was raised by mike-list@pobox.com, on 26 April, but from the lack of a proposed correction I suspect no solution has been prepared. He said: quote:

I am implementing the UCA and am having trouble passing the conformance test. The problem is that I believe my code is correct and the test is wrong. For example the sequence:

09C7 1D165 09BE 0061

is supposed to come before

09C7 0001 09D7 0061

according to the test. What I am observing is that 09C7 combines with 09BE according to steps S2.1.1 thru S2.1.3. The intervening 1D165 is ignored since it is not of combining class 0. The combination 09C7 09BE becomes 09CB, which sorts after 09C7.

Note that this is the NON_IGNORABLE test. I have the same problem with the SHIFTED test. And also this is for version 4.1.0. end quote

My implementation of the collation algorithm avoids the problem by interpreting 'combining mark' as 'character of non-zero combining class', but the philosophy of the rule is the same as conversion to NFC - the sign of differences in combining class is artificial, but is used to choose a preferred method of combination when canonically equivalent sequences suggest alternative combinations or contractions.
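
The difference between the two readings of 'combining mark' can be sketched for the quoted sequence 09C7 1D165 09BE. This is an illustration under stated assumptions, not the UTS #10 algorithm: the function names are hypothetical, the blocking test follows the wording paraphrased in this comment (an intervening character of combining class 0, or of the same combining class as C, blocks C), and whether a contraction for S + C actually exists in the collation table is not modelled.

```python
import unicodedata


def ccc(ch):
    """Canonical combining class of a character."""
    return unicodedata.combining(ch)


def is_blocked(between, c):
    """Blocking as paraphrased above: some intervening character has
    combining class 0 or the same combining class as C."""
    return any(ccc(b) == 0 or ccc(b) == ccc(c) for b in between)


def is_candidate(c, reading):
    """Is C considered at all by step S2.1.1 under the given reading?"""
    if reading == "combining-mark":
        return unicodedata.category(c).startswith("M")  # Mn, Mc, Me
    if reading == "non-starter":
        return ccc(c) != 0
    raise ValueError("unknown reading: %r" % reading)


def may_contract(s, between, c, reading):
    """True if S and C may be tried as a discontiguous contraction.
    S itself does not enter the test; it is kept for readability."""
    return is_candidate(c, reading) and not is_blocked(between, c)


S, B, C = "\u09c7", "\U0001d165", "\u09be"
for reading in ("combining-mark", "non-starter"):
    print(reading, may_contract(S, B, C, reading))
```

Under the 'combining mark' reading, U+09BE (category Mc, combining class 0) is tried against U+09C7 across U+1D165; under the 'non-starter' reading it is not considered.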


Other Items: DUTR #30 Character Foldings

Date/Time: Wed May 10 07:54:23 CDT 2006
Contact: bqw10602@nifty.com
Name: SADAHIRO Tomoyuki
Report Type: Technical Report or Tech Note issues
Opt Subject: DUTR#30: HIRAGANA and KATAKANA ITERATION MARKs

In HiraganaFolding.txt

309D;	30F7  	# ゝ → ヷ
HIRAGANA ITERATION MARK → KATAKANA LETTER VA
309E;	30F8  	# ゞ → ヸ
HIRAGANA VOICED ITERATION MARK → KATAKANA LETTER VI

In KatakanaFolding.txt

30F7;	309D 	# ヷ → ゝ
KATAKANA LETTER VA → HIRAGANA ITERATION MARK
30F8;	309E 	# ヸ → ゞ
KATAKANA LETTER VI → HIRAGANA VOICED ITERATION MARK

But we have KATAKANA ITERATION MARKs.

30FD ヽ KATAKANA ITERATION MARK 
30FE ヾ KATAKANA VOICED ITERATION MARK

Therefore they should be as below.

309D ゝ = 30FD ヽ
309E ゞ = 30FE ヾ
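
The corrected pairing above agrees with the regular offset between the Hiragana and Katakana blocks. As a minimal sketch (the function name and the chosen ranges are assumptions for illustration, not the contents of the DUTR #30 data files), a Hiragana-to-Katakana folding that treats the iteration marks this way could look like:

```python
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096  # small A .. small KE
ITERATION_MARKS = (0x309D, 0x309E)              # iteration mark, voiced iteration mark
OFFSET = 0x60                                   # Katakana code point minus Hiragana code point


def fold_hiragana_to_katakana(text):
    """Map Hiragana letters and iteration marks to their Katakana
    counterparts; leave every other character unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if HIRAGANA_FIRST <= cp <= HIRAGANA_LAST or cp in ITERATION_MARKS:
            cp += OFFSET
        out.append(chr(cp))
    return "".join(out)


print(fold_hiragana_to_katakana("\u3042\u309d\u309e"))
```

With this offset, U+309D maps to U+30FD and U+309E maps to U+30FE, so U+30F7 and U+30F8 are never produced from the iteration marks.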