Hello Karl, others,
On 2017/05/27 06:15, Karl Williamson via Unicode wrote:
> On 05/26/2017 12:22 PM, Ken Whistler wrote:
>>
>> On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote:
>>> The link provided about the PRI doesn't lead to the comments.
>>>
>>
>> PRI #121 (August, 2008) pre-dated the practice of keeping all the 
>> feedback comments together with the PRI itself in a numbered directory 
>> with the name "feedback.html". But the comments were collected 
>> together at the time and are accessible here:
>>
>> http://www.unicode.org/L2/L2008/08282-pubrev.html#pri121
>>
>> Also there was a separately submitted comment document:
>>
>> http://www.unicode.org/L2/L2008/08280-pri121-cmt.txt
>>
>> And the minutes of the pertinent UTC meeting (UTC #116):
>>
>> http://www.unicode.org/L2/L2008/08253.htm
>>
>> The minutes simply capture the consensus to adopt Option #2 from PRI 
>> #121, and the relevant action items.
>>
>> I now return the floor to the distinguished disputants to continue 
>> litigating history. ;-)
>>
>> --Ken
>>
>>
> 
> The reason this discussion got started was that in December, someone 
> came to me and said the code I support does not follow Unicode best 
> practices, and suggested I need to change, though no ticket (yet) has 
> been filed.  I was surprised, and posted a query to this list about what 
> the advantages of the new approach are.
Can you provide a reference to that discussion? I might have missed it 
in December.
> There were a number of replies, 
> but I did not see anything that seemed definitive.  After a month, I 
> created a ticket in Unicode and Markus was assigned to research it, and 
> came up with the proposal currently being debated.
Which is to completely reverse the current recommendation in Unicode 
9.0. While I agree that this might help you fend off a bug report, it 
would create chances for bug reports against Ruby, Python3, many if not 
all Web browsers,...
> Looking at the PRI, it seems to me that treating an overlong as a single 
> maximal unit is in the spirit of the wording, if not the fine print.
In standards, the "fine print" matters.
> That seems to be borne out by Markus, even with his stake in ICU, 
> supporting option #2.
Well, at http://www.unicode.org/L2/L2008/08282-pubrev.html#pri121, I 
also supported option 2, with code behind it.
> Looking at the comments, I don't see any discussion of the effect of 
> this on overlong treatments.  My guess is that the effect change was 
> unintentional.
I agree that it was probably not considered explicitly. But overlongs 
were disallowed for security reasons, and once the definition of UTF-8 
was tightened, "overlongs" essentially did not exist anymore. 
Essentially, "overlong" is a word like "dragon" or "ghost": Everybody 
knows what it means, but everybody knows they don't exist.
[Just to be sure, by the above, I don't mean that a sequence such as
C0 B0 cannot appear somewhere in some input. But C0 is not UTF-8 all by 
itself, and there is no need to see C0 B0 as a (ghost) sequence.]
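As a small illustration of the bracketed remark, one can ask Python 3's 
built-in decoder (Python3 being one of the implementations named in this 
message) what it does with C0 B0. This only shows the observable behaviour 
of that one decoder and says nothing about other implementations:

```python
# C0 can never start a well-formed UTF-8 sequence, and B0 is a continuation
# byte with nothing to continue, so each byte is replaced on its own.
data = bytes([0xC0, 0xB0])
decoded = data.decode("utf-8", errors="replace")
print([f"U+{ord(ch):04X}" for ch in decoded])
```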
> So I have code that handled overlongs in the only correct way possible 
> when they were acceptable,
No. As long as they were acceptable, they wouldn't have been replaced by 
an FFFD.
> and in the obvious way after they became illegal,
Why? A change was necessary from producing an actual character to 
producing some number of FFFDs. It may have been easier to produce just 
a single FFFD, but that depends on how the code was organized.
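To make the difference in the number of FFFDs concrete, here is a rough 
Python sketch. The function count_single_unit is hypothetical, written only 
for this illustration; it encodes one possible reading of "treat an overlong 
as a single unit" (take the length implied by the lead byte, swallow up to 
that many continuation bytes, emit one FFFD if the result is not 
well-formed) and is not the code under discussion. It is compared with what 
Python 3's built-in decoder produces:

```python
def is_well_formed(chunk: bytes) -> bool:
    """True if chunk decodes as strict UTF-8 (overlongs are rejected)."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def count_single_unit(data: bytes) -> int:
    """Count FFFDs under the assumed 'one FFFD per lead-byte-sized unit' policy."""
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:            # ASCII, always fine
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:   # length implied by the bit pattern alone
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF7:
            length = 4
        else:                      # stray continuation byte, or F8..FF
            count += 1
            i += 1
            continue
        j = i + 1
        while j < i + length and j < len(data) and 0x80 <= data[j] <= 0xBF:
            j += 1
        if not is_well_formed(data[i:j]):
            count += 1             # the whole unit becomes a single FFFD
        i = j
    return count


def count_builtin(data: bytes) -> int:
    """Count FFFDs from Python's decoder (input must not contain a real U+FFFD)."""
    return data.decode("utf-8", errors="replace").count("\ufffd")


for sample in (b"\xc0\xb0", b"\xe0\x80\x80"):
    print(sample.hex(" "), count_single_unit(sample), count_builtin(sample))
```

Both approaches reject the input; they differ only in how many FFFDs mark 
the damage.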
> and now without apparent discussion (which is very much akin to 
> "flimsy reasons"), it suddenly was no longer "best practice".
Not 'now', but almost 9 years ago. And not "without apparent 
discussion", but with an explicit PRI.
> And that 
> change came "rather late in the game".  That this escaped notice for 
> years indicates that the specifics of REPLACEMENT CHAR handling don't 
> matter all that much.
I agree. You haven't even received a ticket yet.
> To cut to the chase, I think Unicode should issue a Corrigendum to the 
> effect that it was never the intent of this change to say that treating 
> overlongs as a single unit isn't best practice.  I'm not sure this 
> warrants a full-fledged Corrigendum, though.  But I believe the text of 
> the best practices should indicate that treating overlongs as a single 
> unit is just as acceptable as Martin's interpretation.
I'd essentially be fine with that, under the condition that the current 
recommendation is maintained as a clearly identified recommendation, so 
that Python3, Ruby, Web standards and browsers, and so on can easily 
refer to it.
Regards,   Martin.
> I believe this is pretty much in line with Shawn's position.  Certainly, 
> a discussion of the reasons one might choose one interpretation over 
> another should be included in TUS.  That would likely have satisfied my 
> original query, which hence would never have been posted.
Received on Tue May 30 2017 - 06:27:07 CDT