From: Mark Davis ☕ (mark@macchiato.com)
Date: Mon Jul 26 2010 - 14:41:36 CDT
Mark
*— Il meglio è l’inimico del bene —*
On Mon, Jul 26, 2010 at 09:40, Shriramana Sharma <samjnaa@gmail.com> wrote:
> Hello list.
>
> I have a question about VS characters and the default ignorable property.
>
> TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable. Ch
> 5.21 states that default ignorable characters are to be ignored in rendering
> (except in specialized modes which show hidden characters).
>
That is incorrect. What it actually says is (my bold):
"Default ignorable code points are those that should be ignored by default
in rendering *unless explicitly supported.* "
Or to put it in other terms:
If your rendering system doesn't explicitly support character X, it should
be ignored by default (as if it hadn't been in the string to be rendered).
So if you *do* support a given variation sequence, then this clause doesn't
apply; as a matter of fact, supporting it means that it is not ignored, and
that it has a visible impact on the rendering.
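
To make the rule concrete, here is a minimal Python sketch of that behavior. It is only an illustration under stated assumptions: the function names and the `supported_sequences` parameter are hypothetical, and only VS1..VS256 are treated as variation selectors (the Mongolian free variation selectors are left out for brevity). A real renderer would consult its fonts rather than a hard-coded set.

```python
def is_variation_selector(ch):
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def prepare_for_rendering(text, supported_sequences):
    """Split text into rendering units (hypothetical helper).

    A variation selector is kept, attached to its base character, only when
    the resulting base+VS pair is in supported_sequences. Otherwise it is
    dropped, as if it had never been in the string.
    """
    units = []
    for ch in text:
        if is_variation_selector(ch):
            if units and (units[-1] + ch) in supported_sequences:
                units[-1] += ch  # explicitly supported: it affects rendering
            # not supported: ignored by default
        else:
            units.append(ch)
    return units


# Assumption for the example: the renderer supports only <U+2268, U+FE00>.
supported = {"\u2268\uFE00"}
units = prepare_for_rendering("\u2268\uFE00a\uFE00", supported)
```

Here the first VS1 stays attached to U+2268 because that sequence is supported, while the VS1 after "a" is dropped.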
>
> The paragraph in p 171 on default ignorable characters under ch 5.3 states
> that "these characters are also ignored except with respect to specific,
> defined processes; for example, zero width non-joiner is ignored by default
> in collation."
>
> This seems to suggest to me that despite ch 5.21 speaking only about
> rendering, the default ignorable property also has or at least can have a
> part in other processes such as collation. I would however like to have a
> confirmation on this:
>
> Are all default ignorable characters ignored not only in rendering
(Incorrect assumption about rendering; see above.)
> but in other processes also?
>
Yes, in that in processing they should be ignored unless they are relevant
to the kind of processing involved. Note that other characters may also be
ignored, depending on the processing. So there is not a hard-and-fast rule.
- For example, in collation any of the characters in
http://unicode.org/Public/UCA/6.0.0/allkeys-6.0.0d1.txt with weights
starting "[.0000.0000.0000." are ignorable by default, and include
characters that are not default-ignorable.
- For word segmentation, Extend and Format characters are ignored (except
for edge cases): see
http://unicode.org/reports/tr29/#Default_Word_Boundaries. Those include
many more characters than just the default-ignorables, and exclude 5
characters (Hangul fillers and ZWSP). See also
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Word_Break:Format:][:Word_Break:Extend:]&g=di
In other words, default-ignorables should usually be ignored by
non-rendering processes, but there will be exceptions. And other characters
may also be ignored, depending on the process.
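
As a rough illustration of a non-rendering process, the following Python sketch does a substring search that skips a few default-ignorable characters. The names are hypothetical, and the character set is a small hand-picked sample (soft hyphen, ZWNJ, ZWJ, VS1..VS256), not the full Default_Ignorable_Code_Point property, which Python's standard library does not expose. Whether a given search should ignore variation selectors at all is a decision for that process; this only shows the "ignore unless relevant" default.

```python
# Hand-picked sample of default-ignorable code points (an assumption for
# this sketch, not the complete property): SOFT HYPHEN, ZWNJ, ZWJ.
SAMPLE_IGNORABLES = {0x00AD, 0x200C, 0x200D}


def is_sample_ignorable(ch):
    """True for the sample set above plus VS1..VS256."""
    cp = ord(ch)
    return (
        cp in SAMPLE_IGNORABLES
        or 0xFE00 <= cp <= 0xFE0F
        or 0xE0100 <= cp <= 0xE01EF
    )


def strip_sample_ignorables(text):
    """Remove the sample ignorable characters from text."""
    return "".join(ch for ch in text if not is_sample_ignorable(ch))


def loose_contains(haystack, needle):
    """Substring test that ignores the sample characters on both sides."""
    return strip_sample_ignorables(needle) in strip_sample_ignorables(haystack)


found_soft_hyphen = loose_contains("co\u00ADoperate", "cooperate")
found_across_vs = loose_contains("x\u2268\uFE00y", "\u2268y")
```

A process for which these characters are relevant, for example one that must distinguish variation sequences, would simply not apply the stripping step.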
> Or is it that they are ignored by default in rendering and whether they are
> ignored in other processes or not is variable?
>
> Specifically, are VS characters ignored in rendering only (i.e. rendering
> them, not the characters they apply to of course) or are they ignored even
> in other processes such as text search and collation?
>
> --
> Shriramana Sharma
>
>