Date: September 10, 2001
Title: White-space processing
Source: Michel Suignard (Microsoft)
Action: FYI
Distribution: NCITS/L2 and UTC
1…
9. A conforming user agent must meet all of the following criteria (as defined in [XHTML1]):
ID (e.g., the id attribute on most XHTML elements) as fragment identifiers.
The XML processor normalizes different systems' line-end codes into a single LINE FEED character, which is passed up to the application.
The user agent must process white space characters in the data received from the XML processor as follows:
· If the 'xml:space' attribute is set to 'preserve', white space characters must be preserved, and consequently LINE FEED characters within a block must not be converted.
· If the 'xml:space' attribute is not set to 'preserve', then:
White space in attribute values is processed according to [XML].
When determining how to convert a LINE FEED character, a user agent must apply the following rules, in which the script of the characters on either side of the LINE FEED determines the choice of replacement. Script names are assigned to all characters in accordance with the Unicode [UNICODE] technical report TR#24 (Script Names).
1. COMMON script characters (such as punctuation) are treated as if they belong to the same script as the character on the other side of the LINE FEED.
2. INHERITED script characters (such as combining characters) are treated as if they belong to the same script as the previous character if they precede the LINE FEED character, or the same script as the character on the preceding side if they follow the LINE FEED character.
3. If the characters preceding and following the LINE FEED character belong to HAN, HIRAGANA or KATAKANA script, the LINE FEED must be converted into no character.
4. If the characters preceding and following the LINE FEED character belong to KHMER, LAO, MYANMAR or THAI script, the LINE FEED must be converted into a ZERO WIDTH SPACE character (U+200B) or no character. (This rule may be extended in the future to additional scripts, not yet encoded, that do not separate their words by space characters.)
5. If neither condition (3) nor condition (4) is true, the LINE FEED character must be converted into a SPACE character. This covers many scripts such as LATIN, CYRILLIC, GREEK, etc. It also covers the case where only the COMMON script has been detected on both sides, and the case where the scripts belong to different categories.
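The five rules above can be sketched in Python as follows. This is a minimal illustration, not a conforming implementation: the `_SCRIPT_RANGES` table, the `script_of` helper and the "OTHER" script label are hypothetical simplifications based on code-point blocks, whereas a real user agent would use the full TR#24 script data. For rule 4 the sketch arbitrarily picks the ZERO WIDTH SPACE option.

```python
import unicodedata

# Illustrative block ranges only; a real implementation would use the full
# script data from Unicode TR#24 rather than this hypothetical table.
_SCRIPT_RANGES = [
    (0x0E00, 0x0E7F, "THAI"),
    (0x0E80, 0x0EFF, "LAO"),
    (0x1000, 0x109F, "MYANMAR"),
    (0x1780, 0x17FF, "KHMER"),
    (0x3040, 0x309F, "HIRAGANA"),
    (0x30A0, 0x30FF, "KATAKANA"),
    (0x4E00, 0x9FFF, "HAN"),
]
NO_CHARACTER_SCRIPTS = {"HAN", "HIRAGANA", "KATAKANA"}    # rule 3
ZERO_WIDTH_SCRIPTS = {"KHMER", "LAO", "MYANMAR", "THAI"}  # rule 4


def script_of(ch: str) -> str:
    """Approximate the script of one character (assumption, not TR#24)."""
    cp = ord(ch)
    for low, high, name in _SCRIPT_RANGES:
        if low <= cp <= high:
            return name
    if unicodedata.category(ch).startswith("M"):
        return "INHERITED"  # combining marks outside the ranges above
    if ch.isalpha():
        return "OTHER"      # stand-in for LATIN, CYRILLIC, GREEK, ...
    return "COMMON"         # punctuation, digits, symbols


def _script_before(text: str) -> str:
    """Script of the last character of `text`, resolving INHERITED (rule 2)."""
    for ch in reversed(text):
        script = script_of(ch)
        if script != "INHERITED":
            return script
    return "COMMON"


def convert_linefeed(before: str, after: str) -> str:
    """Return the replacement for a LINE FEED between `before` and `after`."""
    prev = _script_before(before)
    nxt = script_of(after[0]) if after else "COMMON"
    if nxt == "INHERITED":  # rule 2: take the script of the preceding side
        nxt = prev
    if prev == "COMMON":    # rule 1: adopt the script of the other side
        prev = nxt
    if nxt == "COMMON":
        nxt = prev
    if prev in NO_CHARACTER_SCRIPTS and nxt in NO_CHARACTER_SCRIPTS:
        return ""           # rule 3
    if prev in ZERO_WIDTH_SCRIPTS and nxt in ZERO_WIDTH_SCRIPTS:
        return "\u200b"     # rule 4 (the rule also allows no character)
    return " "              # rule 5
```

With these assumptions, a LINE FEED between two HAN characters disappears, one between two THAI characters becomes U+200B, and one between Latin words or between scripts of different categories becomes a space.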
Note (informative): Some scripts, such as HAN, HIRAGANA, KATAKANA, KHMER, LAO, MYANMAR and THAI, do not use space characters to delimit word boundaries, but may still use them to delimit sentences or sentence fragments. If such a space character occurs as the last character before a LINE FEED character, or as the character following a LINE FEED character, it may be eliminated by the white space processing described above. Several solutions are possible:
Name: 'wrap-option'
Value: no-wrap | wrap | inherit
Initial: wrap
Applies to: all elements
Inherited: yes
Percentages: N/A
Media:
This property controls whether or not text wraps when it reaches the flow edge of its containing block box.
wrap
Line-breaking occurs if the line overflows the available block width. The specific line-breaking algorithm is determined by the 'line-break' and 'word-break' properties.
no-wrap
No line wrapping is performed. In the case when lines are longer than the available block width, the overflow will be treated in accordance with the 'overflow' property specified in the element.
White-space processing in the context of CSS is the mechanism by which all white-space characters are interpreted for rendering purposes. The white-space set is defined by the XML specification as any combination of one or more space characters (U+0020), carriage returns (U+000D), line feeds (U+000A), or tabs (U+0009).
The amount of white space processing that can be achieved by a user agent that supports CSS is directly related to the CSS processing model, especially the document parsing and validation. After parsing and possible validation, the document tree may contain text nodes that contain unprocessed white space characters, or the document tree may already have been processed in a way that white space characters have been collapsed and partially removed (white space normalization).
In that respect, the CSS properties related to white space processing can only be effective if the CSS processor has access to the white space characters that were originally encoded in the document. However, end-of-line characters are typically handled (for example, by XML processors) in such a way that any combination of end-of-line characters is replaced by a single line feed character (U+000A).
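As an illustration of the end-of-line handling mentioned above, the following sketch follows the XML 1.0 rule (a CARRIAGE RETURN followed by a LINE FEED, or a lone CARRIAGE RETURN, becomes one LINE FEED). The function name `normalize_line_ends` is hypothetical and only serves this example.

```python
import re


def normalize_line_ends(text: str) -> str:
    """Replace CR LF pairs and lone CR characters with a single LINE FEED."""
    # "\r\n?" matches a CR optionally followed by LF, so a CR LF pair
    # yields one LINE FEED rather than two.
    return re.sub(r"\r\n?", "\n", text)
```

After this step the CSS processor only ever sees U+000A as a line end, which is why the properties below are defined in terms of linefeeds alone.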
Note: XML Schema, through its 'whiteSpace' facet, can constrain exactly which white space characters remain available to a rendering process like CSS for elements containing the string datatype. In addition, some XML languages like XHTML may have their own white-space processing rules when parsing and validating documents with white-space characters. Therefore, some of the behaviors described below may be affected by these limitations and may be user agent dependent in these contexts.
The typical white-space processing, similar to XHTML-MOD, is as follows:
Note: These rendering rules make no assumption about the storage model of these white-space character sequences. It is outside the scope of CSS to determine the character code values accessible through programming interfaces such as the DOM.
The following properties, 'linefeed-treatment', 'space-treatment' and 'white-space-treatment', allow precise control of that typical behavior. The 'linefeed-treatment' property determines the rendering of line feed characters. The 'space-treatment' property determines the rendering of white space characters other than line feeds. The 'white-space-treatment' property determines the treatment of consecutive white-space characters after the two prior properties have been applied. The 'white-space' property is redefined as a shorthand property that sets the values of these three new properties as well as the 'wrap-option' property (the latter for compatibility with earlier versions of CSS).
Name: 'linefeed-treatment'
Value: auto | ignore | preserve | treat-as-space | treat-as-zero-width-space | inherit
Initial: treat-as-space
Applies to: all elements
Inherited: yes
Percentages: N/A
Media:
This property specifies the treatment of linefeeds (U+000A characters). Values have the following meanings:
auto
Linefeed characters are transformed for rendering purposes into one of the following: a space character, a zero width space character (U+200B), or no character (i.e., not rendered). The choice of the resulting character depends on the script property of the characters preceding and following the line feed character in inline flow elements that are part of the same block element. The result of the transformation can be treated by subsequent CSS processing (including white space collapsing).
ignore
Linefeed characters are ignored, i.e., they are transformed for rendering purposes into no character.
preserve
Linefeed characters indicate an end-of-line boundary.
treat-as-space
Linefeed characters are transformed for rendering purposes into a space character (U+0020). The result of the transformation can be treated by subsequent CSS processing (including white space collapsing).
treat-as-zero-width-space
Linefeed characters are transformed for rendering purposes into a zero width space character (U+200B). The result of the transformation can be treated by subsequent CSS processing (including white space collapsing).
Note: The Unicode Standard recommends that the zero width space be considered a valid line-break point. If two characters with a zero width space between them are placed on the same line, they are placed with no space between them; if they are placed on two lines, no additional glyph area, such as for a hyphen, is created at the line break.
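The 'linefeed-treatment' values described above can be sketched as a simple dispatch. The function `apply_linefeed_treatment` and its `auto_replacement` callback are hypothetical names for this illustration; the callback stands in for the script-based choice made by 'auto' and defaults to a space, and 'inherit' is assumed to have been resolved before the call.

```python
from typing import Callable

ZERO_WIDTH_SPACE = "\u200b"


def apply_linefeed_treatment(
    text: str,
    value: str,
    auto_replacement: Callable[[str, str], str] = lambda before, after: " ",
) -> str:
    """Transform the linefeeds (U+000A) in `text` for rendering."""
    if value == "preserve":
        return text  # linefeeds remain end-of-line boundaries
    if value == "ignore":
        return text.replace("\n", "")
    if value == "treat-as-space":
        return text.replace("\n", " ")
    if value == "treat-as-zero-width-space":
        return text.replace("\n", ZERO_WIDTH_SPACE)
    if value == "auto":
        # Decide each linefeed from the text on either side of it.
        parts = text.split("\n")
        out = parts[0]
        for part in parts[1:]:
            out += auto_replacement(out, part) + part
        return out
    raise ValueError(f"unknown linefeed-treatment value: {value!r}")
```

Passing the replacement logic as a callback keeps the sketch independent of any particular script database.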
Name: 'space-treatment'
Value: ignore | preserve | ignore-if-before-linefeed | ignore-if-after-linefeed | ignore-if-surrounding-linefeed
Initial: preserve
Applies to: all elements
Inherited: yes
Percentages: N/A
Media:
This property specifies the treatment of space (U+0020) and other white space characters except for linefeeds (U+000A), whose treatment is determined by the 'linefeed-treatment' property. Values have the following meanings:
ignore
White space characters, except for linefeeds, are ignored, i.e., they are transformed for rendering purposes into no character.
preserve
All white space characters are rendered as intended. The tab character (U+0009) is rendered as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters. The treatment of linefeeds is not determined by this property.
ignore-if-before-linefeed
Specifies that any white space characters, except for linefeeds, that immediately precede a linefeed character shall be discarded. This action shall take place regardless of the setting of the 'linefeed-treatment' property.
ignore-if-after-linefeed
Specifies that any white space characters, except for linefeeds, that immediately follow a linefeed character shall be discarded. This action shall take place regardless of the setting of the 'linefeed-treatment' property.
ignore-if-surrounding-linefeed
Specifies that any white space characters, except for linefeeds, that immediately precede or follow a linefeed character shall be discarded. This action shall take place regardless of the setting of the 'linefeed-treatment' property.
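The 'space-treatment' values above can be illustrated with a few regular expressions. This is a sketch under two assumptions: the non-linefeed white space set is taken to be space, tab and carriage return, and "immediately precedes" or "immediately follows" is read as covering the whole run of such characters adjacent to the linefeed. The name `apply_space_treatment` is hypothetical.

```python
import re

# White space characters other than the linefeed (assumed set).
_WS = r"[ \t\r]"


def apply_space_treatment(text: str, value: str) -> str:
    """Discard non-linefeed white space in `text` according to `value`."""
    if value == "preserve":
        return text
    if value == "ignore":
        return re.sub(_WS, "", text)
    if value == "ignore-if-before-linefeed":
        return re.sub(_WS + r"+(?=\n)", "", text)
    if value == "ignore-if-after-linefeed":
        return re.sub(r"(?<=\n)" + _WS + r"+", "", text)
    if value == "ignore-if-surrounding-linefeed":
        return re.sub(_WS + r"+(?=\n)|(?<=\n)" + _WS + r"+", "", text)
    raise ValueError(f"unknown space-treatment value: {value!r}")
```

The lookahead and lookbehind leave the linefeed itself untouched, so this step can run regardless of the 'linefeed-treatment' setting.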
Name: 'white-space-treatment'
Value: preserve | collapse | inherit
Initial: collapse
Applies to: all elements
Inherited: yes
Percentages: N/A
Media:
The 'white-space-treatment' property specifies the treatment of consecutive white-space characters (with no exception for linefeed characters, unlike the 'space-treatment' property). Values have the following meanings:
preserve
All white space characters are rendered as intended. The tab character (U+0009) is rendered as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters.
collapse
Specifies that a character is not rendered if all of the following conditions hold:
· the character is a white space (according to XML), and
· it is not a preserved linefeed (due to linefeed-treatment="preserve"), and
· the immediately preceding (non-ignored) character is a white space, or the immediately following (non-ignored) character is a preserved linefeed.
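The two 'white-space-treatment' values can be sketched as below. The sketch assumes that characters ignored by the earlier properties have already been removed from `text`, so "non-ignored" neighbors are simply the adjacent characters. The function name and the `linefeeds_preserved` flag (true when 'linefeed-treatment' is 'preserve') are hypothetical, and the 'preserve' branch relies on Python's `str.expandtabs`, which pads to the next multiple of the tab size.

```python
XML_WHITE_SPACE = " \t\r\n"


def apply_white_space_treatment(
    text: str, value: str, linefeeds_preserved: bool = False
) -> str:
    """Apply 'preserve' or 'collapse' to already-filtered text."""
    if value == "preserve":
        # Tab stops every 8 characters; a tab always yields at least one space.
        return text.expandtabs(8)
    if value != "collapse":
        raise ValueError(f"unknown white-space-treatment value: {value!r}")
    out = []
    for i, ch in enumerate(text):
        is_ws = ch in XML_WHITE_SPACE
        is_preserved_lf = linefeeds_preserved and ch == "\n"
        prev_is_ws = i > 0 and text[i - 1] in XML_WHITE_SPACE
        next_is_preserved_lf = (
            linefeeds_preserved and i + 1 < len(text) and text[i + 1] == "\n"
        )
        # Drop the character only when all three conditions of the rule hold.
        if is_ws and not is_preserved_lf and (prev_is_ws or next_is_preserved_lf):
            continue
        out.append(ch)
    return "".join(out)
```

Under this reading, a run of white space keeps only its first character, and white space on either side of a preserved linefeed is dropped while the linefeed itself survives.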
Name: 'white-space'
Value: normal | pre | nowrap | inherit
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media:
This property declares how white space inside the element is handled. Setting a value on the 'white-space' property sets the respective values of 'wrap-option', 'linefeed-treatment', 'space-treatment' and 'white-space-treatment'.
white-space | wrap-option | linefeed-treatment | space-treatment | white-space-treatment
normal | wrap | auto | preserve | collapse
nowrap | no-wrap | auto | preserve | collapse
pre | no-wrap | preserve | preserve | preserve
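The shorthand mapping in the table above can be expressed as a lookup table. The names `WHITE_SPACE_SHORTHAND` and `expand_white_space` are hypothetical, and 'inherit' is assumed to be resolved by the cascade before this step.

```python
# One entry per row of the table above.
WHITE_SPACE_SHORTHAND = {
    "normal": {
        "wrap-option": "wrap",
        "linefeed-treatment": "auto",
        "space-treatment": "preserve",
        "white-space-treatment": "collapse",
    },
    "nowrap": {
        "wrap-option": "no-wrap",
        "linefeed-treatment": "auto",
        "space-treatment": "preserve",
        "white-space-treatment": "collapse",
    },
    "pre": {
        "wrap-option": "no-wrap",
        "linefeed-treatment": "preserve",
        "space-treatment": "preserve",
        "white-space-treatment": "preserve",
    },
}


def expand_white_space(value: str) -> dict:
    """Return the four longhand values set by a 'white-space' value."""
    if value not in WHITE_SPACE_SHORTHAND:
        raise ValueError(f"unsupported white-space value: {value!r}")
    return dict(WHITE_SPACE_SHORTHAND[value])  # copy so callers may modify it
```

For example, `expand_white_space("pre")` yields 'no-wrap' for 'wrap-option' and 'preserve' for the three treatment properties.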
The following examples show what whitespace behavior is expected from the PRE and P elements, and the "nowrap" attribute in HTML.
PRE { white-space: pre }
P { white-space: normal }
TD[nowrap] { white-space: nowrap }
Conforming user agents may ignore the 'white-space' property in author and user style sheets but must specify a value for it in the default style sheet.