
Unicode Technical Note #54

Annotated Line Breaking Algorithm

Version 2
Author Robin Leroy
Date 2024-09-18
This Version https://www.unicode.org/notes/tn54/tn54-2.html
Previous Version https://www.unicode.org/notes/tn54/tn54-1.html
Latest Version https://www.unicode.org/notes/tn54/


Summary

The Unicode Line Breaking Algorithm, defined in UAX #14, Unicode Line Breaking Algorithm, has changed with each subsequent version of the Unicode Standard. Maintaining the algorithm frequently requires understanding the interactions between the rules and property assignments, as well as historical background that would be too obscure to document in the standard. The purpose of this Unicode Technical Note is to provide a detailed history of the UAX, as well as annotations useful to its maintainers.

Status

This document is a Unicode Technical Note. Sole responsibility for its contents rests with the author. Publication does not imply any endorsement by the Unicode Consortium.

For information on Unicode Technical Notes, including criteria for acceptance, see Unicode Technical Notes.

Contents

The body of this Unicode Technical Note is contained in the HTML file “alba-2.html”.

Description

The attached HTML file is not normative, and it is not the actual Unicode Standard Annex; implementers should refer to UAX #14, which is the normative document.

The HTML file contains the entire text of Unicode Standard Annex #14, Unicode Line Breaking Algorithm, Version 16.0, plus certain annotations. The annotations give a more in-depth analysis of the algorithm. They describe the reason for each nonobvious rule, and point out interesting ramifications of the rules and interactions among the rules (interesting to Unicode maintainers, that is).

The changes in each successive published version between the original Unicode Line Breaking Algorithm (Unicode Version 3.0.0) and Version 16.0 are indicated with highlighting and strikethroughs; a sidebar makes it possible to select the range of versions of interest. UTC dispositions, documents, and public review feedback relevant to the changes are listed in curly brackets.

Modifications

The following summarizes modifications from the previous version of this document.

2 Updated for Unicode Version 16.0.
1 First version.