Technical Notes |
Version | 2 |
Authors | Ken Whistler, Rick McGowan |
Date | 2012-07-11 |
This Version | http://www.unicode.org/notes/tn33/tn33-2.html |
Previous Version | http://www.unicode.org/notes/tn33/tn33-1.html |
Latest Version | http://www.unicode.org/notes/tn33/ |
This document provides a list of danda characters in the Unicode Standard.
This document is a Unicode Technical Note. Sole responsibility for its contents rests with the author(s). Publication does not imply any endorsement by the Unicode Consortium. This document is not subject to the Unicode Patent Policy.
For information on Unicode Technical Notes including criteria for acceptance, see http://www.unicode.org/notes/.
Dandas are punctuation characters commonly seen in the typographic traditions of writing systems of South and Southeast Asia. While they occur in many scripts, they are primarily found in traditional materials written in scripts historically derived from the Brahmi script.
The typical appearance of a danda is simply a vertical bar. Two vertical bars may also be paired together in a corresponding punctuation mark known as a double danda. Tripled forms may also occur, but are much less common. Although forms based on a simple vertical bar are typical, in some scripts more elaborate forms have developed, and in some cases—such as Tibetan, in which the danda is termed a shad—the danda mark may accrue additional adornments.
Dandas generally delimit phrase-, sentence-, or section-level divisions in text. When both a single and a double danda occur, the double danda is used to demarcate larger units of text than the single danda. This usage is roughly comparable to the use of commas and full stops in Western typography, although dandas typically mark larger phrasal units than what might be separated by commas in Western typography. In many traditional materials, dandas and double dandas delimit what might be best termed verses or sections, and do not map easily onto concepts such as "sentence". Usage may also vary by script, by language, and by corpus.
Many South and Southeast Asian scripts in modern usage have adopted Western typographic practice in varying degrees. In such contexts dandas are often supplanted by common-use Western punctuation marks.
Many of the danda characters encoded in the Unicode Standard have the word "DANDA" in their names. However, many punctuation marks that are historically and functionally dandas are encoded under distinct names specific to a particular script. Also, because danda characters do not all have simple vertical bar shapes, they are not always easy to find when searching the code charts.
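The limits of searching by name can be illustrated with a short sketch. It assumes Python's standard `unicodedata` module and uses a handful of code points from the list in this note; the helper name `name_mentions_danda` is hypothetical and chosen only for illustration.

```python
import unicodedata

# A few code points drawn from the danda list in this note.
SAMPLES = [0x0964, 0x0F0D, 0x104A, 0x17D4, 0xAA5D]

def name_mentions_danda(cp):
    """True if the character's formal name contains the word DANDA."""
    return "DANDA" in unicodedata.name(chr(cp), "")

for cp in SAMPLES:
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"U+{cp:04X} {name}: {name_mentions_danda(cp)}")
```

A search on the name alone finds the Devanagari and Cham characters in this sample but misses the Tibetan, Myanmar, and Khmer ones, which is why an explicit list is useful.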
To make it easier to identify danda characters in the Unicode Standard, this Technical Note includes a specific list of known danda characters as of Unicode 6.1. This list may be updated periodically if further danda characters are added to the Standard.
The table below is in the usual Unicode Data File format of semicolon-delimited fields optionally followed by "#" and a comment. The table contains a list of characters in the Unicode Standard that are dandas. The first field is a code point or code point range. The second field is the General Category of the character. The comment after "#" gives the name of the character or, for a range, the number of characters in brackets followed by the names of the first and last characters in the range.
# Dandas
# [Not derivable]

0964..0965    ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0E5A          ; Po #     THAI CHARACTER ANGKHANKHU
0F08          ; Po #     TIBETAN MARK SBRUL SHAD
0F0D..0F12    ; Po # [6] TIBETAN MARK SHAD..TIBETAN MARK RGYA GRAM SHAD
104A..104B    ; Po # [2] MYANMAR SIGN LITTLE SECTION..MYANMAR SIGN SECTION
1735..1736    ; Po # [2] PHILIPPINE SINGLE PUNCTUATION..PHILIPPINE DOUBLE PUNCTUATION
17D4..17D5    ; Po # [2] KHMER SIGN KHAN..KHMER SIGN BARIYOOSAN
1AA8..1AAB    ; Po # [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B5E..1B5F    ; Po # [2] BALINESE CARIK SIKI..BALINESE CARIK PAREREN
1C3B..1C3C    ; Po # [2] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION NYET THYOOM TA-ROL
1C7E..1C7F    ; Po # [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
A876..A877    ; Po # [2] PHAGS-PA MARK SHAD..PHAGS-PA MARK DOUBLE SHAD
A8CE..A8CF    ; Po # [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA
A92F          ; Po #     KAYAH LI SIGN SHYA
A9C8..A9C9    ; Po # [2] JAVANESE PADA LINGSA..JAVANESE PADA LUNGSI
AA5D..AA5F    ; Po # [3] CHAM PUNCTUATION DANDA..CHAM PUNCTUATION TRIPLE DANDA
AAF0          ; Po #     MEETEI MAYEK CHEIKHAN
ABEB          ; Po #     MEETEI MAYEK CHEIKHEI
10A56..10A57  ; Po # [2] KHAROSHTHI PUNCTUATION DANDA..KHAROSHTHI PUNCTUATION DOUBLE DANDA
11047..11048  ; Po # [2] BRAHMI DANDA..BRAHMI DOUBLE DANDA
110C0..110C1  ; Po # [2] KAITHI DANDA..KAITHI DOUBLE DANDA
11141..11142  ; Po # [2] CHAKMA DANDA..CHAKMA DOUBLE DANDA
111C5..111C6  ; Po # [2] SHARADA DANDA..SHARADA DOUBLE DANDA
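As an illustration of how a table in this format might be consumed, the following is a minimal sketch in Python. It is not part of the Unicode Character Database tooling: the function names `parse_danda_table` and `is_danda` are hypothetical, and the sample holds only four rows copied from the table above, so a real program would read the full list instead.

```python
# Four rows copied from the table above, for illustration only.
SAMPLE = """\
# Dandas
0964..0965    ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0E5A          ; Po #     THAI CHARACTER ANGKHANKHU
AA5D..AA5F    ; Po # [3] CHAM PUNCTUATION DANDA..CHAM PUNCTUATION TRIPLE DANDA
11047..11048  ; Po # [2] BRAHMI DANDA..BRAHMI DOUBLE DANDA
"""

def parse_danda_table(text):
    """Collect the code points named in the first field of each data line."""
    code_points = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue                         # blank or comment-only line
        fields = [field.strip() for field in line.split(";")]
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        code_points.update(range(start, end + 1))
    return code_points

def is_danda(ch, table):
    """True if the single character ch is in the parsed danda set."""
    return ord(ch) in table

DANDAS = parse_danda_table(SAMPLE)
```

With the set built, a check such as `is_danda("\u0964", DANDAS)` reports whether a character is listed. The General Category field is ignored in this sketch because every entry in the table has the same value, Po.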
[Glossary] | Unicode Glossary http://www.unicode.org/glossary/ For explanations of terminology used in this and other documents. |
[UCD] | Unicode Character Database http://www.unicode.org/ucd/ For detailed documentation about the Unicode Character Database, see Unicode Standard Annex #44: Unicode Character Database http://www.unicode.org/reports/tr44/ |
[Unicode] | The Unicode Standard For the latest version, see: http://www.unicode.org/versions/latest/ |
The following summarizes modifications from the previous version of this document.
2 | Updated for Unicode 6.1 additions. |
1 | Initial version, corresponding to Unicode 6.0. |
Copyright © 2010-2012 Rick McGowan, Ken Whistler, and Unicode, Inc. All Rights Reserved. The Unicode Consortium and the authors make no expressed or implied warranty of any kind, and assume no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical note. The Unicode Terms of Use apply.
Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.