UTF-8 in SNMPv3

From: Randy Presuhn (rpresuhn@peer.com)
Date: Thu Jul 03 1997 - 20:24:07 EDT


Hi -

The SNMPv3 working group of the IETF is hoping to make use of UTF-8
for some human-readable information in the MIBs used to manage SNMPv3.

The convention currently used for this kind of information is described
on page 4 of RFC 1903. (For easy reference, I've appended the text
to the end of this message.) We would like to define a new convention
formulated in terms of UTF-8 for use in new MIBs.

What we've not yet reached agreement on is the question of "non-printable
stuff". Some believe that NVT ASCII's control characters are somehow
less problematic than those of 10646; others find the problems equivalent.
The questions that come to my mind are:

        1) Is there any merit to the argument that the "non-printable
           stuff" in 10646 is any better or worse than that in the
           NVT ASCII definition?

        2) Can we use standard character properties to identify a
           "printable" subset that would not break for any language?
           (The folks that want these also want to have CRLF...)
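
To make question 2 concrete, here is a minimal sketch of what a
"printable subset plus CRLF" test could look like if it were based on
the General_Category character property. The function name and the set
of rejected categories are assumptions made for illustration, not
anything the working group has agreed on. In particular, format
characters (Cf) are allowed here because some scripts need characters
such as ZERO WIDTH JOINER; whether that is the right choice is exactly
the kind of thing in question.

```python
import unicodedata

# Assumption: these General_Category values count as "non-printable".
# Cf (format) is deliberately NOT rejected, since characters such as
# ZERO WIDTH JOINER are needed by some scripts.
REJECTED_CATEGORIES = {"Cc", "Cs", "Co", "Cn"}


def is_printable_with_crlf(text):
    """Return True if text holds only 'printable' characters and CR LF pairs."""
    i = 0
    while i < len(text):
        if text.startswith("\r\n", i):
            i += 2  # CR LF is the one permitted control sequence
            continue
        if unicodedata.category(text[i]) in REJECTED_CATEGORIES:
            return False
        i += 1
    return True
```

Under these assumptions a bare CR, a bare LF, a tab, or BEL is rejected,
while accented letters and a CR LF pair are accepted.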

Background information:
        In the SNMP protocol, notions of equality and ordering have no
        "locale" component. There is no notion of character equivalence.
        It is very much a "bits is bits" environment.

        The concerns of working group members appear to be arising from:
                1) what does it mean to "support 10646"
                2) how to display "weird stuff"
                3) how to input "weird stuff"
                4) the old CR/LF problem
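
As an illustration of the "bits is bits" point in the background above,
the sketch below compares UTF-8 strings purely octet by octet. The
helper names are hypothetical, and the plain octet-wise ordering is an
assumption made for the example rather than a statement of how any
particular SNMP implementation orders values.

```python
# Two encodings that many readers would call "the same character":
composed = "\u00e9".encode("utf-8")      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301".encode("utf-8")   # "e" + COMBINING ACUTE ACCENT


def octet_equal(a: bytes, b: bytes) -> bool:
    """Equality with no notion of character equivalence."""
    return a == b


def octet_compare(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 using plain octet-wise ordering, no locale."""
    return (a > b) - (a < b)
```

With these definitions, octet_equal(composed, decomposed) is False, and
octet_compare(b"Z", b"a") is -1 because 0x5A sorts before 0x61,
regardless of what any locale's collation rules would say.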

Is there a nice, concise, convincing answer I can take back to the
working group?

 ========== Excerpt from RFC 1903, DisplayString Textual convention ==========
            "Represents textual information taken from the NVT ASCII
            character set, as defined in pages 4, 10-11 of RFC 854.

            To summarize RFC 854, the NVT ASCII repertoire specifies:

              - the use of character codes 0-127 (decimal)

              - the graphics characters (32-126) are interpreted as
                US ASCII

              - NUL, LF, CR, BEL, BS, HT, VT and FF have the special
                meanings specified in RFC 854

              - the other 25 codes have no standard interpretation

              - the sequence 'CR LF' means newline

              - the sequence 'CR NUL' means carriage-return

              - an 'LF' not preceded by a 'CR' means moving to the
                same column on the next line.

              - the sequence 'CR x' for any x other than LF or NUL is
                illegal. (Note that this also means that a string may
                end with either 'CR LF' or 'CR NUL', but not with CR.)

            Any object defined using this syntax may not exceed 255
            characters in length."
 ========== End Excerpt ===============
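
For comparison with whatever a UTF-8 convention might require, the
rules summarized in the excerpt above can be checked mechanically. The
following is a minimal sketch; the function name is hypothetical, and
it treats the 255-character limit as a limit on octets, which is an
assumption that holds for NVT ASCII since each character is one octet.
It only rejects what the excerpt calls illegal and does not try to
judge the codes that "have no standard interpretation".

```python
CR, LF, NUL = 0x0D, 0x0A, 0x00
MAX_LEN = 255  # from the excerpt: may not exceed 255 characters


def is_valid_display_string(data: bytes) -> bool:
    """Check a value against the NVT ASCII rules summarized in the excerpt."""
    if len(data) > MAX_LEN:
        return False
    for i, octet in enumerate(data):
        if octet > 127:
            return False  # only character codes 0-127 are allowed
        if octet == CR:
            # CR must be followed by LF or NUL, so it can never be last
            if i + 1 >= len(data) or data[i + 1] not in (LF, NUL):
                return False
    return True
```

Note that a bare LF passes this check, since the excerpt gives it a
defined meaning, while a trailing CR or any UTF-8 multi-octet sequence
does not.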

 ---------------------------------------------------------------------
 Randy Presuhn BMC Software, Inc. (Silicon Valley Division)
 Voice: +1 408 556-0720 (Formerly PEER Networks) http://www.bmc.com
 Fax: +1 408 556-0735 1190 Saratoga Avenue, Suite 130
 Email: rpresuhn@bmc.com San Jose, California 95129-3433 USA
 ---------------------------------------------------------------------
 In accordance with the BMC Communications Systems Use and Security
 Policy memo dated December 10, 1996, page 2, item (g) (the first of
 two), I explicitly state that although my affiliation with BMC may be
 apparent, implied, or provided, my opinions are not necessarily those
 of BMC Software and that all external representations on behalf of
 BMC must first be cleared with a member of "the top management team."
 ---------------------------------------------------------------------


