RE: Does OpenOffice 3.0 handle unicode?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Mar 21 2009 - 12:10:32 CST

    > [mailto:unicode-bounce@unicode.org] On behalf of Petr Tomasek
    > Sent: Saturday, March 21, 2009 17:42
    > To: Unicode@unicode.org
    > Subject: Does OpenOffice 3.0 handle unicode?
    >
    >
    > Can someone, please, confirm whether the new version of
    > OpenOffice can handle unicode? OpenOffice 2.0 unfortunately
    > can handle only the BMP, while I need characters from the SMP.

    That's quite a stupid question: if OpenOffice can "handle" the BMP
    characters, it means that it "handles" Unicode.
    Apparently you are not aware that OpenOffice was designed with Unicode
    support as a goal, and that it uses file formats that require correct
    Unicode support. This support has always been part of the file format
    specifications (which are based on XML files compressed within a ZIP
    archive).
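
    To illustrate the "XML files in a ZIP archive" point, here is a minimal
    Python sketch. It does not produce a valid OpenDocument file: the member
    name content.xml is the one ODF packages use, but the <doc> element and
    the sample text are assumptions made up for this illustration. It only
    shows that supplementary-plane characters survive a round trip through
    UTF-8 XML stored in a ZIP container.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Sample text: U+1D11E (SMP) and U+20000 (SIP), both outside the BMP.
SAMPLE = "clef \U0001D11E, ideograph \U00020000"


def write_package(text: str) -> bytes:
    """Store `text` in a UTF-8 XML member of an in-memory ZIP archive."""
    root = ET.Element("doc")  # hypothetical element name, not the ODF schema
    root.text = text
    xml_bytes = ET.tostring(root, encoding="utf-8")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml_bytes)
    return buf.getvalue()


def read_package(data: bytes) -> str:
    """Read the text back out of an archive produced by write_package()."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return root.text or ""


if __name__ == "__main__":
    restored = read_package(write_package(SAMPLE))
    print("text preserved:", restored == SAMPLE)
```

    Nothing in the container or in the XML layer depends on whether a font is
    available for the characters; that only matters at display time.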

    I can open Chinese documents containing characters from the SIP without
    any problem in OpenOffice (all versions, including those before 2.0).
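
    For readers unsure what separates the BMP from the SMP and the SIP, the
    following Python sketch computes the plane of a code point and the UTF-16
    code units that encode it. The function names are made up for this
    example. Characters outside the BMP take two UTF-16 code units (a
    surrogate pair), so software that stores text as UTF-16 has to treat each
    pair as a single character.

```python
def plane(cp: int) -> int:
    """Return the Unicode plane of a code point (0 = BMP, 1 = SMP, 2 = SIP)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return cp >> 16


def utf16_units(ch: str) -> list:
    """Return the UTF-16 code units that encode a single character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for ch in ("A", "\U0001D11E", "\U00020000"):
        units = " ".join(f"{u:04X}" for u in utf16_units(ch))
        print(f"U+{ord(ch):04X}  plane {plane(ord(ch))}  UTF-16: {units}")
```

    U+1D11E (plane 1) is encoded as D834 DD1E and U+20000 (plane 2) as
    D840 DC00, while "A" needs the single unit 0041.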

    This is not a matter of the OpenOffice version, but of whether the
    system's or application's renderer can display the characters and scripts
    (complex scripts in particular). If you have no font covering the SMP
    scripts you want to render, all you'll get is a row of empty boxes. Even
    in that case, OpenOffice will not damage the document when it contains
    character sequences that it cannot render for lack of fonts.

    OpenOffice ships with a limited set of fonts, which does not cover all
    characters and scripts found in Unicode. Complex scripts that require a
    specific layout engine for correct rendering (because a simple one-to-one
    mapping from character to glyph does not work as expected, or results in
    very poor layout and missing contextual forms) will also need an upgrade,
    either of your system or of your (MS/Open/Star-)Office application.

    So, on the same system, if I can open a document containing non-BMP
    characters with MS Office, I can just as well open it with OpenOffice (or
    Sun StarOffice). Conversely, I can save a document from OpenOffice in the
    legacy format supported by MS Office and open it in MS Office. This makes
    no difference for the rendering and support of characters (there may be
    some differences in the support of specific macros, advanced stylesheets,
    or specific page layouts, but the text itself is not affected, and is
    equally readable in both applications).

    Note that if you can already display the characters you want in a web
    browser or in an email client, you'll be able to see them in an Office app.

    The reverse is not always true: some texts that can be edited and
    displayed correctly in an Office application may be rendered poorly, or not
    at all, in your local web browser once converted to HTML. Nor does it hold
    if your "Office" application is just a legacy Notepad or a similar
    application designed only for simple plain-text documents.

    For example, Notepad++ is one of those "advanced" editors that work even
    worse than Notepad for characters outside the local "ANSI" legacy 8-bit
    code page of Windows. It still does not really support Unicode internally;
    it just contains an external converter to and from UTF-8, using a VERY
    lossy conversion scheme. Its support for larger character sets is a bit
    better in the latest version, but most of its tools are still not compliant
    and can only handle characters that round-trip through the local ANSI code
    page (and some of them only work correctly with one specific code page,
    such as 1250 or 1252). It also does not work at all with the BiDi
    algorithm. It should not be used to edit XML or HTML documents containing
    any RTL or complex script (some of the destructive changes it makes are
    irrevocable and are performed silently, without any warning).
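
    The round-trip condition mentioned above can be checked with a short
    Python sketch. This is not the editor's actual conversion code, only an
    assumed model of a silent, lossy conversion through an 8-bit code page;
    the function names and the default code page (cp1252) are choices made
    for the example.

```python
def survives_round_trip(text: str, codepage: str = "cp1252") -> bool:
    """True if `text` is unchanged after conversion to `codepage` and back."""
    # errors="replace" models a silent lossy conversion: characters the
    # code page cannot represent become "?" instead of raising an error.
    converted = text.encode(codepage, errors="replace").decode(codepage)
    return converted == text


def lost_characters(text: str, codepage: str = "cp1252") -> list:
    """List the characters of `text` that `codepage` cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


if __name__ == "__main__":
    for sample in ("caf\u00E9", "\u05E9\u05DC\u05D5\u05DD", "\U0001D11E"):
        print(ascii(sample), survives_round_trip(sample))
```

    Text limited to the code page's repertoire, such as "café" with cp1252,
    comes back intact. Hebrew text or any supplementary-plane character does
    not, and once the file has been saved the original characters cannot be
    recovered from the "?" replacements.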

    By contrast, working on those XML or HTML documents in OpenOffice is very
    safe: a character or string that can only be displayed as empty boxes
    will still not be silently replaced by other characters of the
    application's choosing. OpenOffice accepts and respects the whole UCS (i.e.
    code points in the range U+0000..U+10FFFF), possibly with restrictions on
    a few of them (see the strict XML specification about characters
    permanently forbidden within this range: these are most controls, like
    U+0000, and code points permanently reserved as non-characters, like
    U+FFFF or U+FFFE). There are not many forbidden characters, and they do
    not include any unassigned code points, because those may be assigned to
    valid characters at any time in the future, or may already be assigned in
    a Unicode version that did not exist when your application was last
    written and delivered to you.
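
    The XML restriction can be made concrete with a small Python sketch of
    the Char production from the XML 1.0 specification. The function names
    are made up for this example; the ranges are those of XML 1.0 (XML 1.1
    differs slightly).

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes the other C0 controls
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # all supplementary planes
    )


def first_forbidden(text: str):
    """Return the first character that XML 1.0 forbids, or None."""
    for ch in text:
        if not is_xml10_char(ord(ch)):
            return ch
    return None


if __name__ == "__main__":
    for cp in (0x0000, 0x0009, 0xFFFE, 0xFFFF, 0x1D11E, 0x20000):
        print(f"U+{cp:04X}", is_xml10_char(cp))
```

    The test is purely a range check and never consults the list of assigned
    characters, so a code point that is unassigned today is accepted exactly
    like an assigned one.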



    This archive was generated by hypermail 2.1.5 : Sat Mar 21 2009 - 12:14:41 CST