Note
In January 2018, the ULI-TC was restructured as
ULI-SC, a subcommittee of CLDR.
In July 2020, the ULI-SC was disbanded after not
having convened for some time.
All site content and source repositories remain
online, but in read-only archival state. Thank you
for your interest in ULI.
Unicode Localization Interoperability Subcommittee
The Unicode Localization
Interoperability CLDR Subcommittee
(ULI) works to ensure interoperable data
interchange of critical localization-related
assets, including:
- Translation memory: A translation memory
system stores words or phrases that have been
translated previously. The use of translation memory
ensures the consistency of translated content,
accelerates the speed of translation, and also
reduces the cost of repeated translation
requests.
- Segmentation rules: Segmentation rules
define the way to segment text for translation or
other text processing. The rules are used in
conjunction with translation memory to create
memory segments or identify matches within the
source content of existing translation
memories.
- Translation source strings and their
translations: Translation source is natural
language text, typically with markup, that will be
translated into another language. The translated
strings are the results of translating the source
strings while preserving the markup.
- Word count: Best practices for counting
words in the context of translation
interchange.
Whether a translation request is completed by
human or machine, these assets play a vital role in
the overall translation process. Interoperable
interchange of these assets reduces errors, lowers
costs, and improves throughput.
Charter and
Scope
Problem Statement
The localization industry has
problems with data interchange between service
endpoints. ULI intends to solve the
following problems:
- Inconsistent application, implementation, and
interpretation of standards
- Lack of clear requirements for localization
data interchange
Localization Interoperability
Definition
Ensuring reliable localization data
interchange through consistent implementation of
localization standards and file formats.
Charter and Objectives
ULI will be the expert group, with representatives
from localization service consumers, localization
service providers, tools/technology experts, academia,
and standards organizations, that advises on interoperable
data interchange of critical localization-related
assets.
Objectives
- Optimize the service time between systems
through consistent interpretation and adoption of
localization data interchange standards
- Mature existing standards and data
references by gathering requirements for
extensions of localization interoperability
standards
- Reduce cost through best practice guidelines by
providing open reference implementation of the
extensions and profiles
- Establish reference implementations or
extensions to improve the usefulness of
localization interoperability standards
Relationship to owners of Industry standards
- Existing standards will not be changed; the
goal is to extend them if needed.
- The TC will engage with standards organizations
as needed to influence existing and future
standards.
- The TC will contribute to existing standards
through an open platform.
All TC activities are guided by the
Technical Group Procedures.
See PDF:
ULI Charter
Process
Introduction
This document
describes the Unicode Localization
Interoperability Technical Committee, and its
process for specification definition,
interchange format and examples. The process
is designed to be light-weight: in
particular, the meetings are frequent, short,
and informal. Most of the work is done by email or
phone, with a database recording requested
changes.
For more information on the formal procedures
for the Unicode CLDR Technical Committee, see
the Technical Committee Procedures
for the Unicode Consortium.
Specifications
Language Segmentation
UAX #29,
Unicode Text Segmentation, defines
guidelines for determining default
segmentation boundaries between certain
significant text elements. The interchange
specification
SRX (Segmentation Rules eXchange) is used
as the actual exchange format for
system-to-system communication of the
text segmentation behavior associated with
any content. An example of SRX at the
language level is available as part of the
CLDR
project.
See CLDR
Process for more information on the
vetting and submission of language
segmentation input.
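As an illustration of how SRX-style rules drive segmentation, the following is a minimal Python sketch. It assumes the general SRX model of ordered break and no-break rules, each with a "before break" and an "after break" pattern, where the first rule matching at a position decides the outcome. The two rules shown are hypothetical examples, not rules taken from CLDR or ULI data, and the code is not a conforming SRX implementation.

```python
import re

# Hypothetical rules in the spirit of SRX: (is_break, before_pattern, after_pattern).
# Order matters: the first rule that matches at a position wins.
RULES = [
    # Exception rule: do not break after a known abbreviation.
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),
    # Break rule: break after sentence-final punctuation followed by whitespace.
    (True, r"[.?!]+", r"\s+"),
]


def segment(text, rules=RULES):
    """Split text into segments using ordered break / no-break rules."""
    compiled = [
        (is_break, re.compile(before + r"$"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match the text ending at pos,
            # "after" must match the text starting at pos.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides; skip the rest
    segments.append(text[start:])
    # Stripping whitespace is a simplification made for readability here.
    return [s.strip() for s in segments if s.strip()]


print(segment("Dr. Smith arrived. He was late."))
```

With these two rules, the abbreviation exception prevents a break after "Dr." while the general rule still breaks after "arrived.". Reordering the rules would change the result, which is why an interchange format needs to carry rule order as well as the patterns themselves.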
Translation Memory
The ability to
interchange memories as static content within
a translation request life cycle is defined
by TMX
(Translation Memory eXchange). The scope of
this work is under discussion.
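For readers unfamiliar with TMX, the sketch below shows how a translation memory might be read with Python's standard library. The sample document is invented for illustration and follows the TMX 1.4 element names (tmx, header, body, tu, tuv, seg) as commonly described. The header values are placeholders, and the function name read_units is an assumption of this example rather than part of any ULI deliverable.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data; header attribute values are placeholders.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside the segment,
                # which loses information a real tool would need to keep.
                variants[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(variants)
    return units


print(read_units(SAMPLE_TMX))
```

Flattening inline markup, as done here, is one of the places where tools can differ in practice, which is the kind of inconsistency that profiles of use are meant to address.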
Public Feedback
The public can supply formal feedback into
ULI by filing a Bug Report or Feature
Request. There is also a public forum for
questions at ULI Mailing List (details on
archives are found there).
Anyone can also ask to be added to a
list that receives notification of new
bugs, so they can track issues if they want.
Anyone can also reply to any bug report to
add comments or questions.
There is also a members-only ULI mailing
list for members of the ULI Technical
Committee.
Meeting Minutes
Minutes are archived here
Profiles of Use
The primary focus of the ULI Technical Committee
will be to establish profiles of use for XLIFF, TMX,
and SRX. The committee will develop and publish
specifications that document specific usage
conventions that can be shared for interoperability.
This will improve data exchange through more
consistent implementations and enhance the usefulness
of these three standards.
Extensions to Established
Standards
The secondary focus of the ULI
Technical Committee will be to gather requirements
for future extensions to XLIFF, TMX, and SRX. The ULI
committee will develop reference implementations, as
necessary, to demonstrate the feasibility of any
proposals for future standardization.
- TMX: Translation Memory eXchange (TMX)
is an XML-based standard for the exchange of
translation memory data created by computer-aided
translation and localization tools. TMX was
developed and maintained by LISA, the Localization
Industry Standards Association, until LISA became
insolvent in 2011. The format allows easier
exchange of translation memory between tools and/or
translators with little or no loss of critical
data.
See
http://en.wikipedia.org/wiki/Translation_Memory_eXchange
- SRX: Segmentation Rules eXchange (SRX)
is an XML-based standard that was maintained by
LISA, the Localization Industry Standards
Association. It provides a common way to describe
how to segment text for translation and other
language-related processes.
See
http://en.wikipedia.org/wiki/Segmentation_Rules_eXchange
- XLIFF: The XML Localisation Interchange
File Format (XLIFF) is maintained by OASIS. XLIFF
is the industry standard for exchanging
localization data (translation source and
translated results) between service users and
service providers.
See
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
Word Count
One of the challenges of translation
interoperability is objectively measuring the
difficulty of a particular translation workload. A
common metric used is the word count. However,
methods for counting words vary across different
systems and languages. Some examples: Thai is
written without space characters between words, as
are Japanese and Chinese. Should numbers be included
or not? Are Mongolian
suffixes considered separate words or not?
You may see the past discussion on this
GitHub page. (Note that the GitHub repository is
now archived.)
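To make the ambiguity concrete, here is a small Python sketch comparing two naive counting conventions. Both functions are hypothetical illustrations written for this page and do not represent a ULI recommendation. They only show that the same text can produce different counts, and that whitespace-based counting breaks down for languages written without spaces.

```python
import re

# A token made up only of digits and common numeric separators.
NUMBER = re.compile(r"[\d.,]+")


def count_whitespace_tokens(text):
    """Count every whitespace-separated token as one word."""
    return len(text.split())


def count_without_numbers(text):
    """Count whitespace-separated tokens, skipping purely numeric ones."""
    return sum(1 for token in text.split() if not NUMBER.fullmatch(token))


english = "Order 3 copies for 12.50 USD"
japanese = "これは翻訳のテストです"  # contains no space characters

print(count_whitespace_tokens(english))   # 6
print(count_without_numbers(english))     # 4
print(count_whitespace_tokens(japanese))  # 1, although the sentence has several words
```

Two systems using these two conventions would disagree on the English sentence, and neither gives a meaningful answer for the Japanese one, where a dictionary-based or rule-based word segmenter would be needed.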
These documents are archived for historical
purposes and do not specify a Unicode standard.
These documents are already publicly available
online elsewhere, and are only hosted on the
Unicode ULI site as a convenience.
Participation
For information on how to join the ULI and get
involved in its work, contact the Unicode Consortium
with the contact
form and ask about the ULI.
To become a voting participant in the work of the
ULI committee, join
Unicode in one of the three voting categories of
membership: Full, Institutional, or Supporting. Learn
about the benefits
of joining.
The officers of the ULI will establish the meeting
schedule. Meetings are to be conducted by conference
call to enable broad participation by members of the
industry.
Data Files
ULI Data Files (restricted access)
Officers
The Technical Committee Officers were:
- Chair: Steven R Loomis
- Vice Chair (Interim): Yoshito Umaoka (IBM)