
CLDR Ticket #10845 (accepted data)

Opened 7 months ago

Last modified 3 months ago

Missing characters for traditional to simplified Chinese transform

Reported by: pedberg
Owned by: pedberg
Component: translit
Phase: rc

Description (last modified by pedberg)

From http://bugs.icu-project.org/trac/ticket/13527

We are using the ICUTransformFilter to normalize traditional Chinese text to simplified Chinese. We received feedback that some traditional characters are not converted to their simplified variants. For example:

  • "眞" (771E) should be converted to "真" (771F)
  • "硏" (784F) should be converted to "研" (7814)
  • "夲" (5932) should be converted to "本" (672C)
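The code points quoted in the list above can be checked against the characters themselves. The following is a minimal Python sketch, not part of ICU or CLDR; `REPORTED` and `code_point` are hypothetical names introduced only for this illustration.

```python
# Reported pairs from the ticket: (traditional, its hex, simplified, its hex).
# REPORTED and code_point are illustrative names, not ICU or CLDR identifiers.
REPORTED = [
    ("眞", "771E", "真", "771F"),
    ("硏", "784F", "研", "7814"),
    ("夲", "5932", "本", "672C"),
]


def code_point(ch: str) -> str:
    """Return the code point of a single character as uppercase hex."""
    return format(ord(ch), "04X")


for src, src_hex, dst, dst_hex in REPORTED:
    # Fail loudly if a character does not match the code point quoted for it.
    assert code_point(src) == src_hex, (src, code_point(src))
    assert code_point(dst) == dst_hex, (dst, code_point(dst))
    print(f"U+{src_hex} {src} -> U+{dst_hex} {dst}")
```

A check like this guards against visually similar characters being pasted in place of the intended ones.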


Change History

comment:1 Changed 7 months ago by mark

  • Owner changed from anybody to pedberg
  • Status changed from new to accepted

comment:2 Changed 3 months ago by pedberg

  • Phase changed from dsub to rc
  • Description modified

Reformatted the description and added code points. Will get some confirmation of this. The traditional characters are all in the T3 source, so maybe they are not that common?

comment:3 Changed 3 months ago by pedberg

From our internal reviewers, the first two are valid:

  • "眞" (771E) should be converted to "真" (771F)
  • "硏" (784F) should be converted to "研" (7814)

The third is not, in general:

  • "夲" (5932) should be converted to "本" (672C) [no]

"夲" (5932) and "本" (672C) normally have different meanings and pronunciations. Some dictionaries might show a relationship between "夲" (5932) and "本" (672C) but only for one specific pronunciation of "夲" (5932), namely "běn".
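Until the transform data itself is updated, the reviewed outcome could be approximated as a post-processing step on already-transformed text. The sketch below is a workaround under stated assumptions, not the CLDR fix: `CONFIRMED` and `apply_confirmed` are hypothetical names, and the table holds only the two mappings the reviewers accepted, so U+5932 passes through unchanged.

```python
# Hypothetical supplementary table: only the two mappings the reviewers confirmed.
# U+5932 is deliberately left out because it is not a general variant of U+672C.
CONFIRMED = str.maketrans({
    "\u771E": "\u771F",  # 眞 -> 真
    "\u784F": "\u7814",  # 硏 -> 研
})


def apply_confirmed(text: str) -> str:
    """Replace the confirmed traditional forms; leave every other character as is."""
    return text.translate(CONFIRMED)


# U+771E and U+784F are replaced; U+5932 is left untouched.
print(apply_confirmed("\u771E\u784F\u5932"))
```

Keeping the table separate from the main transform makes it easy to drop once the upstream data includes these characters.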

