
Changeset 14947

03/15/19 19:31:18

cldrbug 11922: clarify language matching

1 edited


  • trunk/specs/ldml/tr35.html

    r14943 r14947  
    108108      <tr> 
    109109        <td>Date</td> 
    110         <td class="changed">2019-03-14</td> 
     110        <td class="changed">2019-03-15</td> 
    111111      </tr> 
    112112      <tr> 
    394394          <li>4.3 <a href="#Likely_Subtags">Likely Subtags</a></li> 
    395395          <li>4.4 <a href="#LanguageMatching">Language Matching</a> 
    396             <ul> 
     396            <ul class='toc'> 
    397397              <li>4.4.1 <a href= 
    398398              "#EnhancedLanguageMatching">Enhanced Language 
    54985498    algorithm, the languageMatching data is interpreted as an 
    54995499    ordered list.</p> 
     5500    <p class='changed'>Distances between a given pair of subtags can be larger or smaller than the typical distances. For example, the distance between en and en-GB can be greater than that between en-GB and en-IE. In some cases, language and/or script differences can be as small as the typical region difference (for example, sr-Latn vs. sr-Cyrl).</p> 
     5501    <p class='changed'>The distances resulting from the table are not linear, but are rather chosen to produce expected results. So a distance of 10 is not necessarily twice as &quot;bad&quot; as a distance of 5.</p> 
    55005502    <p>The language matching algorithm takes a list of a user’s 
    55015503    desired languages, and a list of the application’s supported 
    55045506      <li>Set the best weighted distance BWD to ∞</li> 
    55055507      <li>Set the best desired language BD to null</li> 
     5508      <li class='changed'>Set the best supported language BS to null</li> 
    55065509      <li>For each desired language D 
    55075510        <ul> 
    5508           <li>Compute a discount factor F, based on the position in 
     5511                        <li>Compute a <span class='changed'>demotion value</span> F, based on the position in 
    55095512          the list. 
    55105513            <ul> 
    5511               <li>This discount factor is up to the implementation, 
     5514              <li>This <span class="changed">demotion value</span> is up to the implementation, 
    55125515              but is typically a positive value that increases 
    55135516              according to how far D is from the start of the 
    55245527                  <li>BWD = WD</li> 
    55255528                  <li>BD = D</li> 
     5529                  <li class='changed'>BS = S</li> 
    55265530                </ul> 
    55275531              </li> 
    55305534        </ul> 
    55315535      </li> 
    5532       <li>If the BWD is less than a threshold, return BD. 
     5536      <li>If the BWD is less than a threshold, return <span class="changed">&lt;</span>BD<span class="changed">, BS&gt;</span> 
    55335537        <ul> 
    5534           <li>The threshold is implementation-defined, typically 
     5538        <li>The threshold is implementation-defined, typically 
    55355539          set to greater than a default region difference, and less 
    5536           than a default script difference.</li> 
     5540        than a default script difference.</li> 
    55375541        </ul> 
    55385542      </li> 
    5539       <li>Otherwise return a default supported language (like 
    5540       English).</li> 
     5543      <li>Otherwise <span class="changed">set BD to the</span> default supported language (like 
     5544      English); <span class="changed">return &lt;</span>BD<span class="changed">, null&gt;</span></li> 
    55415545    </ul> 
    55425546    <p>To find the matching distance MD between any two languages, 
    55505554      </li> 
    55515555      <li>Set the match-distance MD to 0</li> 
    5552       <li>For each subtag in the list, starting from the end: 
    5553       region, script, base-language 
    5554         <ol> 
    5555           <li>If respective subtags in each language tag are 
     5556      <li>For each subtag <span class="changed">in {language, script, region}</span> 
     5558        <li>If respective subtags in each language tag are 
    55565559          identical, remove the subtag from each (logically) and 
    5557           continue.</li> 
    5558           <li>Traverse the languageMatching data until a match is 
     5560        continue.</li> 
     5561        <li>Traverse the languageMatching data until a match is 
    55595562          found. 
    5560             <ul> 
    5561               <li>* matches any field.</li> 
    5562               <li>If the oneway flag is false, then the match is 
    5563               symmetric.</li> 
    5564             </ul> 
    5565           </li> 
    5566           <li>Add 100 minus the <strong>percent</strong> attribute 
    5567           value to MD.</li> 
    5568           <li>Remove the subtag from each (logically)</li> 
     5563          <ul> 
     5564            <li>* matches any field.</li> 
     5565            <li>If the oneway flag is false, then the match is 
     5566            symmetric<span class="changed">; otherwise only match one direction.</span></li> 
     5567            <li><span class="changed">For region matching, use the mechanisms in <strong>Section 4.4.1 <a href= 
     5568              "#EnhancedLanguageMatching">Enhanced Language 
     5569            Matching</a></strong></span>.</li> 
     5570          </ul> 
     5571        </li> 
     5572                  <li>Add <span class='changed'>the <strong>distance</strong> attribute value</span>  to MD. 
     5573                    <ul class='changed'> 
     5574                      <li>This used to be a <strong>percent</strong> attribute value, which was 100 - the distance attribute value.</li> 
     5575                </ul> 
     5576                  </li> 
     5577            <li>Remove the subtag from each (logically)</li> 
    55695578        </ol> 
    55705579      </li> 
    91449153        </ul> 
    91459154      </li> 
     9155      <li><strong>Section 4.4 <a href="#LanguageMatching">Language Matching</a>  
     9156        </strong> 
     9157        <ul> 
     9158          <li>Clarify the process  [<a href= 
     9159          "http://unicode.org/cldr/trac/ticket/11922">#11922</a>]</li> 
     9160        </ul> 
     9161      </li> 
    91469162      <li><strong>Section <a href="#Segmentation_Tests">6.4 
    91479163        Segmentation Tests</a> </strong> 
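
Reviewer sketch (not part of the changeset): the revised steps in Section 4.4 can be illustrated roughly as below. This is a minimal sketch under several assumptions, not the CLDR implementation. Tags are assumed to be already maximized to (language, script, region); the distances in TOY_DISTANCE, the threshold of 40, and the demotion step of 5 are hypothetical values chosen for illustration rather than taken from the languageMatching data; and the steps elided from this diff are assumed to combine the matching distance MD and the demotion value F by addition. Wildcards, the oneway flag, and the enhanced region matching of Section 4.4.1 are not modeled.

```python
import math
from typing import Optional, Sequence, Tuple

# (language, script, region); assumed to be maximized already.
Tag = Tuple[str, str, str]

# Hypothetical per-field distances. Real values come from the
# languageMatching data and vary per pair of subtags.
TOY_DISTANCE = {"language": 80, "script": 50, "region": 4}


def match_distance(desired: Tag, supported: Tag) -> int:
    """Sum a toy distance for every field in which the two tags differ."""
    md = 0
    for field, d, s in zip(("language", "script", "region"), desired, supported):
        if d == s:
            continue  # identical subtags contribute no distance
        md += TOY_DISTANCE[field]
    return md


def best_match(
    desired: Sequence[Tag],
    supported: Sequence[Tag],
    default: Tag,
    threshold: float = 40,   # hypothetical: above region (4), below script (50)
    demotion_step: int = 5,  # hypothetical: F grows with list position
) -> Tuple[Tag, Optional[Tag]]:
    """Return <BD, BS>, or <default, None> if nothing is under the threshold."""
    bwd = math.inf  # best weighted distance
    bd = None       # best desired language
    bs = None       # best supported language
    for position, d in enumerate(desired):
        f = position * demotion_step  # demotion value F
        for s in supported:
            wd = match_distance(d, s) + f  # assumed additive weighting
            if wd < bwd:
                bwd, bd, bs = wd, d, s
    if bwd < threshold:
        return bd, bs
    return default, None


if __name__ == "__main__":
    desired = [("en", "Latn", "GB"), ("fr", "Latn", "FR")]
    supported = [("fr", "Latn", "FR"), ("en", "Latn", "US")]
    print(best_match(desired, supported, default=("en", "Latn", "US")))
```

With these toy values, en-Latn-GB pairs with en-Latn-US at distance 4, which beats the exact fr-Latn-FR match because that desired language sits second in the list and carries a demotion of 5. Returning the pair rather than BD alone mirrors the change in this revision: the caller learns both which desired language won and which supported language to serve.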