I'm working on extending the case conversion methods for the programming
language Ruby from the current ASCII only to cover all of Unicode.
Ruby comes with four methods for case conversion. Three of them, upcase,
downcase, and capitalize, are quite clear. But we have hit a question
for the fourth method, swapcase.
What swapcase does is swap upper and lower case, so that e.g.
'Unicode Standard'.swapcase => 'uNICODE sTANDARD'
I'm not sure myself where this method is actually used, but it also
exists in Python (and maybe Ruby got it from there).
Now the question I have is: What to do for titlecase characters? Several
possibilities have already been floated:
a) Leave as is, because they are neither upper nor lower case.
b) Convert to upper (or lower), which may simplify implementation.
c) Decompose the character into upper and lower case components, and
apply swapcase to these.
For example, 'Džinsi' (jeans) would become 'DžINSI' with a), 'DŽINSI' (or
'džinsi') with b), and 'dŽINSI' with c). For another example, 'ᾨδή' would
become 'ᾨΔΉ' with a), 'ὨΙΔΉ' (or 'ᾠΔΉ') with b), and 'ὠΙΔΉ' with c).
It looks like Python 3 (3.4.3 in my case) is doing a). My guess is that
from a user-expectation point of view, c) is best, so I'm tending to go
for c). There is no existing data from the Unicode Standard for this,
but it seems pretty straightforward.
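To make the three options concrete, here is a minimal sketch, not an actual implementation. The method name swapcase_sketch, the titlecase: keyword, and the TITLECASE_PARTS table are all hypothetical; the table is hand-written and covers only the four Latin titlecase digraphs, and the sketch assumes a Ruby whose upcase and swapcase already handle non-ASCII letters.

```ruby
# Hypothetical, hand-written decomposition table (not Unicode Standard data):
# each titlecase digraph is mapped to its upper case and lower case components.
TITLECASE_PARTS = {
  "\u01C5" => ["D", "\u017E"], # D with small z with caron -> D + z-caron
  "\u01C8" => ["L", "j"],      # L with small j
  "\u01CB" => ["N", "j"],      # N with small j
  "\u01F2" => ["D", "z"]       # D with small z
}.freeze

# Swap case character by character; the titlecase: option selects how
# titlecase characters are handled:
#   :keep      -> option a), leave the titlecase character as is
#   :upcase    -> option b), convert it to upper case
#   :decompose -> option c), split it into components and swap each one
def swapcase_sketch(str, titlecase: :decompose)
  str.each_char.map do |ch|
    parts = TITLECASE_PARTS[ch]
    if parts.nil?
      ch.swapcase # ordinary character: assumes Unicode-aware swapcase
    else
      case titlecase
      when :keep      then ch
      when :upcase    then ch.upcase
      when :decompose then parts.map(&:swapcase).join
      else raise ArgumentError, "unknown option: #{titlecase.inspect}"
      end
    end
  end.join
end

word = "\u01C5insi"
puts swapcase_sketch(word, titlecase: :keep)
puts swapcase_sketch(word, titlecase: :upcase)
puts swapcase_sketch(word, titlecase: :decompose)
```

Note that with c) the result is one character longer than the input, because a single titlecase code point is replaced by two separate letters.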
But before I just implement something, I'd appreciate additional input,
in particular from users closer to the affected language communities.
Regards, Martin.
Received on Fri Mar 18 2016 - 02:44:30 CDT