From: John Burger (john@mitre.org)
Date: Fri Mar 13 2009 - 14:00:39 CST
Clark S. Cox III wrote:
>> In UTF8, the latter two are exactly the same size, 17 bytes, so
>> tinyarro doesn't save you any space in Twitter, say.
>
> Yes it does. Twitter doesn't count bytes, it counts characters. The
> '➡' counts as a single character towards the 140 character limit:
>
> <http://twitter.com/clarkcox/status/1323106411>
Cool! This may have changed recently; see this pronouncement from the
development team in January:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/44be91d5ec5850fa
You tweeted using the web interface, which apparently worked, but it's
certainly the case that there are other Twitter clients that don't
know about UTF-8 in particular, and (perhaps unnecessarily) truncate
after 140 =bytes=. Some SMS services (whence Twitter got its
140-whatever limit) transfer non-ASCII text in UTF-16, so the limit
there is 70 Unicode characters (140 bytes at two bytes per character).
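To make the bytes-versus-characters distinction concrete, here is a minimal Python sketch. The helper names sizes and truncate_utf8 are my own, purely for illustration; this is not how Twitter or any particular client actually counts or truncates, just one way to see the three measures side by side and to cut at a byte limit without splitting a character.

```python
def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text."""
    code_points = len(text)                           # what a character counter sees
    utf8_bytes = len(text.encode("utf-8"))            # what a byte counter sees
    utf16_units = len(text.encode("utf-16-be")) // 2  # two bytes per UTF-16 code unit
    return code_points, utf8_bytes, utf16_units


def truncate_utf8(text, limit=140):
    """Cut text to at most `limit` UTF-8 bytes without splitting a character.

    Hypothetical helper: a byte-counting client could do this instead of
    chopping in the middle of a multi-byte sequence.
    """
    raw = text.encode("utf-8")[:limit]
    # errors="ignore" drops an incomplete trailing sequence left by the slice.
    return raw.decode("utf-8", errors="ignore")


arrow = "\u27a1"               # U+27A1, the arrow from the tweet
print(sizes(arrow))            # (1, 3, 1)
print(len(truncate_utf8(arrow * 50)))  # 46 arrows fit in 140 bytes
print(140 // 2)                # 70 UTF-16 code units in a 140-byte payload
```

So a tweet made entirely of such arrows would hold 140 of them under a character count, but only 46 under a 140-byte UTF-8 limit.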
On a related note, there are also apparently some bugs in the way the
Twitter backend stores text, such that sometimes tweets get truncated
after the fact, as the data migrates deeper into their backing store:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/9d9d16d55e2e1e67
You may want to check that tweet in a few days' time to see if the
arrow is still there.
Isn't i18n fun?
- John D. Burger
MITRE