Daily-Log, 2010-11-08
by Administrator.
Over the weekend we expanded the memory on our main server. It now has 24GB and thus room for a few more virtual machines. Takeaways: when you buy a server, buy more RAM than you think you need – you will need it. Also, while replacing memory is fairly fast, fscking half a dozen file systems takes a long time…
Today I spent my day writing an interface to an “old” (20 or 30 years old) banking system. It takes a bunch of 700-character strings made up of fixed-length fields, and the encoding used is either ISO-8859-1 or US-ASCII (for international transactions). We do international transactions, so all umlauts (ÜÄÖ) must be converted either to UE, AE and OE or just to plain U, A and O… My first instinct was to use the Ruby Iconv library, Iconv.iconv("US-ASCII", "UTF-8", string), but unfortunately this causes problems…
ruby-1.9.2-p0 > s = "ümlaut"
=> "ümlaut"
ruby-1.9.2-p0 > s.encoding
=> #<Encoding:UTF-8>
None of these options are any good, so I fear I’m forced to go the old gsub(/ü/, "u") route. Any other ideas on how to do that?
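For what it's worth, here is a minimal sketch of that gsub route, assuming Ruby 1.9 or later and UTF-8 source and input. The mapping table and the name to_bank_ascii are purely illustrative and not part of the banking interface. Anything not covered by the table is replaced with "?" rather than raising an error.

```ruby
# Illustrative mapping only; extend it with whatever characters the target
# format has to cope with.
UMLAUT_MAP = {
  "Ä" => "AE", "Ö" => "OE", "Ü" => "UE",
  "ä" => "ae", "ö" => "oe", "ü" => "ue",
  "ß" => "ss"
}.freeze

# One pattern that matches any key of the mapping table.
UMLAUT_PATTERN = Regexp.union(UMLAUT_MAP.keys)

# Replace the mapped characters first, then convert to US-ASCII.
# Characters that are still not representable become "?".
def to_bank_ascii(string)
  string.gsub(UMLAUT_PATTERN, UMLAUT_MAP)
        .encode("US-ASCII", undef: :replace, invalid: :replace, replace: "?")
end

puts to_bank_ascii("ümlaut")  # prints "uemlaut"
```

Passing a Hash as the second argument to gsub keeps the whole table in one place, so switching from "UE" to plain "U" only means editing the values.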

Did you try babosa? It’s the lib powering friendly_id, and is very powerful and configurable:
https://github.com/norman/babosa/
http://rubydoc.info/gems/babosa/0.2.0/frames
Thanks. It does more than we need but it pointed me to transliterate in Rails 3 which does what we need.
I normally use the following code for that (with Rails + Ruby 1.8.7):
"ÄÜ".mb_chars.normalize(:d).split(//u).reject { |e| e.to_s.length > 1 }.join
# => "AU"
See http://unicode.org/reports/tr15/#Norm_Forms for details.
Best wishes,
Till
Till: it won’t work for the Polish letter “Ł”, as it is not composed of two Unicode code points but is a separate character (it is non-standard and exists only in Polish).
Similar cases could exist in other languages, so don’t rely on Unicode decomposition alone when transliterating strings.
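To illustrate the point, here is a rough sketch that combines both approaches, assuming Ruby 2.2 or later (for String#unicode_normalize). The NON_DECOMPOSABLE table and the strip_diacritics name are made up for this example, and the table is certainly incomplete.

```ruby
# Characters without a canonical decomposition need an explicit mapping.
# Illustrative and incomplete.
NON_DECOMPOSABLE = {
  "Ł" => "L", "ł" => "l",
  "Ø" => "O", "ø" => "o",
  "ß" => "ss"
}.freeze

NON_DECOMPOSABLE_PATTERN = Regexp.union(NON_DECOMPOSABLE.keys)

def strip_diacritics(string)
  string.unicode_normalize(:nfd)  # "Ä" becomes "A" followed by U+0308
        .gsub(/\p{Mn}/, "")       # drop the combining marks
        .gsub(NON_DECOMPOSABLE_PATTERN, NON_DECOMPOSABLE)
end

puts strip_diacritics("ÄÜ")    # prints "AU"
puts strip_diacritics("Łódź")  # prints "Lodz"

# Decomposition alone leaves "Ł" untouched:
puts "Ł".unicode_normalize(:nfd) == "Ł"  # prints "true"
```

Decomposition handles the accented letters, while the explicit table catches the letters that have no base-plus-accent form.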