Daily-Log, 2010-11-08

by Administrator. Average Reading Time: about a minute.

Over the weekend we expanded the memory on our main server. It now has 24GB and thus room for a few more virtual machines. Take-aways: When you buy a server, buy more RAM than you think you need – you will need it. Also, while replacing memory is fairly fast, fscking half a dozen file systems takes a long time…

Today I spent my day writing an interface to an “old” (20 or 30 years old) banking system. It takes a bunch of 700-character strings made up of fixed-length fields, encoded as either ISO-8859-1 or, for international transactions, US-ASCII. We do international transactions, so all umlauts (ÜÄÖ) must be converted either to UE, AE and OE or to plain U, A and O… My first instinct was to use the Ruby Iconv library, Iconv.iconv("US-ASCII", "UTF-8", string), but unfortunately this causes problems…

ruby-1.9.2-p0 > s = "ümlaut"
 => "ümlaut" 
ruby-1.9.2-p0 > s.encoding
 => #<Encoding:UTF-8> 
ruby-1.9.2-p0 > s.force_encoding("ASCII")
 => "\xC3\xBCmlaut" 
ruby-1.9.2-p0 > s.encoding
 => #<Encoding:US-ASCII> 
ruby-1.9.2-p0 > s.force_encoding("UTF-8")
 => "ümlaut" 
ruby-1.9.2-p0 > s.encode("ASCII")
Encoding::UndefinedConversionError: U+00FC from UTF-8 to US-ASCII
	from (irb):6:in `encode'
	from (irb):6
	from /Users/jcf/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'
ruby-1.9.2-p0 > require 'iconv'
 => true 
ruby-1.9.2-p0 > Iconv.iconv("US-ASCII", "UTF-8", s)
Iconv::IllegalSequence: "ümlaut"
	from (irb):9:in `iconv'
	from (irb):9
	from /Users/jcf/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'
ruby-1.9.2-p0 > Iconv.iconv("US-ASCII//IGNORE", "UTF-8", s)
 => ["mlaut"] 
ruby-1.9.2-p0 > Iconv.iconv("US-ASCII//IGNORE//TRANSLITERATE", "UTF-8", s)
Iconv::IllegalSequence: "ümlaut"
	from (irb):11:in `iconv'
	from (irb):11
	from /Users/jcf/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'
ruby-1.9.2-p0 > Iconv.iconv("US-ASCII//IGNORE//TRANSLIT", "UTF-8", s)
 => ["\"umlaut"]

None of these options are any good, so I fear I’m forced to go the old gsub(/ü/, "u") route. Any other ideas on how to do that?
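For what it's worth, the gsub route does not have to mean one call per character. Here is a minimal sketch, assuming a hand-maintained mapping table: the names UMLAUT_MAP, UMLAUT_PATTERN and to_bank_ascii are made up for illustration, the "ß" entry is my own addition, and replacing everything else that is not ASCII with "?" is only one possible policy.

```ruby
# encoding: utf-8

# Hypothetical mapping table; extend it with whatever the bank format must accept.
# The "ß" entry is an assumption, it is not mentioned in the post.
UMLAUT_MAP = {
  "ä" => "ae", "ö" => "oe", "ü" => "ue",
  "Ä" => "AE", "Ö" => "OE", "Ü" => "UE",
  "ß" => "ss"
}.freeze

# One pattern that matches any key of the table.
UMLAUT_PATTERN = Regexp.union(UMLAUT_MAP.keys)

# Replace the mapped characters, then transcode to US-ASCII.
# Anything still outside ASCII becomes "?" instead of raising.
def to_bank_ascii(string)
  replaced = string.gsub(UMLAUT_PATTERN, UMLAUT_MAP)
  replaced.encode("US-ASCII", :undef => :replace, :invalid => :replace, :replace => "?")
end

# to_bank_ascii("ümlaut")  # => "uemlaut"
# to_bank_ascii("ÜÄÖ")     # => "UEAEOE"
```

Note that "ü" to "ue" makes the string one character longer, which matters for fixed-length fields, so any padding or truncation has to happen after the conversion.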

4 comments on ‘Daily-Log, 2010-11-08’

  1. Did you try babosa? It’s the lib powering friendly_id, and is very powerful and configurable:


  2. Administrator says:

    Thanks. It does more than we need but it pointed me to transliterate in Rails 3 which does what we need.

  3. Till says:

    I normally use the following code for that (with Rails + Ruby 1.8.7):

    "ÄÜ".mb_chars.normalize(:d).split(//u).reject { |e| e.to_s.length > 1 }.join
    # => "AU"

    See http://unicode.org/reports/tr15/#Norm_Forms for details.

    Best wishes,

  4. Edek says:

    Till: that won’t work for the Polish letter “Ł”, because it is not composed of two Unicode code points (a base letter plus a combining mark) but is a single character of its own, so normalization leaves it untouched.

    Similar cases could exist in other languages, so don’t rely on Unicode normalization alone when transliterating strings.
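The two comments above can be checked with a few lines of plain Ruby. This is a sketch, not Till's code: it assumes Ruby 2.2 or newer, where String#unicode_normalize is part of the standard library, instead of the ActiveSupport mb_chars call on 1.8.7. The method names and the NO_DECOMPOSITION table are made up for illustration.

```ruby
# encoding: utf-8

# Hypothetical fallback table for letters that have no Unicode decomposition.
NO_DECOMPOSITION = { "Ł" => "L", "ł" => "l" }.freeze

# Decompose to NFD (base letter + combining mark), then drop the combining marks.
def strip_combining_marks(string)
  string.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Same as above, plus the hand-written table for the leftovers.
def transliterate_with_fallback(string)
  strip_combining_marks(string).gsub(Regexp.union(NO_DECOMPOSITION.keys), NO_DECOMPOSITION)
end

# strip_combining_marks("ÄÜ")          # => "AU"
# strip_combining_marks("Łódź")        # => "Łodz"  (the "Ł" survives, as Edek says)
# transliterate_with_fallback("Łódź")  # => "Lodz"
```

So normalization handles the accents that decompose, and a small table is still needed for the letters that do not.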
