Revised: Ruby or Java -- A (Performance) Reality Check

In an earlier post I reported that the Ruby version of a web server log parsing program ran 12 times slower than the Java version. A couple of readers pointed out that this was probably due to an inefficient implementation of the method DateTime.strptime, which I used in the LogEntry class to convert the date field into a Ruby DateTime object.

To confirm this I ran the original program under the profiler ("ruby -r profile ..."; BTW, I wasn't aware of this facility when I wrote my last post. Keep in mind that I am a Ruby newbie!) and found that DateTime.strptime was indeed eating up a fairly significant percentage of CPU cycles, though this isn't very obvious from the profiler output (the linked output is only for 1000 log entries, as the code runs significantly slower while profiling).
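A quick way to check the cost of a single call without the profiler is to time it directly. The following is only a minimal sketch: the format string, the helper name seconds_for, and the iteration count are assumptions for illustration, not taken from the original program, and the regex is the same shape as the one used later in this post. On current Ruby versions the gap will likely look very different from what Ruby 1.8.3 showed.

```ruby
require 'date'

# Example date string in Combined Log Format.
CLF_DATE   = "10/Oct/2000:13:55:36 -0700"
# Assumed strptime format for CLF dates (not taken from the original program).
CLF_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
# Regex of the same shape as the hand-written parser in this post.
CLF_REGEX  = %r{\A(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\z}

# Hypothetical helper: wall-clock seconds needed to run the block n times.
def seconds_for(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Parse once to confirm the format string matches the input.
parsed = DateTime.strptime(CLF_DATE, CLF_FORMAT)

strptime_secs = seconds_for(10_000) { DateTime.strptime(CLF_DATE, CLF_FORMAT) }
regex_secs    = seconds_for(10_000) { CLF_REGEX.match(CLF_DATE) }

puts format("DateTime.strptime: %.3fs, regex match only: %.3fs",
            strptime_secs, regex_secs)
```

The regex timing covers matching only, not building a Time object, so it is a lower bound for a hand-written parser rather than a like-for-like comparison.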

It turns out this one method completely skewed the figures in my last post. Without it, the Ruby version of the program does quite well, in fact better than I expected. Read on for details.

Although I didn't use the date field in the webstat.rb program of my last post, I do need this field for the next version of the program, which will insert the fields of log entries into database tables using ActiveRecord. As I want the datatype of the log_entry_date column to be 'datetime', the corresponding field must be of Ruby type Time, as per the Ruby to SQL type mapping defined by ActiveRecord.

The built-in Time class has a number of functions to convert date strings into Time objects, but none of these work for the date strings used in Combined Log Format (CLF) log entries (example string: "10/Oct/2000:13:55:36 -0700"). So I ended up writing my own extension, patterned after the code of the library functions in file time.rb:


require 'time'
# extend built-in Time to make use of private functions and variables
class Time
  class << Time
    def log_zone_offset(zone)
      off = 0
      if /\A([+-])(\d\d)(\d\d)\z/ =~ zone
        off = ($1 == '-' ? -1 : 1) * ($2.to_i * 60 + $3.to_i) * 60
      end
      off
    end

    def logdate(date)
      if /\A(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\z/ =~ date
        day = $1.to_i
        mon = MonthValue[$2.upcase]
        year = $3.to_i
        hour = $4.to_i
        min = $5.to_i
        sec = $6.to_i
        usec = 0
        zone = $7
        year, mon, day, hour, min, sec =
          apply_offset(year, mon, day, hour, min, sec, log_zone_offset(zone))
        Time.utc(year, mon, day, hour, min, sec, usec)
      else
        raise ArgumentError.new("invalid date: #{date.inspect}")
      end
    end
  end
end

It did feel a bit strange to inject my own method into a built-in class, but this is the only way I could use the existing private methods such as apply_offset(). Alternatively, I could have done a "copy-paste" of the relevant portions of code, creating my own custom CLFTime class, or I could have created a date string in a format understood by the Time class. The latter approach would perhaps have resulted in less code but may have added to the overall processing time!
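For comparison, the standalone alternative might look roughly like the sketch below. It is not the code used for the timings in this post: only the name CLFTime comes from the paragraph above, while the method name parse and the constants MONTHS and PATTERN are made up for illustration. Instead of calling the private apply_offset(), it builds the Time in UTC and subtracts the zone offset in seconds.

```ruby
# Hypothetical standalone parser for Combined Log Format dates.
# It does not reopen the built-in Time class.
module CLFTime
  # Month abbreviations; index + 1 gives the month number.
  MONTHS = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec].freeze

  # Same fields as the regex in Time.logdate, with the zone split
  # into sign, hours and minutes.
  PATTERN = %r{\A(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\z}

  def self.parse(date)
    m = PATTERN.match(date)
    raise ArgumentError, "invalid date: #{date.inspect}" unless m

    mon_index = MONTHS.index(m[2].capitalize)
    raise ArgumentError, "invalid month: #{m[2].inspect}" unless mon_index

    # Zone offset in seconds, negative for zones west of UTC.
    offset = (m[7] == '-' ? -1 : 1) * (m[8].to_i * 60 + m[9].to_i) * 60

    # Treat the wall-clock fields as UTC, then shift by the offset
    # to get the real UTC instant.
    Time.utc(m[3].to_i, mon_index + 1, m[1].to_i,
             m[4].to_i, m[5].to_i, m[6].to_i) - offset
  end
end

puts CLFTime.parse("10/Oct/2000:13:55:36 -0700").inspect
```

This trades a few extra lines for not depending on private internals of time.rb, which could change between Ruby versions.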

I don't know how common this practice of injecting code into an existing class is among Ruby programs!

The result of this change was startling -- the execution time for parsing and populating the hash tables dropped from 25m 47s to under 4m, an improvement of more than 6 times. The corresponding figure for the Java version is 2m 3s. The Java version still runs faster, but the gap is now less than a factor of two, which is fairly small, especially if you consider the fact that the J2SE 5 JVM has undergone many years of optimization whereas Ruby's YARV is still under development. Another thing worth noting is that the Ruby version was much more efficient in memory utilization -- the maximum program size reported by Task Manager was 22MB for the Ruby version, whereas it was 62MB for the Java version.

Although illuminating, these results should not be treated as a full comparison of Ruby and Java as programming languages, or even of the runtime efficiency of their specific implementations on a particular platform. They should just be taken as what they are: execution times for specific programs achieving the same result but written in these languages and run under specific implementations (Sun's J2SE 1.5 JVM and Ruby 1.8.3) on a specific machine and OS (Windows XP).

At the same time, these numbers are more indicative of performance in real situations than most microbenchmarks, as they take advantage of the built-in (and hence optimized) functions available in both environments.

Thanks to the Ruby enthusiasts who pointed me in the right direction. Now I can happily develop my pet project in Ruby, without harboring any misgivings about performance issues.

Comments (2)

aaa:

the whole code, especially that regex sucks big time.

Extending built in classes is a pretty common practice in Ruby (inherited from Smalltalk). Of course, some care and common sense should be taken.

And ignore Mr. Aaa, that's a pretty optimal implementation. Regex is a native library in Ruby and probably the reason why the difference to Java is so small. Ruby is significantly slower than Java but the Regex implementation in Java is not native.

About

This page contains a single entry from the blog posted on November 4, 2005 11:16 PM.

The previous post in this blog was Ruby or Java -- A (Performance) Reality Check.
