Update on Nov. 5, 2005: I have a follow-up post on the same topic. It presents the performance numbers after incorporating suggestions that came in as comments.
Bewitched by all the euphoria and endorsements in the blogosphere (on Tim's Radar, News.com, Infrastructure for Web 2.0 apps, ...), I, like most programming enthusiasts, decided to go with Ruby on Rails for my current pet project. In what follows, I have tried to document some of my initial impressions -- especially the ones related to the runtime performance of a few early programs.
Being new to both Ruby and Ruby on Rails, I made learning Ruby and Ruby on Rails a priority. I got started with the tutorials and books available on the Web, but also ordered Agile Web Development with Rails and Programming Ruby from Amazon.com for good measure. This turned out to be a good decision, as I was reaching the limits of the online material when the books arrived.
After getting comfortable with Ruby as a programming language, I decided to first write a part of my project -- a program to read web server log files, parse log entries and load them into database tables -- in Ruby using Active Record, one of the core innovations of Ruby on Rails. Later on, to keep the program simple, I reduced it to just printing the top 20 hosts, URLs, referrers and User Agent strings from a Combined Log Format logfile, sorted by frequency of occurrence.
This program consists of two Ruby source files: the main script webstat.rb takes the log filename as an argument, parses each line using the class LogEntry (available in file logentry.rb), and stores hosts, URLs, referrers and user agent strings as keys in separate hash tables, the value being the number of times a particular entity occurs. Once the logfile is fully scanned and the hash tables are populated, the entries of each hash table are sorted by value and the first 20 are displayed.
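The approach described above can be sketched roughly as follows. This is not the original webstat.rb or logentry.rb, which are not reproduced here: the regular expression, the Struct-based LogEntry, and the helper names count_fields and top are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the LogEntry class; not the original logentry.rb.
# Combined Log Format:
#   host ident user [time] "method url protocol" status bytes "referrer" "user agent"
COMBINED_LOG = /\A(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \S+ \S+ "([^"]*)" "([^"]*)"/

LogEntry = Struct.new(:host, :url, :referrer, :user_agent) do
  # Returns a LogEntry, or nil when the line does not match the format.
  def self.parse(line)
    m = COMBINED_LOG.match(line)
    m && new(m[1], m[2], m[3], m[4])
  end
end

# Counts occurrences of each field value across all lines.
# One hash table per field; missing keys start at 0.
def count_fields(lines)
  counts = {
    host: Hash.new(0), url: Hash.new(0),
    referrer: Hash.new(0), user_agent: Hash.new(0)
  }
  lines.each do |line|
    entry = LogEntry.parse(line)
    next if entry.nil? # skip lines that are not in Combined Log Format
    counts.each_key { |field| counts[field][entry[field]] += 1 }
  end
  counts
end

# Returns the n most frequent [value, count] pairs, highest count first.
def top(counter, n = 20)
  counter.sort_by { |_, count| -count }.first(n)
end
```

Since count_fields accepts any enumerable of lines, a logfile could be fed in lazily with something like count_fields(File.foreach(path)), and top(counts[:host]) would then give the 20 busiest hosts.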
I ran this program on a combined logfile for all accesses to www.pankaj-k.net for a specific period. Just to stress the Ruby Virtual Machine, I ensured that the file was more than 100 MB in size and had more than half a million log entries. Keep in mind that I actually plan to use my final program with 10 million or more log entries. (I hope MySQL can handle that!)
On my Pentium 4, 2.93 GHz CPU, 512 MB RAM, Windows XP box, it took 25m 47s to scan and parse the file and 1.6s to sort and display the results. You can also see the complete output. If you browse through the output, you will see that each successive batch of 4096 entries consumes more and more CPU, but only up to a limit, after which the CPU consumption drops (reflected by a decrease in processing time). This may be due to the behavior of the Ruby garbage collector, but I don't know enough about Ruby to make a good guess.
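One way to probe the garbage collector guess would be to record, for each batch of 4096 entries, both the elapsed time and the number of GC runs. The sketch below is an illustration, not the instrumentation used for the numbers above: time_chunks is a hypothetical helper, and it assumes a much newer Ruby than the one available in 2005, since it relies on GC.count and Process.clock_gettime.

```ruby
# Runs the given block on every item, in chunks of `chunk_size`, and records
# the wall-clock time and the number of GC runs observed for each chunk.
# Assumes a modern Ruby (GC.count and Process.clock_gettime are available).
def time_chunks(items, chunk_size = 4096)
  stats = []
  items.each_slice(chunk_size) do |chunk|
    gc_before = GC.count
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    chunk.each { |item| yield item }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    stats << { size: chunk.size, seconds: elapsed, gc_runs: GC.count - gc_before }
  end
  stats
end
```

If the slow batches line up with batches that show many GC runs, that would support the guess; if not, the cause probably lies elsewhere.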
Once this was done, I wondered how these performance numbers would compare with those of a program written in Java. On a whim, I translated it literally to Java -- logentry.rb translated to LogEntry.java and webstat.rb to WebStat.java -- and ran the Java version against the same input file. The Java version took 2m 3s to scan and parse the file and 0.27s to sort and display the results. Again, you can see the complete output. Notice that Java handled each chunk of 4096 entries in almost constant time.
So the Java version ran more than 12 times faster!! This is significant. If the same ratio holds true for a Ruby on Rails web application and a Java web application, then one would need to buy more than 10 times as much hardware to serve the same amount of load (or users). This may negate all the gains made due to faster development time with Ruby on Rails.
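The ratio can be checked directly from the timings reported above. The snippet below is just that arithmetic; the helper to_seconds is a name introduced here for illustration.

```ruby
# Converts a "minutes and seconds" timing into seconds.
def to_seconds(minutes, seconds)
  minutes * 60 + seconds
end

ruby_total = to_seconds(25, 47) + 1.6   # scan/parse plus sort/display
java_total = to_seconds(2, 3) + 0.27    # scan/parse plus sort/display
speedup = ruby_total / java_total

puts format("Ruby: %.1fs, Java: %.1fs, ratio: %.1fx", ruby_total, java_total, speedup)
```

The total comes to roughly 1549 seconds for Ruby against roughly 123 seconds for Java, a ratio of about 12.6.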
Of course, it would be hasty to jump to such a conclusion. In fact, I came across this blog entry that claims better performance with Ruby on Rails. Perhaps I should complete my project with Ruby on Rails, do a Java translation with Trails, a Java approximation of Ruby on Rails, and then report the performance numbers.
However, mine is not the only report of poor runtime performance by a Ruby program. Worse numbers have been reported. The comparative performance of Ruby and Java at The Computer Language Shootout Benchmarks tells a similar story. With such stellar performance at the JVM level, Java app frameworks will have to do something really lousy to perform worse than Ruby on Rails.
What about other metrics -- lines of code and memory use? The Ruby version is around 90 lines whereas the Java version is 186. The Ruby program used around 20 MB of RAM (as reported by Task Manager) whereas the Java version used more than 60 MB.