Performance Archives

June 18, 2004

Java Faster Than C++? -- My favorites from Slashdot discussion

Fascinating discussion on Java and C++ performance (if you can just skip the noise!) at Slashdot. I especially liked the following entries:

  1. Measurements are for compilers and/or JVMs, not languages: Yes, different JVMs have different performance characteristics and different compilers perform different levels of optimization. Over time, though, they tend to converge.
  2. Java facilitates optimization at the macro level: Low-level languages such as assembly and C support runtime efficiency at the machine-instruction level, whereas VM-based languages like Java (and C#!) support optimization by doing certain tasks (such as GC) when the system is less loaded. The potential for dynamic optimization based on runtime analysis also favors VM-based languages.
  3. Given sufficient will (and time), it is possible to optimize C++ code to run faster than Java: Includes optimized code for hash table lookups and the measured times.
  4. Which language is faster -- depends on workload: and also on how your program handles that workload.
  5. Inlining in Java: The JVM indeed has more information to do better inlining and hence reduce method call overhead.
  6. Optimize your Java App: Don't rely on the JVM for all the optimizations. Application architecture and use of efficient libraries are important for Java as well.

June 27, 2004

Who (or What) is slowing down my Java App Startup?

A few weeks ago I got an email from my colleague Craig Bryant asking about a potential fix for an annoying pause during the startup of his Java program, which did some crypto during initialization. He also attached his code, pinpointing the problem to the first invocation of Cipher.getInstance("DES/CBC/PKCS5Padding"), which took more than 5 seconds to execute on an HP-UX box with JDK 1.4.1. Granted, a 5-second delay at startup may not sound like a big deal, but it is still unacceptable in most development and production systems.

I compiled and executed the program on my old and rusty W2K desktop (a Pentium 350MHz box), and confirmed the delay. I also found that the delay was cut by almost half just by switching from J2SDK 1.4.1 to J2SDK 1.4.2. This is not surprising given the impressive optimizations reported in J2SE 1.4.2. Running the program on a more modern machine (say, a 2.0GHz box) further reduced the delay to a more acceptable sub-second range. I attributed this to slow-but-necessary security initialization overhead and forgot about it.
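The pause can be made visible by timing the first and second calls separately. The following is only a minimal sketch, not Craig's original program (which is not shown here): the class and method names are made up for illustration, and it uses System.nanoTime, which is not available on the JDK 1.4 releases discussed in this post.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;

// Hypothetical helper for illustration; not the program from the post.
final class CipherStartupTimer {

    // Returns the wall-clock time, in milliseconds, of one Cipher.getInstance call.
    static double timeGetInstanceMillis(String transformation) {
        long start = System.nanoTime();
        try {
            Cipher.getInstance(transformation);
        } catch (GeneralSecurityException e) {
            // Unknown algorithm or padding: report it as an unchecked error.
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        String transformation = "DES/CBC/PKCS5Padding";
        // The first call pays for provider loading and one-time initialization.
        double first = timeGetInstanceMillis(transformation);
        // The second call reuses the already initialized provider.
        double second = timeGetInstanceMillis(transformation);
        System.out.printf("first call:  %.1f ms%n", first);
        System.out.printf("second call: %.1f ms%n", second);
    }
}
```

If the one-time initialization is the culprit, the first figure should dwarf the second; the absolute numbers will depend heavily on the JVM version and the machine.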

Craig later reported that he profiled the code and found that most of the delay was due to signature verification. This made sense. Invoking Cipher.getInstance("DES/CBC/PKCS5Padding") requires execution of classes in a JCE provider, and the JCE engine requires the provider to be signed with a certificate issued by JavaSoft. It appears silly, at least for a standalone Java program running on a trusted machine, to verify the JCE provider jar every time the program starts.

Later on, after reading the recommendation to run the server JVM for better performance (which made sense anyway, as the original program was to run as a server), I tried the same program with java -server, only to find that the startup delay was worse than without the -server option, by a factor of about 3.5 (a 0.9-second delay became a 3.2-second delay)! Again, an explanation is not hard to come by: the -server option causes the JVM to compile all the code used in verifying the signed jar to native code, even if the verification takes place only once. Because of this, the performance gain from running native machine instructions does not offset the time spent translating bytecode to native code.

Although the previous story is in the context of crypto operations, the general observation that signed jar files introduce human-perceptible startup delays applies to many more situations. This delay is proportional to the size of the signed jar, not to the size of the classes that actually get used from it. For example, the following program:

import lib.HelloWorldLib;
public class HelloWorld {
    public static void main(String[] args) throws Exception {
        // Call into the library class that is packaged in the jar file.
        HelloWorldLib.helloWorld();
    }
}
// File: lib\
package lib;
public class HelloWorldLib {
    public static void helloWorld() throws Exception {
        System.out.println("Hello, World!");
    }
}

takes around 280 milliseconds to execute when class lib.HelloWorldLib is packaged within a jar file. However, the execution time jumps to 700 milliseconds when this jar file is signed. Adding this class to a large existing jar file such as rt.jar (which is a 25MB monster!) and signing the resulting jar pushes the execution time to more than 7 seconds. Note: All measurements reported in this paragraph were made on an AMD Athlon 900MHz box running W2K, J2SE 1.4.2 and the server JVM. The measurements of the first few executions were discarded to allow for OS and filesystem cache warmup.
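One rough way to see how verification cost scales with jar size is to read every entry of a jar with and without verification and compare the times. This is only a sketch under assumptions: the class and method names are hypothetical, and reading entries through java.util.jar.JarFile is an approximation of, not a substitute for, what the class loader and the JCE framework do at startup.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of the original measurements.
final class JarVerifyTimer {

    // Reads every non-directory entry of the jar and returns how many were read.
    // With verify == true, entries of a signed jar are checked as they are read.
    static int readAllEntries(File jar, boolean verify) {
        byte[] buffer = new byte[8192];
        int count = 0;
        try (JarFile jarFile = new JarFile(jar, verify)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = jarFile.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // Discard the bytes: only the cost of reading matters here.
                    }
                }
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        File jar = new File(args[0]);
        for (boolean verify : new boolean[] {false, true}) {
            long start = System.nanoTime();
            int entries = readAllEntries(jar, verify);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("verify=" + verify + ": " + entries + " entries in " + millis + " ms");
        }
    }
}
```

Running it against the unsigned and signed versions of the same jar should show whether the extra time grows with the total size of the jar, as the figures above suggest.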

This is something to keep in mind if you plan to deliver signed jar files!

August 28, 2005

BeanShell, Rhino and Java -- A Performance Comparison

Obsessing about micro benchmarks may not be a very good software development strategy, but I find it very insightful to write small programs that tell me about the relative cost of doing the same thing in different ways. It is also a good way to motivate myself to learn about new technologies -- after all, what better way to learn than by drawing analogies and making comparisons?

So, now that I am evaluating the suitability of my old and trusted BeanShell for future experimental projects (more about it in a future post), it was only natural to do some benchmarking. And that is what I did. I wrote Bubble Sort programs in BeanShell, JavaScript (Rhino) and Java, ran them under different environments and documented the results.
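For readers who want to try something similar, here is a sketch of what a Java bubble sort benchmark over strings might look like. It is not the program from the article: the class name, the random string generator, the fixed seed and the early-exit flag are all my assumptions for illustration.

```java
import java.util.Random;

// Hypothetical benchmark sketch; not the bsort program from the article.
final class BubbleSortBench {

    // Classic bubble sort: keep swapping adjacent out-of-order elements.
    static void bubbleSort(String[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            boolean swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i].compareTo(a[i + 1]) > 0) {
                    String tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) {
                break; // already sorted, no need for further passes
            }
        }
    }

    // Generates n pseudo-random strings from a fixed seed, so runs are repeatable.
    static String[] randomStrings(int n, long seed) {
        Random random = new Random(seed);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = Long.toString(random.nextLong(), 36);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int num : new int[] {1000, 5000, 10000}) {
            String[] data = randomStrings(num, 42L);
            long start = System.nanoTime();
            bubbleSort(data);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("num: " + num + " sorted in " + millis + " ms");
        }
    }
}
```

Since bubble sort is quadratic, going from 1000 to 10000 strings should cost roughly 100 times more, which makes the relative overhead of each runtime easy to see.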

August 31, 2005

Adding Python and Jython to the Performance Shootout

Looks like performance comparisons are very popular among development folks. My last post about the relative performance of BeanShell, JavaScript (Rhino) and Java was listed among the picks of the week in JavaLobby's weekly newsletter and is attracting a fair amount of traffic.

One of the readers, Jim Adrig, sent me the Jython version of the test program. BTW, Jython was also suggested by one of the commenters.

So, I took the suggestion, installed Jython and have updated the article with the observed performance figures.

October 12, 2005

Adding Groovy to the Bubble Sort Performance Race

A few weeks ago I published an article comparing the runtime performance of bubble sort programs under BeanShell, JavaScript (Rhino) and Java. Later on, based on a program contributed by Jim Adrig, I added Jython and Python programs as well.

Although I had received a few requests to add Groovy to the mix, no one actually came forward with a working program. However, this changed yesterday when I got a Groovy bsort program from Graeme Sutherland. I ran it on my box under the same conditions as the other programs and am reporting the performance figures in the following updated table:

No. of strings                               num: 1000         num: 5000        num: 10000
bsh/bsort (BeanShell, interpreted)        688    15500     2515   402000     4780  1623000
js/bsort1 (interpreted)                   140      450      344    11050      600    46330
js/bsort1 (compiled)                      140      440      344    10940      580    46020
js/bsort2 (interpreted)                   440     1660     1360    41580     2516   172000
js/bsort2 (compiled)                      440     1625     1345    40390     2470   168000
java/bsort (java, compiled)                32       47       63     1093      109     4625
jython/bsort (jython-2.1)                 172      328      391     8860      688    44360
python/bsort (python-2.4.1)               204      370     1020     9200     2040    37100
python/bsort (jython-2.1)                 515      300     1500     8400     2800    43400
groovy-1.0-jsr-03/bsort (interpreted)     300     4800      800   118000     1345   469000
groovy-1.0-jsr-03/bsort (compiled)        300     4700      840   117000     1400   465000

As you can see, Groovy performs better than BeanShell but falls behind Jython and JavaScript (Rhino). Interestingly, the compiled version performed only marginally better than the interpreted one. The same was true for JavaScript as well.

As Graeme wasn't confident that his code is optimal for Groovy, I am not updating my article yet. Perhaps Groovy fans can take a look and let me know what they think about it.

November 1, 2005

Ruby or Java -- A (Performance) Reality Check

Update on Nov. 5, 2005: I have a followup post on the same topic. It presents the performance numbers after incorporating suggestions that came as comments.

Bewitched by all the euphoria and endorsements in the blogosphere (on Tim's Radar, Infrastructure for Web 2.0 apps, ...), I, like most programming enthusiasts, decided to go with Ruby on Rails for my current pet project. In what follows, I have tried to document some of my initial impressions -- especially the ones related to the runtime performance of a few early programs.

Being new to both Ruby and Ruby on Rails, I made learning Ruby and Ruby on Rails a priority. I got started with the tutorials and books available on the Web, but also ordered Agile Web Development with Rails and Programming Ruby for good measure. This turned out to be a good decision, as I was reaching the limits of the online material when the books arrived.

After feeling comfortable with Ruby as a programming language, I decided to first write a part of my project -- a program to read web server log files, parse log entries and load them into database tables -- in Ruby using Active Record, one of the core innovations of Ruby on Rails. Later on, to keep the program simple, I further simplified it to just print the top 20 hosts, urls, referrers and User Agent strings from a Combined Log Format logfile, sorted by frequency of occurrence.

This program consists of two Ruby source files: the main script webstat.rb takes the log filename as argument, parses each line using class LogEntry (available in file logentry.rb), and stores hosts, urls, referrers and user agent strings as keys in separate hash tables, the value being the number of times a particular entity occurs. Once the logfile is fully scanned and the hash tables are populated, the entries are sorted by value and the first 20 entries from each hash table are displayed.
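The count-then-sort approach described above can be sketched in Java as follows. This is not the code from either version of the program: the class name, the regular expression and the field handling are my assumptions, and the pattern handles only well-formed Combined Log Format lines (no escaped quotes inside fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the counting logic; not the original webstat program.
final class WebStatSketch {
    // host ident user [date] "request" status bytes "referrer" "user agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"([^\"]*)\" \\d{3} \\S+ \"([^\"]*)\" \"([^\"]*)\"$");

    final Map<String, Integer> hosts = new HashMap<>();
    final Map<String, Integer> urls = new HashMap<>();
    final Map<String, Integer> referrers = new HashMap<>();
    final Map<String, Integer> agents = new HashMap<>();

    // Parses one log line and bumps the four counters; returns false if it does not parse.
    boolean add(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.matches()) {
            return false;
        }
        // The request looks like "GET /path HTTP/1.1"; the URL is the middle token.
        String[] request = m.group(2).split(" ");
        String url = request.length > 1 ? request[1] : m.group(2);
        hosts.merge(m.group(1), 1, Integer::sum);
        urls.merge(url, 1, Integer::sum);
        referrers.merge(m.group(3), 1, Integer::sum);
        agents.merge(m.group(4), 1, Integer::sum);
        return true;
    }

    // Returns up to n entries with the highest counts, most frequent first.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) throws IOException {
        WebStatSketch stats = new WebStatSketch();
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                stats.add(line);
            }
        }
        for (Map.Entry<String, Integer> e : top(stats.hosts, 20)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The main method prints only the top hosts; the same top(...) call works for the other three tables. Note that no date parsing happens here, so the sketch says nothing about the cost of converting the timestamp field.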

I ran this program on a combined logfile for all accesses over a specific period. Just to stress the Ruby Virtual Machine, I ensured that the file was more than 100 MB in size and had more than half a million log entries. Keep in mind that I actually plan to use my final program with 10 million or more log entries. (I hope MySQL can handle that!)

On my Pentium 4, 2.93 GHz CPU, 512 MB RAM, Windows XP box, it took 25m 47s to scan and parse the file and 1.6s to sort and display the results. You can also see the complete output. If you browse through the output, you will see that processing each successive batch of 4096 entries consumes more and more CPU, but only up to a limit, after which the CPU consumption drops (reflected by a decrease in processing time). This may be due to the behavior of the Ruby garbage collector, but I don't know enough about Ruby to make a good guess.

Once this was done, I wondered how these performance numbers would compare with a program written in Java. On a whim, I literally translated both logentry.rb and webstat.rb to Java and ran the Java version against the same input file. The Java version took 2m 3s to scan and parse the file and 0.27s to sort and display the results. Again, you can see the complete output. Notice that Java handled each chunk of 4096 entries in almost constant time.

So the Java version ran roughly 12 times faster!! This is significant. If the same ratio holds true for a Ruby on Rails web application and a Java web application, it means that one would need to buy roughly 10 times more hardware to serve the same load (or number of users). This may negate all the gains made due to faster development time with Ruby on Rails.
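The ratio follows directly from the timings quoted above. A small sketch of the arithmetic (the class name is made up for illustration):

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper that only reproduces the arithmetic on the reported timings.
final class SpeedRatio {

    // How many times longer the slower run took than the faster one.
    static double ratio(Duration slower, Duration faster) {
        return (double) slower.toMillis() / faster.toMillis();
    }

    public static void main(String[] args) {
        Duration rubyScan = Duration.ofMinutes(25).plusSeconds(47); // 25m 47s
        Duration javaScan = Duration.ofMinutes(2).plusSeconds(3);   // 2m 3s
        Duration rubyTotal = rubyScan.plusMillis(1600);             // plus 1.6s sort
        Duration javaTotal = javaScan.plusMillis(270);              // plus 0.27s sort
        // Both lines print a ratio of 12.6 for the figures above.
        System.out.println(String.format(Locale.ROOT,
            "scan and parse: %.1f times slower", ratio(rubyScan, javaScan)));
        System.out.println(String.format(Locale.ROOT,
            "end to end:     %.1f times slower", ratio(rubyTotal, javaTotal)));
    }
}
```

The sort and display step is too short to move the ratio either way; the scan and parse phase dominates both runs.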

Of course, it would be hasty to jump to such a conclusion. In fact, I came across this blog entry that claims better performance with Ruby on Rails. Perhaps I should complete my project with Ruby on Rails, do a Java translation with Trails, a Java approximation of Ruby on Rails, and then report the performance numbers.

However, my observation of poor runtime performance in a Ruby program is not an isolated one. Worse numbers have been reported. The comparative performance of Ruby and Java at The Computer Language Shootout Benchmarks tells a similar story. With such stellar performance at the JVM level, Java app frameworks will have to do something really lousy to perform worse than Ruby on Rails.

What about other metrics -- lines of code and memory use? The Ruby version is around 90 lines whereas the Java version is 186. The Ruby program used up around 20MB of RAM (as reported by Task Manager) whereas the Java version used up more than 60MB.

November 4, 2005

Revised: Ruby or Java -- A (Performance) Reality Check

In an earlier post I reported that the Ruby version of a web server log parsing program ran 12 times slower than the Java version. A couple of readers pointed out that this was probably due to an inefficient implementation of method DateTime.strptime, which I used in the LogEntry class to convert the date field into a Ruby DateTime object.

To confirm this I ran the original program under the profiler ("ruby -r profile ..."; BTW, I wasn't aware of this facility when I wrote my last post. Keep in mind that I am a Ruby newbie!) and found that method DateTime.strptime was indeed eating up a fairly significant percentage of CPU cycles, though this isn't very obvious from the profiler output (the linked output is only for 1000 log entries, as the code runs significantly slower while profiling).

It turns out this particular method completely whacked the figures of my last post. But for this method, the Ruby version of the program would do quite well, in fact, better than my expectations. Read on for details.

Continue reading "Revised: Ruby or Java -- A (Performance) Reality Check" »

November 8, 2006

lmbench results for Amazon EC2

I recently downloaded and ran lmbench-2.5 on an Amazon Elastic Compute Cloud (EC2) instance. As you might know, lmbench is a fairly old microbenchmark for low-level OS functions such as context switches, process creation, networking operations and so on. Amazon EC2 is an innovative service offering from the famous online retailer that allows you to run an OS within a Xen virtualization environment, guaranteeing the equivalent of a system with a 1.7GHz x86 processor, 1.75GB of RAM, 160GB of local disk, and 250Mb/s of network bandwidth.

Based on the above specification, I had expected its benchmark results to be in line with the ones reported in a xen-devel mailing list post. However, the lmbench results I recorded for Amazon EC2 were very different. You could study the two results yourself and draw your own conclusions. Or just look at my summary of the similarities and differences:

  1. The results reported on the xen-devel mailing list used a dual processor 2.4GHz Xeon box with SMP Xen. It used lmbench-3.0 alpha with Linux kernel on Xen version 1.1301. The underlying hardware for Amazon EC2 is not known, but a listing of /proc/cpuinfo shows it to be a 2405.450 MHz AMD Opteron(tm) Processor 250. I used lmbench-2.5 for Amazon EC2. Both machines seem to have the same clock speed. Other differences, including the presence of 2 CPUs and differing versions of the Linux kernel, lmbench and Xen, may or may not be material to the lmbench benchmark numbers.

  2. The xen-devel mailing list post reports a TCP connect time of 53 microseconds; the same metric for Amazon EC2 is 345 microseconds. Amazon EC2 numbers are 3 to 6 times slower for most other operations as well.

How can this be explained? I would offer multiple possibilities:

  1. The underlying hardware reserved by Amazon EC2 is much less powerful than a 2.4GHz Xeon CPU (assuming that lmbench benchmarks utilize only one CPU, even when 2 are present). Unlikely, IMHO.

  2. Amazon EC2 has not optimized Xen for its hardware. Most likely this is the case.

  3. Version differences are the main culprit. Unlikely.

What would you say?

Note: I have comments disabled due to heavy comment spam. If you do have something interesting to say, send me an email and I will post your message.

Update: I upgraded the blogging software and it seems to have a good comment spam filter, so I am enabling comments on this entry.

About Performance

This page contains an archive of all entries posted to Pankaj Kumar's Weblog in the Performance category. They are listed from oldest to newest.
