MATSim Benchmark Results

The benchmark contains parts of the code running in parallel (the replanning part, using 4 threads) and other parts running single-threaded.

Speed comparison

The benchmark was run on some of our production servers with different Java Virtual Machine versions. Each server has two single-core "AMD Opteron Processor 248" CPUs running at 2.2 GHz (purchased in fall 2004, so they have to be considered "old").
The Java Virtual Machines tested were:

  • IBM Java 5 64bit: The default Java 5 version on the servers. The JVM identified itself as "J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423ifx-20080811"
  • Sun 6u5 64bit: The default Java 6 version on the servers, identifying itself as "1.6.0_05; Sun Microsystems Inc.; mixed mode; 64-bit"
  • Sun 5u19 64bit: The latest (as of writing this text) Java 5 version from Sun, Java 5 update 19 64-bit.
  • Sun 5u19 32bit: The latest Java 5 version from Sun, Java 5 update 19 32-bit.
  • Sun 6u14 64bit: The latest Java 6 version from Sun, Java 6 update 14 64-bit.
  • Sun 6u14 32bit: The latest Java 6 version from Sun, Java 6 update 14 32-bit.
  • Sun 6u14 64bit COP: Sun's Java 6 update 14 64-bit, started with the extra argument "-XX:+UseCompressedOops". Compressed Object Pointers should "improve performance of the 64-bit JRE when the Java object heap is less than 32 gigabytes in size" (see Java SE 6 Update 14 Release Notes). As an additional advantage, memory consumption should also be a bit lower when only 32 bits are used for object pointers.
  • Sun 6u14 64bit COP + AO: Sun's Java 6 update 14 64-bit, started with the extra arguments "-XX:+UseCompressedOops -XX:+AggressiveOpts". In addition to using Compressed Object Pointers, this also tries out the new "experimental implementation of java.util.TreeMap that can improve the performance", as MATSim makes heavy use of TreeMaps (although it does not necessarily iterate over them that often).
  • Sun 6u14 32bit AO: Sun's Java 6 update 14 32-bit, started with the extra argument "-XX:+AggressiveOpts". Just for comparison, the 32-bit version of the JVM was also started with the aggressive optimization option.
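
To make the differences between the Sun 6u14 configurations in the list above explicit, here is a minimal sketch of how the command lines could be assembled. Only the two -XX flags come from the text; the jar name, the "-jar" invocation and the "java" binary name are placeholders we made up for illustration, and in practice the 32-bit and 64-bit runs would point at different JVM installations.

```python
# Hypothetical jar name; the real benchmark may be started differently.
BENCHMARK_JAR = "matsim-benchmark.jar"

# Extra JVM arguments per configuration (flags as named in the list above).
JVM_CONFIGS = {
    "Sun 6u14 64bit": [],
    "Sun 6u14 64bit COP": ["-XX:+UseCompressedOops"],
    "Sun 6u14 64bit COP + AO": ["-XX:+UseCompressedOops", "-XX:+AggressiveOpts"],
    "Sun 6u14 32bit AO": ["-XX:+AggressiveOpts"],
}


def build_command(java_binary, jvm_flags, jar=BENCHMARK_JAR):
    """Return the argument list for one benchmark run (nothing is executed)."""
    return [java_binary, *jvm_flags, "-jar", jar]


for name, flags in JVM_CONFIGS.items():
    # "java" stands in for the path to the respective JVM installation.
    print(name, "->", " ".join(build_command("java", flags)))
```

Keeping the flags in one table like this makes it harder to accidentally compare two runs that differ in more than the intended option.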

benchmark performance results 1

The numbers shown in the graph above are the averages of two runs of the benchmark for each configuration (Yeah, two isn't that big a sample, we know... but it should still be valid to demonstrate some findings).
What can be observed is the large general difference in execution time between the 32-bit and 64-bit versions of the virtual machines, with the 64-bit versions being the slower ones. The "compressed object pointers" (COP) feature of Sun's JVM 6u14 seems to compensate for this nicely, making the 64-bit version about as fast as the 32-bit version. The aggressive optimization option (AO), on the other hand, doesn't seem to influence the performance of MATSim drastically.
While the differences in total execution time between Sun's Java 5 and Java 6 versions seem more or less random, there seems to be a performance improvement for the multithreaded replanning part when changing from Java 5 to Java 6. Interestingly, despite its worse multithreaded performance, IBM's JVM is able to catch up in the single-threaded parts and comes in with a total execution time similar to that of Sun's 64-bit JVMs.
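
For readers who want to repeat the comparison with their own measurements, the averaging over two runs and the comparison against a baseline can be sketched as follows. The timings in the example are placeholders, not our measured results, and the function and configuration names are made up for this sketch.

```python
from statistics import mean


def summarize(runs_by_config, baseline):
    """Average the runs of each configuration and relate them to a baseline.

    Returns {config: (mean_seconds, mean_seconds / baseline_mean)}.
    """
    base = mean(runs_by_config[baseline])
    return {
        name: (mean(runs), mean(runs) / base)
        for name, runs in runs_by_config.items()
    }


# Placeholder timings in seconds, NOT measured results.
example = {
    "32bit": [100.0, 104.0],
    "64bit": [150.0, 156.0],
}

for name, (avg, ratio) in summarize(example, baseline="32bit").items():
    print(f"{name}: {avg:.1f} s ({ratio:.2f}x the baseline)")
```

With only two runs per configuration, small ratios close to 1.0 should be read with care; they may well be within the run-to-run noise.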

The benchmark was also run on other server machines:

  • Servers with two Dual-Core AMD Opteron 2222 processors, running at 3.0 GHz; date of purchase winter 2007/2008 (very similar in architecture to the previously tested AMD Opteron 248 systems, just "newer" with higher-clocked CPUs)
  • Servers with Intel Xeon X5355 Quad-Core processors, clocked at 2.66 GHz, 8 MB L2 cache, built with 65 nm technology
  • Servers with Intel Xeon E5430 Quad-Core processors, clocked at 2.66 GHz, 12 MB L2 cache, built with 45 nm technology
  • Servers with Intel Xeon E5530 Quad-Core processors, clocked at 2.4 GHz, 8 MB L3 cache, 45 nm technology, Nehalem architecture
  • Servers with Intel Xeon E7540 Hex-Core processors, clocked at 2.0 GHz, 18 MB L3 cache, 45 nm technology, Nehalem architecture, DDR3 RAM

benchmark speed results 2

Comparing the AMD servers running at 2.2 GHz and 3.0 GHz, the most obvious difference is the time spent on replanning, which can be explained by the different number of cores the machines have (two versus four, with replanning using 4 threads). Interestingly, the remaining execution time didn't really improve with the higher CPU clock, leading to the guess that the performance of the memory controller or the memory bus is limiting the speed of MATSim (both AMD servers seem to have a front side bus of 1000 MHz).
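
The reasoning about core counts can be illustrated with a deliberately idealized model: the single-threaded part takes a fixed time, and the replanning work is spread perfectly over as many workers as there are threads or cores, whichever is fewer. This is only a sketch under that assumption; the numbers are placeholders, and the model ignores exactly the memory effects we suspect to be the real bottleneck.

```python
def ideal_iteration_time(serial_s, replanning_work_s, threads, cores):
    """Idealized run time: serial part plus perfectly parallel replanning.

    serial_s          -- time of the single-threaded parts, in seconds
    replanning_work_s -- total replanning work, in CPU-seconds
    threads           -- number of replanning threads (4 in the benchmark)
    cores             -- number of CPU cores in the machine
    """
    workers = min(threads, cores)
    return serial_s + replanning_work_s / workers


# Placeholder values, NOT measured results.
two_cores = ideal_iteration_time(60.0, 40.0, threads=4, cores=2)
four_cores = ideal_iteration_time(60.0, 40.0, threads=4, cores=4)
print(two_cores, four_cores)
```

In this model, going from two to four cores halves the replanning time but leaves the serial part untouched, which matches the qualitative picture described above.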

The Intel servers were massively faster than the AMD machines. We do not yet know if it's the different memory controller, faster memory bus, or if Sun's JDK is just more optimized for Intel processors. Anyway, the difference is striking. And each newer generation of Intel processors seems to deliver a real performance upgrade, even when running at a lower clock-speed.

Finally, the benchmark was also run on some of our laptop machines: an Apple MacBook Pro with an Intel Core 2 Duo processor clocked at 2.33 GHz (model from fall 2006), running Mac OS X 10.5.7, and an IBM/Lenovo laptop with the same Intel Core 2 Duo processor at 2.33 GHz, running a Gentoo 64-bit Linux (Kernel 2.6.28). Both laptops have a front side bus of 667 MHz, so from a technical point of view they are very similar.

On the Apple MacBook Pro, the following Java Virtual Machines were used:

  • Apple JVM 5u16 32bit: The default Java 5 version on Mac OS provided by Apple.
  • Apple JVM 6u7 64bit: The default Java 6 version on Mac OS provided by Apple.
  • Soylatte 6u3 32bit: An early port of OpenJDK 6, identifying itself as "1.6.0_03-p3; Sun Microsystems Inc.; mixed mode; 32-bit", provided by Landon Fuller.

On the Lenovo Laptop, an OpenJDK 6u0 64-bit JVM was used.

benchmark speed results 3

Surprisingly, Apple pulled off the trick of making their 64-bit Java 6 a lot faster than the older 32-bit Java 5. Well, it could also mean that their Java 5 offering is just very slow... The Mac port of OpenJDK 6 ("Soylatte") is even a bit faster, but that is likely due to the difference between 64-bit and 32-bit.

Are you able to run the MATSim benchmark even faster? Please tell us so! We're very interested in your benchmark results.

Memory usage comparison

MATSim writes information about its memory usage to the logfile from time to time. Plotting this information gives a jagged line running from left to right. The peaks and troughs in the plot can be explained by the Java Garbage Collector, which only frees up memory from time to time. Still, one can guess the absolute minimum of memory required by MATSim by looking at the lower parts of the curve (that is, the points right after a Garbage Collection has run, which show all the memory that could not be collected).
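
Reading the lower parts of the curve can also be done programmatically. The sketch below assumes the memory readings have already been extracted from the logfile into a plain list of numbers (the extraction itself depends on the log format and is not shown). Taking the largest trough as the estimate is one possible reading and an assumption of this sketch; the sample values are placeholders.

```python
def troughs(samples):
    """Return the local minima of a sequence of memory readings.

    A reading counts as a trough if it is lower than the previous one
    and not higher than the next one, i.e. a point right after a GC run.
    """
    return [
        cur
        for prev, cur, nxt in zip(samples, samples[1:], samples[2:])
        if cur < prev and cur <= nxt
    ]


def estimate_required_memory(samples):
    """Estimate the memory that could not be collected.

    Uses the largest trough, since the heap has to hold at least that much
    at some point. Falls back to the smallest reading if no trough exists.
    """
    lows = troughs(samples)
    return max(lows) if lows else min(samples)


# Placeholder readings in MB, NOT taken from a real logfile.
readings = [200, 350, 500, 260, 400, 550, 280, 430]
print(troughs(readings))
print(estimate_required_memory(readings))
```

Note that a trough after a partial collection may still contain garbage, so this estimate tends to err on the high side.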

Comparing the memory consumption in Sun's (currently) latest Java VM version holds no real surprises.

benchmark memory results

It can be clearly seen that the 64-bit JVM uses the largest amount of memory, due to the fact that each object pointer takes up 8 bytes. The 32-bit JVM uses the least memory. The 64-bit JVM with compressed object pointers lies somewhere in between. Although I would have expected it to be comparable to the 32-bit JVM, it seems that it still uses a bit more memory for unknown reasons. Anyway, it comes in handy to know that one can now load larger scenarios on a machine with 32 GB of memory (or less). The memory savings, compared to a 64-bit JVM without compressed object pointers, should grow with the size of the scenario and the level of detail of the simulated network, so this feature really looks promising.
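
To get a feeling for how pointer size affects the footprint of small objects, here is a rough back-of-the-envelope model. The header sizes and the 8-byte alignment are assumptions about typical HotSpot object layouts, not something we measured, and the model is not meant as a definitive explanation of the gap observed above.

```python
def aligned(size, alignment=8):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


# (header bytes, bytes per reference) -- assumed values, see text above.
LAYOUTS = {
    "32-bit": (8, 4),
    "64-bit": (16, 8),
    "64-bit + compressed oops": (12, 4),
}


def object_size(layout, reference_fields):
    """Approximate size in bytes of an object holding only references."""
    header, ref = LAYOUTS[layout]
    return aligned(header + reference_fields * ref)


for layout in LAYOUTS:
    print(layout, object_size(layout, reference_fields=2), "bytes")
```

Under these assumptions, an object with two reference fields takes 16 bytes on the 32-bit JVM, 32 bytes on the plain 64-bit JVM, and 24 bytes with compressed object pointers, which would at least be consistent with the compressed variant ending up in between.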