Anthony Patricio’s Blog

Decrypting another JPA benchmark

Posted in Persistance / Données by apatricio on octobre 23, 2012
Reviewing a benchmark is always an interesting task. Unfortunately it is often useless because benchmarks are rarely a good representation of real life conditions.
Last month, someone told me a new JPA implementation was trying to make noise. The author was making some buzz almost everywhere on social networks, commenting on popular articles and writing 2 articles claiming that this implementation was over 15 times faster than Hibernate. And here we are again. This sort of buzz happens every 2 years, it’s a cycle, there is nothing we can do about that.

The benchmark

Reading more about the implementation, the focus is made on fine grained tuning on every single line of code which is a really brave approach. However the way to announce « The New JPA Implementation That Runs Over 15 Times Faster… » is not a good way to promote the implementation. Anyway let’s have a closer look at the benchmark that allowed the author to conclude « 15 times faster ».
The benchmark is essentially composed of massive inserts, updates, deletes, queries by packet of 250 iterations. The domain model is composed of
  • 4 entities, yes 4 entities,
  • no inheritance, it sounds weird that a fabulous JPA implementation is not including more mapping configurations in its benchmark
  • eager fetching enabled (no subselect fetching for example),
  • cascading enabled,
  • no versioning

Last but not least, the benchmark targets an in-memory database, yes this is where it becomes suspicious.

15 times faster… faster what?

The big problem behind the title is that faster is not at all related to response time. The author tries to explain it, it’s related to CPU load, fair enough but could he clearly explain the consequences on a full system?
The fact is that the results are really hard to read and there is no relative view of the benchmark. You are told that Batoo might take 15 times less CPU time than Hibernate and what? The danger here is to mix internal APIs CPU load with the full persistence cycle composed of JPA internals, JDBC driver, networking, database engine. We all know, at least I hope so, that most of the time is spent on network and database so is this really important to focus _that much_ on JPA internals? Is hibernate that slow? Will Batoo solve your performance issues?
Let’s be extremely clear here, most of the performance issues users are facing are NOT because of JPA internals CPU load, the problems are almost always related to a slow database, too fat entity managers, inefficient queries. There are 3 general ways to solve these issues:
  • check your database !!! JPA is NOT going to fix a database internal problem
  • make a better use of the database and JPA
  • tune your own code, that requires various features and that is going to have a really big impact on the global performance numbers. In order to do so, the frameworks you are using must offer various features, this is exactly what Hibernate does: allows you to tune every single interaction your are triggering with the database.

Impacts: the Global view

I’m not going to rely on the timers provided by the benchmark. People are interested in checking how a system globally reacts when switching from one software option to another. I asked my friend and mate what tool a sysadmin would use to have a good view of the resources consumed by a process. « Easy » he told me, use the time command.
The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard output giving timing statistics about this program run. These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tmsas returned by times(2)).
So I’m going to use time mvn test with the good format to grab the CPU numbers.
Some would say it’s not accurate. I would answer that it allows to measure a real relative impact on a global system not at the thread level.
We’ll see how much CPU% are consumed by the benchmark.
The good thing with mvn test is that we’ll also get the total time execution.

Well used Hibernate is ways faster than noob Hibernate 

In order to do some tuning, I’ll start with a low BENCHMARK_LENGTH iteration number, let’s say 5. That is 5 times each sub test, one subtest globally doing 250 iterations.
Since the in-memory db will be started by the root process, time command will also catch the DB load which we don’t want but we are just tuning the hibernate config for the moment.

Shoot 1: In memory derby, BENCHMARK_LENGTH 5

Hibernate 13 seconds 200% CPU
Batoo 9 seconds 210% CPU
Hibernate based setup _globally_ consumes less CPU but is also slower. Wait, I’ve said that this measure was including the database engine load. Having in mind that Batoo is claiming it consumes 15 times less CPU than hibernate what would that mean? That would mean that the JPA internal  CPU load is extremely low compared to the in-memory database CPU load … and still, we have an unsolvable equation here, we need more tests. Anyway, I already think know that CPU time consumed by JPA may not be a critical issue at all. But wait, small benchmark size and in memory database? This is irrelevant.
Let’s move to a local MySql instance, in-memory databases might be good for development, not for production. If I want a memory based config, I’ll got for Hibernate + Infinispan + a real database instance, not in-memory.
 

Shoot 2: Local MySql, BENCHMARK_LENGTH 10

Hibernate 44 seconds 64% CPU
Batoo 36 seconds 55% CPU
Batoo is the winner. Should we stop here? Certainly not.
 
Something is wrong. The good thing with hibernate, a _mature_ JPA implementation, is that it allows tons of tuning, being in the mapping, in the queries well everywhere. 
I remember 7 years ago, one guy at work was insulting Hibernate saying it was a piece a crap because he was observing memory issues and/or extremely long response times. One quick look at the hibernate documentation, 2 mapping parameter updates, more care of the entity manager (well Hibernate Session at that time), allowed to solve the memory issue and allowed a 10 000 faster execution. I promise it’s true, 10 000 times faster in response time.
A quick look at the documentation informs us that we should use the following global config parameters:
<property name="hibernate.jdbc.batch_size" value="50"/>
<property name="hibernate.order_update" value="true"/>
<property name="hibernate.id.new_generator_mappings" value="true"/>
More important: checking the INFO level logs, you’ll notice that the author of the benchmark left the auto commit mode. For someone who claims to be implementing the fastest JPA implementation ever, he should review the basics.
So let’s add
<property name="hibernate.connection.autocommit" value="false" />
and rerun the test with something bigger, I also raise the BENCHMARK_LENGTH to 100. 

Shoot 3: Local MySql, BENCHMARK_LENGTH 100

Hibernate 5 min 1 seconds 29% CPU
Batoo 4 min 52 seconds 22% CPU
 
29% versus 22% CPU… We are talking about a production server. In my case I used a 8 core CPUs 3 years old server. 
29% of one CPU … 
This benchmark is using 
  • 3.625% of my global System CPU capacity with hibernate
  • 2.75% of my global System CPU capacity with Batoo
It seems we are really far from the 15 times faster!

A more realistic environment

I won’t stop here, the author says that his motivation is to reduce cluster size thanks to his optimized implementation. So let’s say I don’t have 8 core, only 2. Here I might be interested in reducing my CPU load.

Who says cluster, also says remote database. And even without a cluster, production system will host the database in a different server.
So I set up a really fast LAN, excellent ping and bandwidth.
 
Here are the new results

Shoot 4: Remote MySql, BENCHMARK_LENGTH 100

Hibernate 16 min 30 seconds 6% CPU
Batoo 17 min 55 seconds 5% CPU
(Is it me or Hibernate is faster ?)
Which, for a 4 core CPUs system means
  • 1.5% of my global System CPU capacity with hibernate
  • 1.25% of my global System CPU capacity with Batoo
or for a 8 core CPUs system means
  • 0.75% of my global System CPU capacity with hibernate
  • 0.62% of my global System CPU capacity with Batoo
Yes Batoo is better in terms of CPU load but with such low levels, who cares? The persistence engine is always waiting for the database!
In the middleware side we have a very low CPU load (at least for the JPA part), what about the DB server side? We have one CPU over the 4 constantly above 80%. A second one above 40%.
 

Putting all of this together, Batoo MAY consume slightly less CPUthan Hibernate but the actual database load on these benchmarks far outweighs the load imposed by either Batoo or Hibernate.

And, since the internal ORM performance is a relatively minor issue, we should think about the products’ features.
Why is that important?
  • write stupid queries and you’ll get network, database and possibly memory issues (no matter if you use Hibernate or Batoo) -> query oriented advanced features are welcome
  • be idiot using the entity manager and you’ll observe slowness and memory issues (no matter if you use Hibernate or Batoo) -> get some training and certification, persistence is not something easy, you need SKILLS 
  • not talking about the DB slowness … if you do not understand RDBMS, you MUST work with a DBA, this is not an advice, this is a requirement.
  • if you are an expert in JPA, and your application is slow because of the database, you may have hit the RDBMS limitations. Give a try to Hibernate OGM with a NoSql supported database.

Shoot 5: Remote MySql, BENCHMARK_LENGTH 100, 2 concurrent run

I’m not going to provide the results but response times are impacted by 50%. CPU load is close to 2%

Conclusion: I don’t care these CPU optimizations I prefer FEATURES!

I honestly cannot imagine JPA being a cause of CPU problem in a real life scenarios. I’m also not forgetting that Hibernate is a 12 years old project. Talented people are working full time on it, the major version is 4, that means it has been heavily totally reworked 4 times. So that’s enough to understand that it is well written and that efforts are put in the most critical area: offering features.

I did online community support then official Red Hat support. I’ve seen query response times issues (1), memory issues (2) sometimes but I can’t remember about a CPU issue related to JPA internals.

To solve 1 and 2, you need skills but also features that help you tune your code. Like being able to use a specific database syntax (dialects), or  tune associations loading, I could continue on 300 pages here and I would recommend you simply consult the Hibernate documentation’s index.

Tagged with: , , ,