Hibernate Batch Processing – Why you may not be using it. (Even if you think you are)

Hibernate batch processing is powerful but it has many pitfalls that developers must be aware of in order to use it properly and efficiently. Most people who use batch probably find out about it by trying to perform a large operation and finding out the hard way why batching is needed. They run out of memory. Once this is resolved they assume that batching is working properly. The problem is that even if you are flushing your first level cache, you may not be batching your SQL statements.

Hibernate flushes by default for the following reasons:

  • Before some queries
  • When commit() is executed
  • When session.flush() is executed

The thing to note here is that until the session is flushed, every persistent object is placed into the first level cache (your JVM's memory). So if you are iterating over a million objects you will have at least a million objects in memory.

To avoid this problem you need to call the flush() and then clear() method on the session at regular intervals. Hibernate documentation recommends that you flush every n records where n is equal to the hibernate.jdbc.batch_size parameter. A Hibernate Batch example shows a trivial batch process. Lets look at a slightly more complicated example:


public void doBatch() {
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    Cart cart = new Cart(...);
    customer.setCart(cart) // note we are adding the cart to the customer, so this object
     // needs to be persisted as well
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
}

Assuming the Customer cascades save to the Cart object you would expect to see something like this in your SQL logs:


insert into Customer values (...)
insert into Cart values(...)

There are two reasons for batching your hibernate database interactions. The first is to maintain a reasonable first level cache size so that you do not run out memory. The second is that you want to batch the inserts and updates so that they are executed efficently by the database. The example above will accomplish the first goal but not the second.

The problem is Hibernate looks at each SQL statement and checks to see if it is the same statement as the previously executed statement. If they are and if it hasn't reached the batch_size it will batch those two statements together using JDBC2 batch. However, if your statements look like the example above, hibernate will see alternating insert statements and will flush an individual insert statement for each record processed. So 1 million new customers would equal a total of 2 million insert statements in this case. This is extremely bad for performance.

The solution is very simple. Just add the following two lines to your hibernate configuration.


<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.order_updates">true</prop>        

These two parameters tell Hibernate to sort the insert and update statements before trying to batch them up. So if you have 20 inserts for a Customer object and 20 inserts for a Cart, those will be sorted and each call to flush will result in two JDBC2 batch executions of 20 statements each.

16 Responses to “Hibernate Batch Processing – Why you may not be using it. (Even if you think you are)”

  1. Trey Says:

    This is very well-written article, and I agree that Hibernate should behave this way. However, I wrote a simple application that performed a save on 10 records of type TobUser. The logs indicate that 10 separate INSERT statements were executed. Each block looks like:
    DEBUG AbstractEntityPersister:2154 – Inserting entity: [org.yi.spr1ngb0k.tob.model.TobUser#402881e41d2c7cf8011d2c7d002a0001]
    DEBUG AbstractEntityPersister:2156 – Version: 1
    DEBUG AbstractBatcher:358 – about to open PreparedStatement (open PreparedStatements: 0, globally: 0)
    DEBUG SQL:393 – insert into TobUser (lockVersion, firstName, lastName, name, password, id) values (?, ?, ?, ?, ?, ?)
    DEBUG AbstractBatcher:476 – preparing statement
    DEBUG AbstractEntityPersister:1942 – Dehydrating entity: [org.yi.spr1ngb0k.tob.model.TobUser#402881e41d2c7cf8011d2c7d002a0001]
    DEBUG IntegerType:80 – binding ‘1′ to parameter: 1
    DEBUG StringType:80 – binding ‘1′ to parameter: 4
    DEBUG StringType:73 – binding null to parameter: 5
    DEBUG StringType:80 – binding ‘402881e41d2c7cf8011d2c7d002a0001′ to parameter: 6

    and 2nd-9th look the same except for ‘re-using PreparedStatement’. So my question is: What am I doing wrong?

    Any help would be appreciated.
    -Trey

  2. abramsm Says:

    Hello – Thanks for the comment. Are you sure it is actually executing those inserts individually? Try turning on logging in the BatchingBatcher class to see if it is batching up those inserts and sending them to the DB in a group. If you provide your hibernate configuration and the complete log file I may be able to help more.

    Regards,
    Matt

  3. Theo Briscoe Says:

    Great Explanation, thanks

  4. paliwalashish Says:

    Great article! I just did a simple benchmark on my application without bulk and with bulk. Benchmarked it for 10000 inserts. Normal inserts tool 368 sec and batch updates took 6 sec. Batch size = 500. Fews, Playing with a system which has to handle few good many records per sec. The only problem that I now have is high CPU usage while inserting. Batch update has reduced it to spikes while updating.

    Thanks! and appreciate any valuable suggestions you may have.

  5. abramsm Says:

    Hello. I would suggest reducing your batch size to 15-25 rather than 500. Those spikes you see are most likely caused by hibernate iterating over the objects in the first-level cache.

    Hope this helps and thanks for checking out my blog.

    Matt

  6. JeffR Says:

    Great (and timely) post, but I still don’t seem to be batching yet. The logging contains a lot of “Executing batch size: 1″ and the inserts are out of order. Do you know of a trick to see at runtime if the property definitely got set?

  7. abramsm Says:

    Hi Jeff –

    I would turn on debug logging for Hibernate. When it first starts up it will display the parameters you have configured Hibernate with. In addition a breakpoint in the org.hibernate.jdbc.BatchingBatcher to see if its picking up the settings you expect.

    If you post your logfile I’d be happy to take a look.

    Matt

  8. JeffR Says:

    I do see a line in the logging like “DEBUG org.hibernate.impl.SessionFactoryImpl – instantiating session factory with properties: { … )” and both hibernate.jdbc.batch_size=50 and hibernate.order_inserts are in the list. Building Hibernate from source was the next thing I was going to do anyway. This happens to be to a firebird database over a jaybird jdbc driver.

  9. abramsm Says:

    It looks like you may not be setting the order_inserts parameter

    # true

    Are you inserting more than one entity? If so every time hibernate sees a different insert it ends the batch. So if you order them you will use your batch size of 50.

    Matt

  10. JeffR Says:

    I think order_inserts is set based on the logging.

    The code is looping over some objects and calling saveOrUpdate() on each. There are all new (this is an import function). There are associations to other objects, similar to your example. So for each object I see an insert for the main table, then one for each of the associated tables, and that pattern over and over again. Every once in a while I see logging saying “Executing batch size: 3″ or 4 but mostly 1.

    You wouldn’t be able to point me to an article on building hibernate in Eclipse would you? Is that how you’re setting a breakpoint? I did get it built from the command line so now I can add some logging at least.

  11. abramsm Says:

    Jeff -

    Are you doing this from within a transaction? If you are not then it will commit each operation every time you call save or update.

    You do not need to compile hibernate in eclipse to set a break point. You just need to link the source for hibernate to the jar you are depending on.

    Matt

  12. JeffR Says:

    I’m pretty sure it’s one transaction – we’re using Spring and our transaction is out the service method that tracks to this DAO.save().

    Now that I’m using a home-built 3.2.6.ga instead of the patched 3.2.2 our project has been using I am seeing sorting on SOME inserts and more cases where it says “Executing batch size: 50″. Not a huge performance jump, though, and boy do we need one :-(

  13. JeffR Says:

    Just FYI, I found out that the Jaybird JDBC driver for Firebird totally defeats any insert batching I might do, which is pretty much what I was seeing. So far the biggest boost for me was using org.hibernate.id.enhanced.SequenceStyleGenerator with an increment_size of about 1000. Cuts down lots of round trips to get ids.

  14. Anonymous Says:

    Thanks for the update Jeff. I thought of asking if Jaybird supported batch inserts but I took a look at the website and it said that it had support JDBC 2.0 batching. Guess they didn’t include batch inserts…

    Thats a good idea to cut the round trips to the DB to get the IDs. Another option is to generate the IDs in the VM, assuming that’s an option in your application.

    Matt

  15. JITIN Says:

    Hi

    This is intresting, I have order inserts/updates as true.
    If I have object A extends B and B extends C.
    When I try to save list of object A – then B & C also get saved (ofcourse) but the batching is not done by hibernate. So A B C are saved sequentially instead of batch A, then batch B and then batch C.
    Any clue ?

  16. Andrew Rubalcaba Says:

    awesome article.

    I have some questions that are posted in this thread that you might have some insight into. I’ve made the changes but instead of increase in time I get a decrease in time.

    Ok so I just ran a simple test with a simple DAO where I inserted 10,000 records.
    First test took 37 seconds.

    I then modified the hibernate configuration file by adding the following:
    30
    true
    true

    and add the following code to my DAO

    if(i %20 == 0)
    {
    session.flush();
    session.clear();
    }
    and the ran inserted another 10,000 records and that test took 51 seconds.

    I do see things in the logs like

    - Executing batch size: 20
    - success of batch update unknown: 0
    - success of batch update unknown: 1
    - success of batch update unknown: 2
    - success of batch update unknown: 3
    - success of batch update unknown: 4
    - success of batch update unknown: 5
    - success of batch update unknown: 6
    - success of batch update unknown: 7
    - success of batch update unknown: 8
    - success of batch update unknown: 9
    - success of batch update unknown: 10
    - success of batch update unknown: 11
    - success of batch update unknown: 12
    - success of batch update unknown: 13
    - success of batch update unknown: 14
    - success of batch update unknown: 15
    - success of batch update unknown: 16
    - success of batch update unknown: 17
    - success of batch update unknown: 18
    - success of batch update unknown: 19

    Also what exactly do the following entries in the logs mean?

    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020635
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020636
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020637
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020638
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020639
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020640
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020641
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020642
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020643
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020644
    - Invalidating space [NMRW.FOO_TXN], timestamp: 5098157369020645

Leave a Reply