Why doesn’t Hibernate automatically update changed objects?

February 7, 2009

Have you ever asked yourself this question? Have you ever been surprised that changes made to a persistent object were not committed to the database? This is a common problem with many possible causes. One nasty cause shows up when you use the common Hibernate batch-processing pattern in which session.flush() and session.clear() are called periodically to manage memory and improve performance. Do you see the problem with the following code snippet?

public void doBatch() {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    List<Person> personList = session.createQuery("from Person").list();
    int i = 0;
    for (Person person : personList) {
        person.setName("newName"); // this change should be caught by Hibernate and cause an update statement to be generated
        if (++i % 20 == 0) { // 20, same as the JDBC batch size
            // flush a batch of updates and release memory:
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();
}

You might expect every Person object that didn’t already have “newName” as its name to be updated in the database with the new value. That assumption is correct for the first 20 Person objects. From object #21 on, however, nothing is updated. Hibernate detects changes by comparing the current field values of a persistent object against the snapshot it keeps in the first-level cache (the Session). When a field differs from the snapshot, Hibernate schedules an update for that object. Calling session.clear() empties the first-level cache, which is great for saving memory, but it also discards those snapshots, so changes made to the now-detached objects are never detected. There are two easy solutions to this problem:

public void doBatch() {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    List<Person> personList = session.createQuery("from Person").list();
    int i = 0;
    for (Person person : personList) {
        String newValue = "newName";
        if (!newValue.equals(person.getName())) {
            session.update(person); // need to manually call update because the object may no longer be in the first-level cache
        }
        person.setName(newValue);
        if (++i % 20 == 0) { // 20, same as the JDBC batch size
            // flush a batch of updates and release memory:
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();
}

Notice the manual check for a changed value and the call to session.update(). The check prevents issuing updates for objects whose values have not changed.

Another solution is to use a scrollable result set. With a scrollable result set, a persistent object is not loaded into the first-level cache until ScrollableResults.next() is called, so you do not have to worry about the object being cleared from the cache before you make changes to it.
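The scrollable variant can be sketched like this (a sketch only, assuming the same sessionFactory and Person entity as above; ScrollMode.FORWARD_ONLY and ScrollableResults are Hibernate API):

```java
public void doBatchScroll() {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    // Entities are hydrated one at a time as next() is called, so nothing
    // sits in the first-level cache before we get a chance to modify it.
    ScrollableResults results = session.createQuery("from Person")
            .scroll(ScrollMode.FORWARD_ONLY);
    int i = 0;
    while (results.next()) {
        Person person = (Person) results.get(0);
        person.setName("newName"); // object is attached, so the change is tracked
        if (++i % 20 == 0) { // 20, same as the JDBC batch size
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();
}
```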


Don’t Let Your Non Thread-Safe Variables Escape!

September 16, 2007

I was asked to help debug a problem in a project that was about to be deployed. It was one of those ‘fun’ transient problems that only happen when you can least afford it, like when the customer is testing the product.

The project consists of a JSP/AJAX web tier that connects to the DB via a JPA EntityManager. The problem we had was that every now and then a user would see a nasty assertion error for an uncaught exception displayed in their browser. (Why the exception was not handled gracefully is another subject worth writing about.)

The code uses a custom static class to provide access to the EntityManager (using a DI framework instead would be nicer). It has some actions that span multiple calls, so the EntityManager for each thread was stored in a ThreadLocal variable. This should ensure that each thread gets its own EntityManager and hence its own transaction. I noticed a few synchronization problems when I first looked at the class. For example:

public static EntityManager getEntityManager() {
    EntityManager entityManager = managerThreadLocal.get();
    if (entityManager == null || !entityManager.isOpen()) {
        if (entityManagerFactory == null) {
            // race: two threads can both see null here and create two factories
            entityManagerFactory = Persistence.createEntityManagerFactory("myDS");
        }
        entityManager = entityManagerFactory.createEntityManager();
        managerThreadLocal.set(entityManager);
    }
    return entityManager;
}

So to fix this we made the entityManagerFactory a private static final member of the class. Since the EntityManager itself is stored in a ThreadLocal, access to it shouldn’t need synchronization: only one thread should ever be touching its own EntityManager at any one time.
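The fix can be sketched like this (a sketch against the JPA javax.persistence API, using the field names from the snippet above; the class-load initialization of a static final field is guaranteed thread-safe by the JVM):

```java
private static final EntityManagerFactory entityManagerFactory =
        Persistence.createEntityManagerFactory("myDS"); // initialized once, safely, at class load

private static final ThreadLocal<EntityManager> managerThreadLocal =
        new ThreadLocal<EntityManager>();

public static EntityManager getEntityManager() {
    EntityManager entityManager = managerThreadLocal.get();
    if (entityManager == null || !entityManager.isOpen()) {
        entityManager = entityManagerFactory.createEntityManager();
        managerThreadLocal.set(entityManager);
    }
    return entityManager;
}
```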

Of course this did not fix our main problem. The real problem turned out to be that the method above breaks thread confinement. Sound the alarms: we have an escaping variable! Exposing a public method that returns a ThreadLocal value is dangerous because we have no control over what other classes in the system will do with that reference once it is returned. In this project, one class adopted the ThreadLocal EntityManager and turned it into a member variable, so the supposedly thread-confined object ended up shared between threads. Sometimes it worked, and sometimes it didn’t.
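Here is a minimal, hypothetical demo of the same escape (the names are mine, not from the project): a per-thread StringBuilder is returned from its ThreadLocal, stored in a shared variable, and then mutated by a second thread, so thread confinement is gone.

```java
public class EscapeDemo {

    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    // Dangerous: hands the caller a reference to the per-thread object.
    static StringBuilder getBuffer() {
        return BUFFER.get();
    }

    static String demo() throws InterruptedException {
        // The "adopting" code stores the thread-local object in a shared
        // variable, just like the member variable in the post.
        StringBuilder escaped = getBuffer();
        escaped.append("A");                 // written by this thread

        Thread other = new Thread(() -> escaped.append("B"));
        other.start();
        other.join();                        // "B" written by another thread

        // Two threads have now mutated the same non-thread-safe object.
        return escaped.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());          // prints "AB"
    }
}
```

With real interleaving instead of a join, those concurrent writes corrupt the StringBuilder exactly the way the shared EntityManager corrupted transactions here.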

The best way to resolve this is to not let the EntityManager escape: encapsulate all access to it inside a class that is responsible for ensuring thread-safe access to the non-thread-safe variable. This solution is cumbersome and adds a layer of abstraction that isn’t core to the business logic, but it is certainly much better than transient assertion errors visible to the customer. In the long run, switching to a declarative transaction-management implementation such as the one described in Transaction Management Using Spring is a better solution.