By Alois Reitbauer

Understanding Caching in Hibernate – Part Three : The Second Level Cache

In the last posts I covered the session cache as well as the query cache. In this post I will focus on the second-level cache. The Hibernate documentation provides a good entry point for reading on the second-level cache.

The key characteristic of the second-level cache is that it is used across sessions, which also differentiates it from the session cache, which – as the name says – only has session scope. Hibernate provides a flexible concept for exchanging cache providers for the second-level cache. By default, Ehcache is used as the caching provider. However, more sophisticated caching implementations can be used, like the distributed JBoss Cache or Oracle Coherence.

First we have to modify our code sample so that we now load the Person object in two sessions. The source code then looks as follows:

public void loadInTwoSessions() {
  // loading in first session
  Session session = getSessionFactory().openSession();
  Transaction tx = session.beginTransaction();
  Person p = (Person) session.load(Person.class, 1L);
  System.out.println(p.getFirstName());
  tx.commit();
  session.close();

  // loading in second session
  session = getSessionFactory().openSession();
  tx = session.beginTransaction();
  p = (Person) session.load(Person.class, 1L);
  System.out.println(p.getFirstName());
  tx.commit();
  session.close();
}
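The getSessionFactory() helper is not part of the listing. A minimal sketch of such a helper could look like the following; the HibernateUtil class name and the use of a hibernate.cfg.xml on the classpath are assumptions for illustration, not part of the original example:

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateUtil {

    private static SessionFactory sessionFactory;

    // Lazily builds a single SessionFactory from hibernate.cfg.xml on the
    // classpath; all examples in this post would share this one factory.
    public static synchronized SessionFactory getSessionFactory() {
        if (sessionFactory == null) {
            sessionFactory = new Configuration().configure().buildSessionFactory();
        }
        return sessionFactory;
    }
}
```

Building the SessionFactory is expensive, which is why it is created once and reused; opening and closing Sessions, as in the example above, is cheap by comparison.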

As we have not activated the second-level cache, we expect the SQL queries to be executed twice. Looking at the PurePath of this transaction verifies our assumption.

Loading a person object in two sessions without second-level cache

Now we activate the second-level cache. This requires us to change the Hibernate configuration file, enabling second-level caching and additionally specifying the cache provider, as shown below.


<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</property>

In this example I am using Ehcache for demonstration purposes. In order to enable caching of our Person objects we have to specify the caching configuration in the ehcache.xml file. The actual cache configuration depends on the caching provider. For Ehcache it is defined as follows. The configuration for the Person class used in the example is boiler-plate Ehcache configuration; it can be adapted to specific needs. Describing all possible configuration options, like using multiple cache regions, is beyond the scope of this post.

<cache name="com.dynatrace.samples.database.Person"
   maxElementsInMemory="300"
   eternal="true"
   overflowToDisk="false"
   timeToIdleSeconds="12000"
   timeToLiveSeconds="12000"
   diskPersistent="false"
   diskExpiryThreadIntervalSeconds="120"
   memoryStoreEvictionPolicy="LRU"       
/>

Finally we have to configure caching at the Hibernate level as well. Hibernate supports multiple settings for caching. As we are only reading data at the moment, a read-only cache is sufficient for our purposes. Hibernate of course supports read-write caches as well, and also transactional caches in case this is supported by the cache provider. The following line in the Hibernate configuration enables read-only caching for Person objects. Alternatively, caching can also be configured on Hibernate associations.

<cache usage="read-only" />
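For completeness: the <cache> element above typically goes inside the <class> mapping for Person. With Hibernate Annotations, a roughly equivalent configuration would look like the following sketch (assuming the annotations jar is on the classpath; the Person fields are elided):

```java
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

import javax.persistence.Entity;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class Person implements java.io.Serializable {
    // id, firstName and the other fields as before
}
```

The usage attribute maps directly to the CacheConcurrencyStrategy values: READ_ONLY, READ_WRITE, NONSTRICT_READ_WRITE and TRANSACTIONAL.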

Now we expect the object to be retrieved from the second-level cache the second time it is loaded. A PurePath trace verifies this assumption: now a database call is executed only the first time.

Loading a person object in two sessions with enabled second-level cache
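If you do not have a tracing tool at hand, Hibernate's statistics API offers a rough way to verify the cache hit as well. A sketch, assuming hibernate.generate_statistics is set to true in the configuration:

```java
import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class CacheStatisticsPrinter {

    // Prints the second-level cache counters, e.g. after running
    // loadInTwoSessions(); with the read-only cache active we would expect
    // one miss and one put (first load) and one hit (second load).
    public static void printCacheStatistics(SessionFactory factory) {
        Statistics stats = factory.getStatistics();
        System.out.println("L2 hits:   " + stats.getSecondLevelCacheHitCount());
        System.out.println("L2 misses: " + stats.getSecondLevelCacheMissCount());
        System.out.println("L2 puts:   " + stats.getSecondLevelCachePutCount());
    }
}
```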

Read-Write Caching

After having looked at plain read caching, we look at read-write caching in the next step. Our code example gets a bit more complex. We again use two sessions. We load the object in the first session, update it, then load it in the second session. Both sessions are created upfront and are kept open until the end.

public void complexLoad (){   
  Session session1 = getSessionFactory().openSession();
  Session session2 = getSessionFactory().openSession();        

  Transaction tx1 = session1.beginTransaction();
  Person p1 = (Person) session1.load(Person.class, 1L);
  System.out.println (p1.getFirstName());
  p1.setFirstName ("" + System.currentTimeMillis());
  tx1.commit();

  Transaction tx2 = session2.beginTransaction();
  Person p2 = (Person)session2.load(Person.class, 1L);
  System.out.println (p2.getFirstName());
  tx2.commit();

  session1.close();
  session2.close();
}

We expect the object to be retrieved from the cache when it is loaded in the second session.  Looking at the PurePath of this transaction however shows something different. This method executes three SQL statements. First a SELECT to load the Person object, then an UPDATE to update the record in the database and then again a SELECT to load the Person object for the second session.

Two transactions loading a person object, both times leading to a database query

This is not necessarily what we were expecting. The object could have been retrieved from the cache in the second session; however, it got loaded from the database. So why wasn’t the object taken from the cache? A closer look at Hibernate’s internal behaviour unveils this secret.

Details on loading from second-level cache

The PurePath snippet above shows the details for loading the Person object in the second session. The key is the isGettable method, which in this case returns false. The input to isGettable is the session creation timestamp, as indicated by the arrow. A look at the source code unveils what is checked within this method.

public boolean isGettable(long txTimestamp) {
 return freshTimestamp < txTimestamp;
}
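To make this check tangible, here is a small self-contained model of the logic; the surrounding classes are simplified stand-ins, and only the isGettable() comparison mirrors the Hibernate source above:

```java
public class CacheItemDemo {

    // Simplified model of a read-write cached item; only isGettable()
    // follows the Hibernate source shown above.
    static class CachedItem {
        final long freshTimestamp; // when the cached value was (re)written

        CachedItem(long freshTimestamp) {
            this.freshTimestamp = freshTimestamp;
        }

        boolean isGettable(long txTimestamp) {
            return freshTimestamp < txTimestamp;
        }
    }

    public static void main(String[] args) {
        CachedItem item = new CachedItem(2000L); // object updated at t=2000

        // Session opened at t=1000, i.e. BEFORE the update: cache miss.
        System.out.println(item.isGettable(1000L)); // false

        // Session opened at t=3000, i.e. AFTER the update: cache hit.
        System.out.println(item.isGettable(3000L)); // true
    }
}
```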

The method verifies whether the session’s timestamp (txTimestamp) is greater than the freshTimestamp of the cached object. In our case the second session was created BEFORE the object was updated. Consequently this method returns false. If we modify our code as follows, the object will be loaded from the second-level cache.

public void complexLoad (){   
  Session session1 = getSessionFactory().openSession();
  Transaction tx1 = session1.beginTransaction();
  Person p1 = (Person) session1.load(Person.class, 1L);
  System.out.println (p1.getFirstName());
  p1.setFirstName ("" + System.currentTimeMillis());
  tx1.commit();

  Session session2 = getSessionFactory().openSession();  
  Transaction tx2 = session2.beginTransaction();
  Person p2 = (Person)session2.load(Person.class, 1L);
  System.out.println (p2.getFirstName());
  tx2.commit();

  session1.close();
  session2.close();
}

The PurePath snippet below verifies this assumption and shows that this time isGettable returns true and the object is retrieved from the cache.

Loading the Person object from second-level cache directly

Interaction of Session and Second-Level Cache

Finally I want to take a short look at the interaction between the session and the second-level cache. The important point to understand is that as soon as we use the second-level cache, we have two caches in place. Caches are always a source of inconsistent information, which we accept as the price for better performance and scalability. In order to avoid problems and unwanted behaviour we have to understand their internal behaviour. Hibernate always tries to retrieve objects from the session first; if this fails, it tries to retrieve them from the second-level cache. If this fails again, objects are loaded directly from the database. The PurePath snippet below shows this loading behavior.
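The lookup order just described can be sketched as a few lines of plain Java. This is an illustrative model only; all class and method names here are assumptions, not Hibernate’s actual internals:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadHierarchyDemo {

    static Map<Long, String> sessionCache = new HashMap<>();
    static Map<Long, String> secondLevelCache = new HashMap<>();

    // Session cache first, then second-level cache, then the database.
    static String load(Long id) {
        String entity = sessionCache.get(id);
        if (entity != null) return "session: " + entity;

        entity = secondLevelCache.get(id);
        if (entity != null) {
            sessionCache.put(id, entity); // promote into the session cache
            return "second-level: " + entity;
        }

        entity = "Person#" + id;          // simulated database round trip
        secondLevelCache.put(id, entity);
        sessionCache.put(id, entity);
        return "database: " + entity;
    }

    public static void main(String[] args) {
        System.out.println(load(1L)); // database: Person#1
        sessionCache.clear();         // simulate opening a new session
        System.out.println(load(1L)); // second-level: Person#1
        System.out.println(load(1L)); // session: Person#1
    }
}
```

Clearing the session-scoped map between loads mirrors what happens when a session is closed and a new one is opened: the session cache is gone, but the second-level cache survives.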

Load hierarchy in Hibernate showing logical flow of object retrieval

Conclusion

The second-level cache is a powerful mechanism for improving the performance and scalability of your database-driven application. Read-only caches are easy to handle, while read-write caches are more subtle in their behavior. Especially the interaction with the Hibernate session can lead to unwanted behavior. Sessions should therefore be used as what they are designed for – a transactional context. There are more details on the second-level cache I did not elaborate on, like synchronization or replication behavior. However, the combination of the three caching articles should provide good insight into Hibernate’s caching behavior.

If you’re interested in getting a better idea of how dynaTrace provides this transactional context (PurePath), you can check out this 14 minute video I contributed to which gets into some of the details of the product.

Comments

  1. On TSS an interesting comment was posted; I wonder what you think of it:

    “Hibernate’s second level cache is completely worthless.

    A database is much more optimized for caching. Especially in combination with SSD disks for the DB, the cache is more than completely worthless (utterly worthless?). Hibernate caching is just an unnecessary extra layer that introduces extra complexity in the system. Such extra layers should be removed as much as possible.

    In a way, the cache is even semantical incorrect. Namely, when I change data in the DB and then retrieve this data via the entity manager, it returns the old data. I went out of my way to make sure all Hibernate caching that I added before was completely removed from my code.”

  2. Alois Reitbauer says:

    Hey Peter,

    I do not think caching is worthless. It helps to avoid network roundtrips to the database which can drastically improve scalability.

    However, the way it is implemented in Hibernate, I see it as useful only for read-only data. For read-write transactions the implementation I used makes no sense, as the second query simply is not necessary. A more sophisticated caching solution can definitely provide better value.

    The problem of stale data is one you have with every cache. It is a cache, so data might not be up to date. However, in case you only have one application modifying the data, you can perfectly sync the reads and writes.

  3. I grabbed this from the Hibernate forum since I am facing the same issue with the 2nd level cache. I was wondering if you can answer this!
    =========
    I have enabled second level cache for an entity. If I use entityManager.find to retrieve the entity by PK, the cache works (i.e. it hits the DB only the first time).

    But if I make a HQL query on this entity, it hits the DB every time…

    Here is the entity:

    @Entity
    @Table(name = "CORE_MessageBuyerTranslation", uniqueConstraints = {})
    @Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE, region = "com.ibx.ibxrequest.model.CoreMessageBuyerTranslation")
    public class CoreMessageBuyerTranslation implements java.io.Serializable {

    }

    and here is the HQL that hits the DB every time:

    List translatedKeys = entityManager.createQuery(
        "from CoreMessageBuyerTranslation mbt "
        + "WHERE mbt.coreBuyer = :coreBuyer "
        + " AND mbt.coreLang = :coreLang "
        + " AND mbt.coreMessage.messageKey = :messageKey")
        .setParameter("coreBuyer", buyer)
        .setParameter("coreLang", lang)
        .setParameter("messageKey", key)
        .getResultList();

    Why does it hit the DB every time? Shouldn’t it use the cache?

  4. Dino,

    the reason why it requests the data from the database every time is that you either did not enable the query cache and/or did not specify the query as cacheable.

    Hibernate requires the Entity key to retrieve an object from either the session or the second-level cache. See my previous post on this at http://blog.dynatrace.com/2009/02/16/understanding-caching-in-hibernate-part-two-the-query-cache/

  5. John Reilly says:

    Hi,
    You said above – “By default Ehcache is used as caching provider. However more sophisticated caching implementation can be used like the distributed JBoss Cache or Oracle Coherence.” People should know that ehcache has been doing distributed caching for a while. The RMI impl is quite solid and I’ve been using it for a long time (with multicast cache peer discovery). There is also a new JGroups impl which I haven’t tried yet.

    To answer Peter, as a heavy user of the hibernate cache (and other caching) I have to say that the commenter on TSS is just plain wrong. The DB query cache is only useful if the db data is not changing very much (or at all) and/or you have very little load on the DB.

    In response to Dino (and Alois’ answer to Dino), you need to be careful with the query cache and how you expire entries in it if you care about data freshness. This also goes for the cached collections from OneToMany associations. See SessionFactory.evict* methods.

    Hibernate caching is just one of a number of ways you can cache and can obviously be used in combination with other caches, e.g. ehcache directly, memcache, etc.

    The less I care about data freshness, the more I tend to push outward to other caches from hibernate caches. Remember that hibernate caches do not cache the entity objects themselves, just data. If you are hitting the hibernate caches a lot, there is an object creation overhead that you do not have if you are fetching objects direct from another cache. I find that a mix works well.

    Regards,
    John

  6. Paolo Denti says:

    Peter,
    I posted my reply on TSS but, basically, I think that the commenter on TSS simply does not have a clue what an application-level cache is.
    In my opinion, this guy has simply written CRUD-like applications where the application data is exactly the DBMS data; just in that trivial case, with a small amount of data, the second-level cache is basically useless.

    Moreover, one additional personal consideration is related to the hardware changes of the last years.
    In most catalog-based web applications (a shop for example, news websites, …) the full catalog can easily be FULLY loaded into the cache, with no cache expiration date, without any significant impact on RAM usage; a typical catalog fully loaded into the second-level cache might use some megs, in a typical server config of some gigs.
    The evict of the 2nd-level cache can be executed by every CRUD operation of the backoffice modifying the catalog (which happens seldom).

  7. Taylor Gautier says:

    Did you provide source code for these examples? That would be a nice addition if you haven’t. If you did, would you mind posting a link or pointing me to where it is??

    Thanks and great job explaining the different caching strategies for Hibernate.

  8. Dominik Zmitrowicz says:

    Hi,
    I have a weird problem with 2nd lvl cache.
    I configured my entity to use caching:
    @Cache(region = "Kierunek", usage = CacheConcurrencyStrategy.READ_WRITE)
    public class Kierunek

    but when i deploy i get a following error:

    19:52:57,312 ERROR [STDERR] 2009-04-19 19:52:57 net.sf.ehcache.hibernate.EhCacheProvider buildCache
    WARNING: Could not find a specific ehcache configuration for cache named [persistence.unit:unitName=pz2EAR.ear/pz2EJB.jar#pz2EJB.Kierunek]; using defaults.
    19:52:57,313 ERROR [AbstractKernelController] Error installing to Start: name=persistence.unit:unitName=pz2EAR.ear/pz2EJB.jar#pz2EJB state=Create
    java.lang.IllegalArgumentException: Cache name cannot contain ‘/’ characters.

    part of persistence.xml:

    and part of ehcache.xml

    any hints?

  9. Dominik Zmitrowicz says:

    ok found solution
    just need to add following property
    <property name="hibernate.cache.region_prefix" value=""/>

  10. Terracotta has a second-level cache solution for Hibernate as well.

  11. Great Article about the cache. It certainly cleared my understanding.

  12. Alois Reitbauer says:

    I will post the source code shortly. Just have to massage it a bit to make it compile for you guys :-)

  13. Thanks for the article and especially for your answer to Dino’s question. I had the same problem.

  14. Hey Peter, can you tell me what tools you use for monitoring Hibernate?

  15. I would agree that the comment about caching strategies posted on TSS is completely off base (and more specifically just plain wrong). In particular, I read a whitepaper several years ago about the four different classifications of data (transactional, reference, static, the last one I can’t remember), and if you look at these classifications of data it is fairly obvious that a good caching scheme can provide great benefits to the application layer in any n-tiered application infrastructure. Typically you want to avoid as much processing, network latency and overall response time as possible in developing a highly performant application, avoiding work wherever possible being the goal. A well-implemented caching strategy, even just for reference and static data, can provide significant benefit to the overall responsiveness of an application, particularly as you push the cache up the application layers closer and closer to the UI / presentation layer (why build UI objects for every request – web app of course – when the data almost never changes).

    Having said that, one item that you kind of glossed over in the discussion of the second-level cache when referring to sessions is the concept of a web session and how this “session” corresponds to the traditional concept of a web “session”. It was not immediately transparent to the novice reader whether or not these sessions coincide and if an object (static object) read from one session would be available as a cacheable entity in other web “sessions”. We have found that caching of static data, and a nice pre-loading of that static data for all web sessions, is a nice performance boost that is easily understood and lacking any of the caching issues of read-write caching, since the data does not change.

    Thanks for the articles.

  16. Great Information! Thanks for sharing!

  17. Yeah Peter, it is not fair to say that second level caching is useless. Nhibernate’s second level cache is a great way to improve performance and scalability in my opinion. I have used a free distributed cache called NCache Express with Nhibernate with great success. You can see for yourself: http://www.alachisoft.com/ncache/ncache_express.html

  18. I agree that “Hibernate’s second level cache is completely worthless” seems a bit harsh. There is no doubt that caching is the ultimate way to overcome problems like performance, reliability and scalability; it is the dream of each and every developer to have an app which is free from the above-mentioned hurdles. I’ll also second Mark about NCache. I’ve used NCache once and I believe it is reliable.

  19. Debadatta Sahoo says:

    This information is absolute. I lack the words to appreciate you for your genuine effort to make the concept so easy.

    Thanks and Regards,
    Debadatta

  20. Thanks Bro.. God Bless You

  21. Munikumar says:

    Thanks for explaining the concept in an easier manner with examples.

  22. I’ve got one application that reads, inserts and deletes entities from a database. I especially want to improve the performance of fetching data (queries). Is it enough and reliable to create a read-only 2nd-level Hibernate cache, or must it be a read-write cache?

  23. I’ve got two applications using the same database. I don’t have a clustered cache, so each application has its own Hibernate cache. One of the applications only reads data, the other reads and writes data.
    If I configure a read-only cache in the first application, I guess it will always retrieve stale data because it will never get updated? I guess the only way to get it updated is to either use a clustered cache, or set the time-to-live of cache entries to a certain interval?

  24. k

  25. Hi,

    We are facing a problem with memcached. We are using hibernate-memcached version 1.5-SNAPSHOT and spymemcached-provider version 3.0.2.

    The following are the configuration

    persistence.xml
    —————

    <!-- Enable 2nd Level Cache -->

    <property name="hibernate.cache.use_second_level_cache" value="true"/>
    <property name="hibernate.cache.use_query_cache" value="true"/>

    <!-- MemCache Configuration -->

    <property name="hibernate.cache.region.factory_class" value="com.googlecode.hibernate.memcached.MemcachedRegionFactory"/>
    <property name="hibernate.cache.use_minimal_puts" value="true"/>
    <property name="hibernate.cache.use_structured_entries" value="true"/>
    <property name="hibernate.memcached.servers" value="${hibernate.memcached.server.host.port}"/>
    <property name="hibernate.memcached.cacheTimeSeconds" value="1800"/>

    dto
    —-

    @Cacheable(true)
    @org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)

    In GenericDao we are setting the query hint cacheable to true for loadAll().

    We are using the loadAll() method to fetch all the records.
    Whenever we make a request, the loadAll query is executed and also the queries based on id are executed.

    When I look at the log I can see that for one request the data is fetched from the database and put into memcached; when we make another request, instead of fetching the data from memcached it hits the DB again and puts the data into memcached again. I am unable to understand why it hits the DB even though no data was modified.

    Please let me know if we are missing anything.
