Michael Kopp About the Author

Michael is a Technical Product Manager at Compuware. Reach him at @mikopp

How Garbage Collection differs in the three big JVMs

Most articles about Garbage Collection ignore the fact that the Sun Hotspot JVM is not the only game in town. In fact, whenever you have to work with either IBM WebSphere or Oracle WebLogic you will run on a different runtime. While the concept of Garbage Collection is the same, the implementations are not, and neither are the default settings or the way you tune them. This often leads to unexpected problems when running the first load tests, or in the worst case when going live. So let’s look at the different JVMs, what makes them unique, and how to ensure that Garbage Collection runs smoothly.

The Garbage Collection ergonomics of the Sun Hotspot JVM

Everybody believes they know how Garbage Collection works in the Sun Hotspot JVM, but let’s take a closer look for reference.

The memory model of the Sun Hotspot JVM

The Generational Heap

The Hotspot JVM always uses a Generational Heap. Objects are first allocated in the young generation, specifically in the Eden area. Whenever Eden is full, a young generation garbage collection is triggered. This copies the few remaining live objects into the empty survivor space. In addition, objects that were copied to a survivor in the previous garbage collection are checked and the live ones are copied as well. The result is that objects only exist in one survivor, while Eden and the other survivor are empty. This form of Garbage Collection is called a copy collection. It is fast as long as nearly all objects have died, and allocation is always fast because no fragmentation occurs. Objects that survive a couple of garbage collections are considered old and are promoted into the Tenured/Old space.
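
You can watch this churn of short-lived objects through the standard `java.lang.management` API, which every JVM discussed here implements. The following sketch is mine (the class name `GcChurn` is made up, and the collector bean names printed will vary per JVM and strategy, e.g. "Copy", "PS Scavenge", "ParNew"):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcChurn {
    public static void main(String[] args) {
        // Allocate a burst of short-lived objects; nearly all of them die
        // young, which is exactly the case Eden's copy collection handles cheaply.
        long checksum = 0;
        for (int i = 0; i < 1000000; i++) {
            byte[] tmp = new byte[128];
            checksum += tmp.length; // keep the allocations observable
        }
        System.out.println("allocated bytes: " + checksum);

        // Every JVM exposes its collectors through this standard API;
        // the bean names differ per JVM and chosen strategy.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Run it with `-verbose:gc` on each JVM and you will already see the behavioural differences this article describes.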

Tenured Generation GCs

The Mark and Sweep algorithms used in the Tenured space are different because they do not copy objects. As we have seen in one of my previous posts, garbage collection takes longer the more objects are alive. Consequently GC runs in Tenured are nearly always expensive, which is why we want to avoid them. To do so we need to ensure that objects are only copied from Young to Old when they are permanent, and in addition ensure that the Tenured space does not run full. Generation sizing is therefore the single most important optimization for the GC in the Hotspot JVM. If we cannot prevent objects from being copied to Tenured space once in a while, we can use the Concurrent Mark and Sweep algorithm, which collects objects concurrently with the application.
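
As a sketch of what generation sizing looks like in practice, these are the standard Hotspot sizing flags (the values are illustrative examples, not recommendations, and `MyApp` is a placeholder for your main class):

```shell
# Illustrative Hotspot generation sizing:
#   -Xms/-Xmx                 fixed total heap to avoid resizing pauses
#   -Xmn                      young generation size (Eden + two survivor spaces)
#   -XX:SurvivorRatio=8       Eden is 8x the size of one survivor space
#   -XX:MaxTenuringThreshold  collections an object survives before promotion
java -Xms1024m -Xmx1024m -Xmn256m \
     -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 \
     -verbose:gc -XX:+PrintGCDetails MyApp
```

The GC log produced by the last two flags is how you verify that transient objects really die in Young instead of bleeding over into Tenured.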

Comparison of the different Garbage Collector Strategies

While that shortens the suspensions it does not prevent them and they will occur more frequently. The Tenured space also suffers from another problem, fragmentation. Fragmentation leads to slower allocation, longer sweep phases and eventually out of memory errors when the holes get too small for big objects.

Java Heap before and after compacting

This is remedied by a compacting phase. The serial and parallel compacting GCs perform compaction on every GC run in the Tenured space. It is important to note that while the parallel GC compacts every time, it does not compact the whole Tenured heap but only the areas that are worth the effort, meaning areas that have reached a certain level of fragmentation. In contrast, the Concurrent Mark and Sweep does not compact at all. Once objects can no longer be allocated, a serial major GC is triggered. When choosing the Concurrent Mark and Sweep strategy we have to be aware of that side effect.

The second big tuning option is therefore the choice of the right GC strategy. It has big implications for the impact the GC has on application performance. The last and least known tuning option is around fragmentation and compacting. The Hotspot JVM does not provide many options to tune this, so the only way is to change the code directly and reduce the number of allocations.
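
To make the strategy choice concrete, here are the JDK 6 era Hotspot flags that select a collector (a hedged sketch; `MyApp` is a placeholder, and defaults depend on JVM version and machine class):

```shell
# Selecting a collector strategy in Hotspot:
java -XX:+UseSerialGC        MyApp   # serial copy + mark-sweep-compact
java -XX:+UseParallelOldGC   MyApp   # parallel young and tenured collections
java -XX:+UseConcMarkSweepGC MyApp   # mostly-concurrent tenured collection
```

Only one of these should be chosen per run; combining collector flags is rejected by the JVM.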

There is another space in the Hotspot JVM that we all came to love over the years: the Permanent Generation. It holds classes and the string constants that are part of those classes. While Garbage Collection is executed in the permanent generation, it only happens during a major GC. You might want to read up on what a major GC actually is, as it does not mean an Old Generation GC. Because a major GC does not happen often, and mostly nothing changes in the permanent generation, many people think that the Hotspot JVM does not do garbage collection there at all.
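
For completeness, the permanent generation has its own sizing flags (illustrative values; `MyApp` is a placeholder):

```shell
# Hotspot permanent generation sizing (PermGen was removed in later JDKs):
java -XX:PermSize=128m -XX:MaxPermSize=256m MyApp
# With CMS, class unloading from PermGen can be enabled explicitly:
java -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled MyApp
```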

Over the years all of us have run into many different forms of OutOfMemory situations in PermGen, and you will be happy to hear that Oracle intends to do away with it in future versions of Hotspot.

Oracle JRockit

Now that we have had a look at Hotspot, let us look at how Oracle JRockit differs. JRockit is used by Oracle WebLogic Server, and Oracle has announced that it will merge it with the Hotspot JVM in the future.

Heap Strategy

The biggest difference is the heap strategy itself. While Oracle JRockit does have a generational heap, it also supports a so-called continuous heap. In addition, the generational heap looks different as well.

Heap of the Oracle JRockit JVM

The Young space is called the Nursery and it only has two areas. When objects are first allocated they are placed in a so-called Keep Area. Objects in the Keep Area are not considered during garbage collection, while all other objects that are still alive are immediately promoted to Tenured. This has major implications for the sizing of the Nursery. While you can configure how often objects are copied between the two survivors in the Hotspot JVM, JRockit promotes objects in their second Young Generation GC.

In addition to this difference JRockit also supports a completely continuous heap that does not distinguish between young and old objects. In certain situations, like throughput-oriented batch jobs, this results in better overall performance. The problem is that this is the default setting on a server JVM and often not the right choice. A typical web application is response-time-oriented rather than throughput-oriented, and you will need to explicitly choose the low pause time garbage collection mode or a generational garbage collection strategy.
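
As a sketch, these are the JRockit options (R27/R28 era; verify the exact spellings against your JRockit release documentation, and `MyApp` is a placeholder) that choose between the modes discussed above:

```shell
# JRockit heap/strategy selection:
java -Xgc:gencon    MyApp   # generational heap, mostly-concurrent old collection
java -Xgc:genpar    MyApp   # generational heap, parallel old collection
java -Xgc:singlecon MyApp   # single (continuous) heap, concurrent collection
java -Xgcprio:pausetime MyApp  # let JRockit pick strategies for low pause times
```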

Mostly Concurrent Mark and Sweep

If you choose the Concurrent Mark and Sweep strategy you should be aware of a couple of differences here as well. The mostly concurrent mark phase is divided into four parts:

  • Initial marking, where the root set of live objects is identified. This is done while the Java threads are paused.
  • Concurrent marking, where the references from the root set are followed in order to find and mark the rest of the live objects in the heap. This is done while the Java threads are running.
  • Precleaning, where changes in the heap during the concurrent mark phase are identified and any additional live objects are found and marked. This is done while the Java threads are running.
  • Final marking, where changes during the precleaning phase are identified and any additional live objects are found and marked. This is done while the Java threads are paused.

The sweeping is also done concurrently with your application, but in contrast to Hotspot it happens in two separate steps. JRockit first sweeps the first half of the heap; during this phase threads are allowed to allocate objects in the second half. After a short synchronization pause the second half is swept, followed by another short final synchronization pause. The JRockit algorithm therefore stops more often than the Sun Hotspot JVM, but its remark phase should be shorter. Unlike in the Hotspot JVM, you can tune the CMS by defining the percentage of free memory that triggers a GC run.

Compacting

JRockit compacts on all Tenured Generation GCs, including the Concurrent Mark and Sweep. It does so in an incremental mode for portions of the heap. You can tune this with various options, like the percentage of the heap that should be compacted each time or how many objects are compacted at most. In addition, you can turn off compaction completely or force a full compaction on every GC. This means that compacting is a lot more tunable in JRockit than in the Hotspot JVM, and the optimum depends very much on the application itself and needs to be carefully tested.

Thread Local Allocation

Hotspot does use thread local allocation, but it is hard to find anything in the documentation about it or how to tune it. JRockit uses it by default. This allows threads to allocate objects without any need for synchronization, which is beneficial for allocation speed. The size of a TLA can be configured, and a large TLA can be beneficial for applications where multiple threads allocate a lot of objects. On the other hand, too large a TLA can lead to more fragmentation. As a TLA is used exclusively by one thread, the size is naturally limited by the number of threads. Thus both decreasing and increasing the default can be good or bad, depending on your application’s architecture.
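
TLA sizing in JRockit looks roughly like this (R27-style syntax, values purely illustrative; verify against your JRockit documentation, `MyApp` is a placeholder):

```shell
# Thread local area sizing in JRockit:
#   min       = smallest TLA the JVM will hand to a thread
#   preferred = size the JVM tries to give each thread
java -XXtlaSize:min=2k,preferred=16k MyApp
```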

Large and small objects

JRockit differentiates between large and small objects during allocation. The limit for when an object is considered large depends on the JVM version, the heap size, the garbage collection strategy and the platform used. It is usually somewhere between 2 and 128 KB. Large objects are allocated outside the thread local area and, in case of a generational heap, directly in the old generation. This makes a lot of sense when you start thinking about it: the young generation uses a copy collection, and at some point copying an object becomes more expensive than traversing it on every garbage collection.

No permanent Generation

And finally it needs to be noted that JRockit does not have a permanent generation. All classes and string constants are allocated within the normal heap area. While that makes life easier on the configuration front, it means that classes can be garbage collected immediately once they are no longer used. In one of my future posts I will illustrate how this can lead to some hard-to-find performance problems.

The IBM JVM

The IBM JVM shares a lot of characteristics with JRockit: the default heap is a continuous one, and especially in WebSphere installations this is often the initial cause of bad performance. It differentiates between large and small objects with the same implications, and it uses thread local allocation by default. It also does not have a permanent generation. The IBM JVM supports a generational heap model as well, but that model looks more like Sun’s than JRockit’s.

The IBM JVM generational heap

Allocate and Survivor act like Eden and Survivor in the Sun JVM. New objects are allocated in one area and copied to the other on garbage collection. In contrast to JRockit, the two areas are switched upon GC. This means that an object is copied multiple times between the two areas before it gets promoted to Tenured. Like JRockit, the IBM JVM has more options to tune the compaction phase. You can turn it off or force it to happen on every GC. In contrast to JRockit, the default triggers compaction via a series of heuristics, but it then performs a full compaction. This can be changed to an incremental one via a configuration flag.
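
The corresponding IBM options look roughly like this (names from the IBM Java 5/6 diagnostics documentation; treat the values as illustrative and `MyApp` as a placeholder):

```shell
# IBM JVM: choose a GC policy and tune nursery size and compaction:
java -Xgcpolicy:gencon MyApp       # generational heap as described above
java -Xgcpolicy:optavgpause MyApp  # continuous heap, concurrent mark/sweep
java -Xgcpolicy:gencon -Xmns128m -Xmnx256m MyApp  # initial/max nursery size
java -Xcompactgc MyApp             # force compaction on every GC
java -Xnocompactgc MyApp           # never compact
```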

Conclusion

We see that while the three JVMs essentially try to achieve the same goal, they do so via different strategies, which leads to different behaviour that needs tuning. With Java 7 Oracle will finally declare G1 (Garbage First) production ready, and G1 is a different beast altogether, so stay tuned.

If you’re interested in hearing me discuss more about WebSphere in a production environment, then check out our upcoming webinar with The Bon-Ton Stores. I’ll be joined by Dan Gerard, VP of Technical & Web Services at Bon-Ton, to discuss the challenges they’ve overcome in operating a complex WebSphere production eCommerce site to deliver great web application performance and user experience. Reserve your seat today to hear me go into more detail about WebSphere and production eCommerce environments.

Comments

  1. Nice article mate, you indeed covered the topic in great detail and highlighted some important points, but I think people are aware that GC is a JVM-dependent thing and will vary across JVMs, and most of the time projects only use Sun’s Hotspot JVM. I do see the value of knowing specifics like “JRockit does not have a permanent generation”, though. By the way, I have shared my experience around Garbage Collection in Java which you may find useful.

  2. Hi Javin,

    Thanks for the nice comment. I have seen your blog before and have read some of your articles.

    In my experience you have to work on IBM WebSphere and WebLogic a lot and tend to forget about the GC differences. We see a lot of problems due to this, which was the motivation for the post.

    Mike

    • Pratap Kumar Nayak says:

      Hi Michael,
      I have two Questions indeed;
      1. What is the actual difference between a Full GC and a Major GC? What actually happens when there is a GC in PermGen?

      2. When we consider the entire heap, how do we set the sizes of the different spaces? For example, if the heap is 1 GB, how should the sizes of the Young (Eden + 2 Survivor spaces), Tenured and PermGen areas be set?

      • Hi Pratap,

        There is no difference; full and major GC mean about the same thing, and full GC is actually a better term than major. See this blog for more info: http://apmblog.compuware.com/2011/03/10/major-gcs-separating-myth-from-reality/

        For your second question there is no easy answer; that has been the pain of heap tuning for a long time. The Young generation needs to be big enough that transactional memory allocations do not bleed over into the Old generation. However, a too-large Young generation becomes a burden, because a Young Generation GC is always a stop-the-world event.
        Another alternative would be to use the G1 instead, which does away with young/old sizing.

  4. Stephen Reed says:

    I have a batch Java application that creates lots of ephemeral objects yet also has some very large permanent hash tables. I get better performance on the IBM J9 VM than on the Oracle Hotspot JVM especially with regard to long pauses.

    Here are the command line options I use:

    java -Xmx12g -Xms12g -XlockReservation -Xaggressive -Xcompressedrefs -Xgcpolicy:gencon -XlockReservation -Xnoloa -XtlhPrefetch

  5. Krystal Mok says:

    “While the Sun JVM does support thread local allocation, it is not default and not widely used. In fact it is hard to find anything in the documentation about it.”

    While it’s true that there aren’t much well-known documents on the topic of thread-local allocation, HotSpot VM has this switch called UseTLAB, and it’s on by default, on at least x86/x64/SPARC with either the client or the server VM.

    Refer to the follow source code for the default value, or use -XX:+PrintFlagsFinal for confirmation that it’s on:
    hotspot/src/cpu/sparc/vm/c1_globals_sparc.hpp
    hotspot/src/cpu/sparc/vm/c2_globals_sparc.hpp
    hotspot/src/cpu/x86/vm/c1_globals_x86.hpp
    hotspot/src/cpu/x86/vm/c2_globals_x86.hpp

    ===================================

    “While Oracle JRockit does have a generational heap it also supports a so called continuous heap.”

    I wasn’t able to find what this “continuous heap” refers to. This is quite confusing, because “continuous heap” usually means “a heap allocated on a contiguous piece of address space”. By this meaning, though, JRockit has removed the demand of having to use a continuous heap since…well, at least since R27.6 (I googled for document and found http://otndnld.oracle.co.jp/document/products/jrockit/jrdocs/pdf/upgrade.pdf)
    But I guess that’s not what you meant by “continuous heap”. You probably meant a GC heap divided up into regions, is that right?

  6. ThreadLocal: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

    Ok thanks for the correction. It is off on x86 infrastructure according to documentation, so Sparc seems to have it on.

    JRockit: Continuous heap in that context means a Java heap that is not divided into generations. JRockit actually calls it “single generational”, which is confusing in itself. A single generation or continuous heap does not keep young and old objects in separate spaces and thus will suffer if a large number of ephemeral objects are allocated. On the other hand it will perform better if that is not the case, as it is less complex.

    I hope that helps. If you have any other questions please let me know.

  7. Krystal Mok says:

    “It is off on x86 infrastructure according to documentation, so Sparc seems to have it on.”

    No, I think that document says UseTLAB is false [on JDK 1.4.2 and earlier, on x86 or with client]. It defaults to true on both 32- and 64-bit, SPARC and x86, client and server, with JDK6′s HotSpot. See this: https://gist.github.com/827140

    =========================

    “JRockIt actually calls it single generational which is confusing in itself.”

    I see, so that’s what you meant.

  8. All right, convinced, I’ll change it ;-)

  9. Guido Schweizer says:

    Hi,
    nice article. I have seen you presenting this on Jax2011. I want to make a little presentation about this topic for my colleagues. Is it possible to get your slides from the jax presentation?

  10. I have sent the presentation to the organizers of the JAX, so you should get it eventually. If you need it sooner, give me your email and I’ll send it to you.

  11. Definitely there are JVMs other than Hotspot.

  12. Dorian Cransac says:

    Regarding the IBM JVM : “It also does not have a permanent generation”

    Are you positive about that?

    Using IBM JDK 1.5 on AIX 5.2, we ran into several native memory segment saturation issues, which sounds a lot like a perm space to me.

    We would monitor that space using the svmon command (“work working storage” segments). The size of that space would depend on the choice made for sizing the heap.

    Usually the root cause would be a massive use of accessors when serializing/deserializing, using reflection, or creating a lot of remote homes/objects through RMI. It seems that certain classes like native accessors would be stored into these native segments and would not be collected except for with a full gc.

    Thing is, there would hardly ever be a need for full gc (not much use of tenured space by the app), so the native segments would get saturated before the full gc would trigger.

  13. Robert Dean says:

    Your IBM links point to WAS 6.0 documentation, which is a bit dated. IBM’s Java 6 2.6 version, which is being delivered with WAS 8.0, has changed the defaults for GC management and offers more options for small, predictable pauses during GC.

    http://publib.boulder.ibm.com/infocenter/wasinfo/beta/topic/com.ibm.java.doc.60_26/IBMJava626.pdf

  14. @Dorian,

    Hi Dorian, The memory model on the AIX is a little different than on x86 platforms. You might need to look at
    http://public.dhe.ibm.com/software/dw/jdk/diagnosis/diag50.pdf and the AIX problem section. Maybe you even use the subpool GC which is specially for AIX, PPC and zSeries.

  15. There’s a typo in your last diagram – “Survior” instead of “Survivor”. Otherwise, thanks for a great article.

  16. Thanks Robert, well for authenticity reasons I will leave the typo in there ;-)

  17. Wavy Curly says:

    Great and nice post. I really like it. Thanks admin.

  18. Dattatreya says:

    The article is good,but it is too brief for JRockit and the IBM JVM.

  19. Dorian Cransac says:

    @Michael.

    Thanks for the link, I will take a look at it.

  20. Ron Carroll says:

  20. Hi guys, it is a very good article to learn about GC. Thanks for it, Michael. I am working on a Java application where every time it runs out of memory the UI freezes. Can you give me a few hints on how I can make my application more stable? This is an applet running on IE 5.0. By the way, can you please let me know about the feasibility and complexity of implementing garbage collection for the C language, in C? I would like to get your advice and recommendations.
    Ron Carroll.

  21. Hi Ron,

    If it runs out of memory then the best option is to use a memory profiler to identify the source of the OOM. Typical memory leaks can be seen in http://blog.dynatrace.com/2011/04/20/the-top-java-memory-problems-part-1/.

    As for Garbage Collection in C, I would not do it. A real Garbage Collector is very complex and it took the Java Community several years to get it where it is today.

    A better and easier option is to use auto pointers and smart pointers (http://en.wikipedia.org/wiki/Auto_ptr, http://en.wikipedia.org/wiki/Smart_pointer). Not GC per se, but it gets the job done.

    Hope this helps

  22. Ron Carroll says:

    Thank you, Michael. It has been very helpful indeed.

  23. Great post! I love reading your post and it is really good. Keep up the awesome work..

  24. “In fact whenever you have to work with either IBM WebSphere or Oracle WebLogic you will run on a different runtime.”

    You can also use the Sun(Oracle) JVM to run WebSphere. You are right that this would be the default, but you should not count on it to be always the case.


  31. Please do not forget that cumulative GC time stays pretty much the same and depends on the application.


  33. gointomexico says:

    Can you provide the sources for your information? I am having difficulty validating your statements.

  34. There are two points above that I agree with:
    First, GC time stays pretty much the same and depends on the application.
    And second, GC is a JVM-dependent thing and will vary across JVMs.

  35. I forgot to mention in my first comment that your article was not bad!
