About the Author: Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within the Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi.

Pros and Cons of Using Java vs Native Agent for Application Performance Management

Many Application Performance Management (APM) vendors that give insight into the runtime behavior of JVMs rely on interfaces provided by the Java runtime. Originally Java offered the JVM Profiler Interface (JVMPI), which was replaced by the JVM Tool Interface (JVMTI) in Java 5. Both allow a tool vendor to load a native library (often called a native agent) into the same process as the JVM. This library gets access to JVM state and certain aspects of application execution through a native API. Because the library's code is not executed by the JVM, it is not impacted by JVM stops (e.g. longer Garbage Collection suspensions or runtime errors) and can therefore continue to deliver data to the external tool.

As an alternative to these native interfaces, Java 5 also introduced a pure Java interface (the java.lang.instrument package, enabled with the -javaagent option). This option loads the agent into the JVM, where it runs as ordinary Java code. The downside is that this agent gets loaded at a later stage of the JVM startup sequence. It is also impacted by JVM stops or Java runtime problems and won’t be able to report certain types of error information.
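To make the pure Java option concrete, here is a minimal sketch of such an agent built on the standard java.lang.instrument API. The class name SketchAgent, the jar name in the comment and the recording logic are assumptions made for illustration; this is not how any particular vendor's agent is written.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical agent class. It is package-private so the sketch fits in any file;
// a real agent class should be public and live in its own source file.
class SketchAgent {

    // Internal names (e.g. "com/example/Foo") of classes seen by the transformer.
    static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    // Called by the JVM before the application's main method when started with
    // -javaagent:sketch-agent.jar (hypothetical jar whose manifest names this
    // class in its Premain-Class attribute).
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }

    static final class RecordingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null) {
                SEEN.add(className);
            }
            // Returning null tells the JVM to keep the original bytecode unchanged.
            return null;
        }
    }
}
```

The transformer only sees classes loaded after it has been registered, which is the "later stage of the startup sequence" limitation in practice: anything the JVM loaded earlier has already gone by.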

In this blog we will highlight some of the reasons why our engineering team decided on the native approach in combination with bytecode instrumentation (BCI) rather than moving to the Java-based approach:

Full Insight of all Classes

The agent gets loaded before any classes are loaded. This allows it to collect data from the very beginning and to observe and control execution of ALL Java code; no restrictions apply here. To capture method-level information, bytecode instrumentation (BCI) provides an additional benefit over relying on native callbacks alone: being able to perform BCI on any class gives additional insight into core system classes (java.lang.Object, java.lang.Thread, …).

More detailed information

From native code we can get much more detailed performance information, such as hardware high-resolution timers and detailed GC information. Using a native approach therefore eliminates the need to install yet another native agent that collects system information. A Java agent most likely doesn’t get access to this data, as it runs within the special security context of the JVM.

Native Agent gives us detailed and very accurate information on Memory, Threads, CPU and GC Impact through native APIs

Running native also allows us to capture host performance metrics such as network, disk, CPU and memory, as well as information about other processes running on that system
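For comparison, the part of this data that pure Java code can reach without any native library is what the standard java.lang.management beans expose. The sketch below reads the accumulated collection time per garbage collector; the class and method names are hypothetical and only meant to show the level of detail available from inside the JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing the Java-side view of GC activity.
class JavaSideGcView {

    // Maps each collector's name to its accumulated collection time in
    // milliseconds; the JVM reports -1 when the value is undefined.
    static Map<String, Long> gcTimesMillis() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), gc.getCollectionTime());
        }
        return result;
    }
}
```

Calling JavaSideGcView.gcTimesMillis() returns running totals per collector. Host-level metrics such as disk, network or other processes are not part of this API.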

Additional information

Within the native agent we can collect much more information about the JVM, such as memory and thread dumps and details about JVM crashes. Especially for thread and memory analysis it is beneficial to get access to both JVM threads and memory usage as well as native threads and memory usage. In case of a crash caused by an Out-of-Memory Error, the native agent is still able to capture the data on the heap, because the process is still running and the agent can still access its memory.

Native agents allow us to get insight into every thread, its state, the monitors it owns and how those impact other threads

Native agents allow us to capture full memory dumps before the Java process eventually crashes, giving us all information needed for post-mortem analysis
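As a point of reference, thread state and owned monitors for JVM threads can also be read from Java through the standard ThreadMXBean; what it does not cover are native threads and native memory. A minimal sketch, with a hypothetical class name and output format:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that renders a Java-level thread dump as text.
class JavaSideThreadDump {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Request lock details only if this JVM supports reporting them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append(info.getThreadName())
               .append(" [").append(info.getThreadState()).append("]\n");
            // Monitors this thread currently holds (empty if not requested).
            for (MonitorInfo m : info.getLockedMonitors()) {
                out.append("  owns ").append(m.getClassName())
                   .append(" locked at stack depth ").append(m.getLockedStackDepth())
                   .append('\n');
            }
            // Name of the thread holding the lock this thread is waiting for, if any.
            if (info.getLockOwnerName() != null) {
                out.append("  waiting for lock held by ")
                   .append(info.getLockOwnerName()).append('\n');
            }
        }
        return out.toString();
    }
}
```

Because this code runs on a Java thread, it depends on the JVM being healthy enough to execute it, which is exactly the situation a crash or Out-of-Memory Error puts in doubt.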

Less impact on JVM

Pure Java agents run “within” the JVM they’re observing. They therefore add overhead to the JVM itself and may impact execution of the application. One prominent example is higher Java heap usage compared to the native counterpart.

Performance

In native code, data needed for analysis (e.g. stack traces) can be fetched more efficiently. Most of this information is available through native APIs; obtaining it from a Java agent inside the JVM would require a more expensive call from Java to native code.

Low Overhead allows our native agent to capture every single request with details down to method executions, exceptions including stack traces, SQL statements …

Not attached to JVM

As the agent is not attached to the JVM, it’s also not affected by JVM stops (especially GC-related ones) and is able to continue data collection during such stops. This helps us to collect detailed and accurate information about the actual impact of Garbage Collection suspensions on the currently executing application threads. A Java agent won’t get such accurate information, as it is also impacted by these “stops” of the JVM.

Native agents allow us to get the actual impact of a runtime suspension (e.g. GC) on the end user response time
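The limitation on the Java side can be illustrated with a small probe thread. The sketch below (class and method names are hypothetical) sleeps for a fixed interval and records how much longer than requested each sleep took. Since the probe is itself a Java thread, it is frozen during a JVM stop and only notices the gap afterwards, without being able to tell whether GC, another JVM suspension or the OS scheduler caused it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical probe that infers stalls from inside the JVM.
class PauseProbe {

    // Sleeps `intervalMillis` for `samples` rounds and returns the largest
    // overshoot (actual minus requested sleep time) in milliseconds.
    static long maxOvershootMillis(int samples, long intervalMillis) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            worst = Math.max(worst, elapsedMillis - intervalMillis);
        }
        return worst;
    }
}
```

A call such as PauseProbe.maxOvershootMillis(1000, 1) gives a rough upper bound on stalls seen by one thread over about a second. It says nothing about which application threads were affected or for how long, which is the information the native agent is described as capturing.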

What are your thoughts?

These are our thoughts and the reasoning behind why we decided to use the native option. One counterargument we sometimes hear is that bugs in the native agent may crash the process and with it the JVM. This is of course a valid concern, which is why we run lots of automated tests on all platforms we support to ensure we don’t cause any problems.

Now the shout-out to all other developers that use these interfaces: what are your thoughts on this? Why did you decide on the Java or Native approach?

 

Special Thanks to Christian Schwarzbauer, Chief Software Architect @ Compuware APM/dynaTrace, for the input on this blog.

Comments

  1. Does this mean only a native agent can capture a new thread invocation from an existing thread, like what we can see for thread.run() in a PurePath?

    • Our agent that captures PurePaths is able to follow transactions across thread boundaries, and it works in both cases: the creation of a new thread as well as when leveraging an existing thread in a thread pool.

  2. Ben Cotton says:

    Thanks for pointing this out. It is a conflict of interest for any .jar based agent making quality of health statements on the same JVM at which the .jar itself is dependent. Kind of like how can a brain-damaged brain make credible qoh statements wrt to the *exact same* brain? Providing a .so based agent resolves this conflict of interest nicely.
