Tracing problems in Project Stonehenge and other heterogeneous systems
Transactional Tracing with dynaTrace on Project Stonehenge
Stonehenge is a reference implementation of a Stocktrader application. The application has the following tiers:
- Frontend (Implemented with ASP.NET and PHP)
- Business Services (Implemented with ASP.NET, Java and PHP)
- Order Processing Services (Implemented with ASP.NET, Java and PHP)
The Frontend component talks to the Business Service to query the current status of the user’s portfolio and to get stock quotes. When stocks are bought or sold, the Business Service calls the Order Processing Service, which handles the request asynchronously. Both the Business Service and Order Processing are implemented in .NET and Java (I will leave PHP out here, although it is also possible to talk to a PHP implementation). Configuration files determine which implementation (Java or .NET) is used.
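To make the switch concrete, here is a minimal Python sketch of configuration-driven endpoint selection. It is not Stonehenge's actual mechanism: the `[interop]` section, the `order_processing` key and both endpoint URLs are hypothetical placeholders, and `order_endpoint` is an illustrative helper.

```python
from configparser import ConfigParser

# Hypothetical endpoint map: these URLs are placeholders, not Stonehenge's
# real addresses or configuration keys.
ORDER_ENDPOINTS = {
    "dotnet": "http://dotnet-host/OrderProcessingService",
    "java": "http://java-host/services/OrderProcessingService",
}


def order_endpoint(config_text: str) -> str:
    """Return the Order Processing endpoint the Business Service should call.

    Reads the hypothetical key `order_processing` from an `[interop]`
    section and falls back to the .NET implementation if it is absent.
    """
    parser = ConfigParser()
    parser.read_string(config_text)
    impl = parser.get("interop", "order_processing", fallback="dotnet").strip().lower()
    if impl not in ORDER_ENDPOINTS:
        raise ValueError(f"unknown implementation: {impl!r}")
    return ORDER_ENDPOINTS[impl]


# Switching from .NET to Java order processing is a one-line config change.
print(order_endpoint("[interop]\norder_processing = java\n"))
```

Keeping the choice in configuration rather than in code is what lets the same Business Service talk to either Order Processing implementation.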
Step 1: Define a System Profile
The first step with dynaTrace is to define a System Profile describing the system to analyze. I end up with four Agent Groups, which represent the logical grouping of the services provided by the Stonehenge project:
- Frontend (hosted in IIS)
- Business Service for .NET (hosted in standalone application)
- Order Processing for .NET (hosted in standalone application)
- Stock Trader Services for Java (hosted in WSO2 Web Application Server)
I create only one Agent Group for the Java implementation because the default deployment of Stonehenge hosts both services in a single WSO2 Web Application Server instance.
Step 2: Executing transactions with different tier-to-tier communication configurations
I start Stonehenge with different interoperability configurations, e.g. letting the Business Service for .NET talk to Order Processing for .NET, or letting it talk to the Order Processing implemented in Java. dynaTrace picks up the individual transactions, and I can then visualize each individual PurePath (dynaTrace's representation of a single distributed transaction) in different ways.
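A PurePath can be thought of as a tree of method calls, each tagged with the agent that executed it. The Python sketch below is a simplified, hypothetical model (the `PurePathNode` class and the method names are mine, not dynaTrace's data format); it lists the Agent Groups a transaction touched, which is how the .NET-to-.NET and .NET-to-Java configurations show up as different paths.

```python
from dataclasses import dataclass, field


@dataclass
class PurePathNode:
    """One method call in a distributed transaction (simplified, hypothetical model)."""
    method: str
    agent: str  # Agent Group that executed the call
    children: list["PurePathNode"] = field(default_factory=list)


def agents_visited(root: PurePathNode) -> list[str]:
    """Agent Groups touched by the transaction, in first-visit (pre-order) order."""
    visited: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.agent not in visited:
            visited.append(node.agent)
        # Push children reversed so they are popped left to right.
        stack.extend(reversed(node.children))
    return visited


# Hypothetical order transaction: .NET Business Service calling Java Order Processing.
order = PurePathNode("ProcessRequest", "Frontend", [
    PurePathNode("SubmitOrder", "Business Service for .NET", [
        PurePathNode("processOrder", "Stock Trader Services for Java"),
    ]),
])
print(agents_visited(order))
```

Running the same helper on a transaction recorded with the .NET-only configuration would end in "Order Processing for .NET" instead.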
Visualizing the transaction flow across dynaTrace Agents:
Step 3: Analyzing service interactions
A major aspect of distributed and service-oriented applications is the actual interaction between services or components. Calling remote services has become easy thanks to strong support in frameworks and IDEs. This “convenience” also brings problems: problems that are not visible unless you analyze what really happens in the context of an executed transaction. dynaTrace can visualize a single transaction as a sequence diagram, with different zoom levels available. For our purpose we are interested in the remote communication between two service instances.
The following sequence diagram visualizes the service interactions between the Frontend and the trader service:
It seems there are many roundtrips for the web page request that displays my current portfolio: the Frontend makes 10 roundtrips to the Business Service.
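One way to reason about this chattiness is to count the remote calls per caller/callee pair in a single trace. The Python sketch below assumes the trace has already been reduced to a list of (caller, callee, operation) tuples; the `roundtrips` helper is hypothetical, and the call list is made up to reproduce a count of 10, not taken from the actual Stonehenge trace.

```python
from collections import Counter

# Remote calls observed in one (illustrative) PurePath:
# (caller agent, callee agent, operation). Operation names are hypothetical.
portfolio_calls = [("Frontend", "Business Service", "getHoldings")] + [
    ("Frontend", "Business Service", "getQuote") for _ in range(9)
]


def roundtrips(calls, caller: str, callee: str) -> Counter:
    """Count remote calls from `caller` to `callee`, grouped by operation."""
    return Counter(op for src, dst, op in calls if src == caller and dst == callee)


per_op = roundtrips(portfolio_calls, "Frontend", "Business Service")
print(sum(per_op.values()), per_op.most_common(1))
```

Grouping by operation shows whether the roundtrips come from one call repeated in a loop, which makes it a candidate for a single, coarser-grained service operation.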
Step 4: Analyze Application Layers
A distributed application is made up of services, but it is made up of even more application layers: a layer responsible for data access, a layer that implements the business logic, a layer that offers remote communication and a layer that implements the frontend visualization. dynaTrace automatically identifies application layers and can visualize in which layers (dynaTrace calls them APIs) most of the time is spent:
The API Breakdown view shows that dynaTrace analyzes the layers of all components, Java and .NET alike. The layers include ones identified out of the box, like .NET WCF, Servlet, ADO.NET and JDBC, as well as custom-defined layers like Stonehenge or MSTrade, which contain the custom business logic of the Java and .NET implementations of the Stocktrader application. This breakdown lets me quickly see where most of the time is spent, either for a set of transactions or for an individual transaction.
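The idea behind such a breakdown can be sketched as summing each method's self time (its time minus the time of its children) per layer. In this hypothetical Python model the `Call` class, the method names and the timings are made up; the layer names are taken from the view described above. Passing several root calls aggregates over a set of transactions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Call:
    method: str
    layer: str       # API/layer the method belongs to
    total_ms: float  # inclusive time (includes children)
    children: list["Call"] = field(default_factory=list)


def layer_breakdown(*roots: Call) -> dict[str, float]:
    """Sum exclusive (self) time per layer over one or more transactions."""
    totals = defaultdict(float)
    stack = list(roots)
    while stack:
        call = stack.pop()
        self_ms = call.total_ms - sum(c.total_ms for c in call.children)
        totals[call.layer] += self_ms
        stack.extend(call.children)
    return dict(totals)


# Hypothetical portfolio request with made-up timings.
tx = Call("ProcessRequest", "ASP.NET", 100.0, [
    Call("GetPortfolio", ".NET WCF", 80.0, [
        Call("GetHoldings", "MSTrade", 70.0, [
            Call("ExecuteReader", "ADO.NET", 40.0),
        ]),
    ]),
])
breakdown = layer_breakdown(tx)
total = sum(breakdown.values())
for layer, ms in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"{layer:10} {ms:6.1f} ms {100 * ms / total:5.1f} %")
```

Because self times add up to the root's total, the shares sum to 100 %. The subtraction assumes synchronous calls; an asynchronous hop such as the order processing would need its time accounted for separately.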
Step 5: Analyze transaction flow and identify root cause of problems
PurePath technology traces every single transaction that is executed against Stonehenge from the frontend through all involved services (Java or .NET). A transaction in our case starts at the ASP.NET Frontend Service when the end user browses through the pages.
The PurePath in the screenshot shows the complete execution path of an order transaction. The transaction starts at the ASP.NET Frontend, which calls the Business Service via WCF. This service inserts a record into the database and then calls the Order Service via WCF, which updates account, quote and order information. The color coding of the individual methods indicates each method's contribution to the overall transaction time. Each node in the PurePath carries additional context information, such as HTTP parameters and HTTP session information on the ProcessRequest node, or the name of the WCF contract method that was invoked. For every method we also get the execution time, the time actually spent on the CPU, the time spent in synchronization or wait blocks, and the time the method was suspended by the Garbage Collector.
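The per-method numbers can be modelled as a small record. In this Python sketch the `MethodTiming` class and its helpers are hypothetical; the "other" bucket assumes that execution time minus CPU, synchronization and GC time is mostly I/O or remote-call time, and the color thresholds are invented, since dynaTrace's actual color scale is not described here.

```python
from dataclasses import dataclass


@dataclass
class MethodTiming:
    """Timing of one PurePath method (hypothetical model)."""
    method: str
    exec_ms: float  # wall-clock execution time
    cpu_ms: float   # time actually spent on the CPU
    sync_ms: float  # time in synchronization or wait blocks
    gc_ms: float    # time suspended by the Garbage Collector

    def other_ms(self) -> float:
        """Remaining time, assumed to be mostly I/O or remote calls."""
        return max(0.0, self.exec_ms - self.cpu_ms - self.sync_ms - self.gc_ms)


def contribution_pct(m: MethodTiming, transaction_ms: float) -> float:
    """Share of the whole transaction's time spent in this method."""
    return 100.0 * m.exec_ms / transaction_ms


def color(pct: float) -> str:
    """Map a contribution to a color; thresholds are invented for the sketch."""
    if pct >= 50.0:
        return "red"
    if pct >= 10.0:
        return "yellow"
    return "green"


insert = MethodTiming("InsertOrder", exec_ms=40.0, cpu_ms=5.0, sync_ms=2.0, gc_ms=1.0)
pct = contribution_pct(insert, transaction_ms=200.0)
print(insert.other_ms(), pct, color(pct))
```

A method with a high contribution but little CPU time is usually waiting on something (a lock, the database, a remote service), which points the investigation in a different direction than a CPU-bound hot spot.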
Seeing the complete execution trace of a transaction allows us to identify which components have actually been called in order to fulfill a request and where time was spent. It actually seems that the guys at Microsoft and Sun built a nice “waiting” method into their implementation to better simulate “under load” processing time when stock selling and purchasing orders come in. With dynaTrace I can immediately see the hot path of a single transaction which brings me to a method that takes most of the time when handling the order:
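Finding the hot path can be approximated by greedily following, at each level, the child call with the largest total time. This Python sketch uses a hypothetical `Call` model; the method names (including `SimulatedWait`, a stand-in for the waiting method) and all timings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    method: str
    total_ms: float  # inclusive time (includes children)
    children: list["Call"] = field(default_factory=list)


def hot_path(root: Call) -> list[str]:
    """Greedily follow the child with the largest inclusive time."""
    path = [root.method]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.total_ms)
        path.append(node.method)
    return path


# Hypothetical order transaction; names and timings are illustrative only.
order = Call("ProcessRequest", 1200.0, [
    Call("RenderConfirmation", 40.0),
    Call("SubmitOrder", 1100.0, [
        Call("InsertOrder", 60.0),
        Call("ProcessOrder", 1000.0, [
            Call("UpdateAccount", 30.0),
            Call("SimulatedWait", 950.0),
        ]),
    ]),
])
print(" -> ".join(hot_path(order)))
```

The greedy walk can miss a method with a large self time in a lighter branch, so it complements rather than replaces the layer breakdown from Step 4.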
Step 6: Analyzing configuration issues
Particularly in distributed systems, the configuration of the interoperability layer can be a problem: you may end up calling the wrong service endpoints, or the configuration may be missing values. The following PurePath shows “hidden” exceptions, not written to any log file, that indicate a configuration problem. The problem may not surface as a functional problem, but it definitely adds overhead due to exception handling.
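Spotting such exceptions amounts to filtering for ones that were thrown and caught within the transaction but never logged. This Python sketch is hypothetical: the `ExceptionEvent` model and especially its `logged` flag are assumptions (for example, derived by correlating exceptions with logging calls on the same path), and the events themselves are illustrative, not taken from the Stonehenge trace.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ExceptionEvent:
    """An exception seen on a PurePath (hypothetical model)."""
    method: str    # method that threw the exception
    exc_type: str
    caught: bool   # handled somewhere within the same transaction
    logged: bool   # the handler wrote it to a log file (assumed to be known)


def hidden_exceptions(events: list[ExceptionEvent]) -> Counter:
    """Count exceptions swallowed without logging, grouped by type."""
    return Counter(e.exc_type for e in events if e.caught and not e.logged)


# Illustrative events only; not taken from the Stonehenge trace.
events = [ExceptionEvent("ReadSetting", "KeyNotFoundException", caught=True, logged=False)
          for _ in range(3)]
events.append(ExceptionEvent("OpenChannel", "EndpointNotFoundException", caught=True, logged=True))
print(hidden_exceptions(events))
```

A non-zero count on every transaction is a cheap signal of a configuration problem even when the functional result is correct, since each throw and catch costs time.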
Interoperability is a great thing – especially when the two major players in the field (Microsoft and Sun) provide the technical base for connecting Java and .NET components. Working with a heterogeneous system and dealing with the day-to-day issues is a problem that a solution like dynaTrace solves by providing:
- Real transaction trace & capture – every transaction, end to end; always on in every business-critical application, 24x7x365
- Pinpoint to code level – the issues causing application problems and bottlenecks, with true visibility into the dynamic behavior of the application, incl. 3rd-party code, without source
- Full lifecycle approach – development to test to staging to production, and back
- Integrated collaborative framework – for common language & easy communication
- Open & Transparent – easy integration to enhance existing processes and tools with dashboards providing visibility for all stakeholders, incl. business managers