Identify Performance Bottlenecks in your BizTalk Environment – Part I
Microsoft BizTalk enables companies to integrate and automate their business processes (BPM). In a BizTalk environment, messages are picked up by adapters and put through a robust messaging infrastructure, where an orchestration engine allows you to implement your business processes. Along the path a message takes, there are several points where the performance of the overall message processing can degrade and, in turn, impact your business.
I plan a series of blog entries on how to identify bottlenecks in BizTalk. I will give you background information on the BizTalk environment, link to other interesting posts and MSDN articles, and show you how to follow a single message through your BizTalk system to identify where and what the problems are. Let's get started:
An Overview of BizTalk
Messages processed by BizTalk follow a certain path involving different components. The following image, taken from BizTalk Message Page, does a good job of illustrating the message flow through BizTalk:
The major players in the message flow are:
1. A message is received through a receive port and handled by a configured adapter, e.g.: File, FTP, HTTP, SOAP, SQL, …
2. The receive pipeline processes each message and can perform operations like decryption, signing, …
3. Optionally, the receive port transforms the message into a different format via mapping
4. The message is put into the MessageBox, which resides in a SQL Server database. Subscribers (Orchestrations or Send Ports) are notified
5. An orchestration picks up the message and executes logic to support your business processes
6. The message (whether processed by an orchestration or not) can be transformed via mapping into a different output format before it is sent
7. The send pipeline can perform certain operations like encryption on the message before generating the final output format
8. The send port uses the configured adapter to transmit the message to the next system
(Too) Many ways to identify Performance Bottlenecks
There are several potential bottleneck areas, like the Operating System, the File System, the Database, BizTalk Server, the Adapters, the Pipeline, Message Mapping, Orchestrations, Message Endpoints, … Check out the BizTalk Performance Optimization Guide and read the chapters about Finding and Eliminating Bottlenecks to get a better understanding of the individual components in a BizTalk Environment and what can potentially go wrong.
The Performance Guide gives great suggestions about which tools to use to analyze performance counters, log files, orchestrations and I/O. The problem is that you need a bunch of tools that analyze different data sources, and in the end it is up to you to put all the pieces together and correlate the output of the different tools. So when message processing slows down, you need to analyze the performance counters, the log files, the profiler output, … all in different tools.
Using all these tools is doable, but it is neither fun nor efficient. In this blog series I will show you how to analyze all this data with a single performance management solution.
Step 1: Monitoring BizTalk Host Instances via Windows Performance Counters
A very interesting set of counters is the Host Throttling Performance Counters. These counters not only provide information about message throughput but also indicate when high-water marks are reached. Check out High Message Delivery Rate, High Database Size, High Thread Count or High Process Memory. These counters should always return 0 (zero). When a high-water mark is reached, the counter flips to 1, alerting you that the Host Instance is experiencing throughput problems.
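The "should always be 0" rule can be sketched as a small check. This is only an illustration: the counter names are the four mentioned above, while the `sample` dictionary and the function name `tripped_flags` are hypothetical and stand in for whatever your monitoring tool returns for one Host Instance.

```python
# High-water-mark flags named in the text; each should read 0 on a healthy host.
HIGH_WATER_FLAGS = (
    "High Message Delivery Rate",
    "High Database Size",
    "High Thread Count",
    "High Process Memory",
)


def tripped_flags(sample):
    """Return the names of high-water-mark flags that are not 0.

    `sample` is a hypothetical dict of counter name -> value for one
    Host Instance; flags missing from the sample are treated as 0.
    """
    return [name for name in HIGH_WATER_FLAGS if sample.get(name, 0) != 0]


# Illustrative values only, not measured data.
sample = {
    "High Message Delivery Rate": 1,
    "High Database Size": 0,
    "High Thread Count": 0,
    "High Process Memory": 0,
}
print(tripped_flags(sample))
```

Any non-empty result is a reason to look at the corresponding throughput counters for that host.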
The two counters Message Delivery Incoming Rate and Message Delivery Outgoing Rate tell you how many messages have been passed to the Orchestration or Messaging System and how many of them have actually been processed. There are two similar counters, Message Publishing Incoming Rate and Message Publishing Outgoing Rate, which indicate how many messages have been put into the MessageBox database and how many have been pulled out. In an ideal world the incoming and outgoing numbers match. If you see a gap, you know that either the Orchestration Engine, the Message Engine or the MessageBox cannot handle the number of incoming messages.
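As a rough sketch of the gap check described here, the following compares one incoming and one outgoing rate sample. The function name `rate_gap` and the 5% `tolerance` are assumptions made for illustration; BizTalk does not define such a threshold, so pick one that fits your environment.

```python
def rate_gap(incoming, outgoing, tolerance=0.05):
    """Compare an incoming and an outgoing rate sample (messages/second).

    Returns (gap, is_backlog). `tolerance` is a hypothetical fraction of
    the incoming rate below which a difference is treated as noise.
    """
    gap = incoming - outgoing
    is_backlog = incoming > 0 and gap > tolerance * incoming
    return gap, is_backlog


# Illustrative numbers only: 20 messages/second arriving, 14 processed.
gap, is_backlog = rate_gap(20.0, 14.0)
print(gap, is_backlog)
```

The same function applies to the Message Publishing pair; a sustained backlog on either pair points at the component that cannot keep up.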
Monitoring these counters
You can use your own Windows performance counter monitoring tool, System Center Operations Manager (SCOM), or just go with the Windows Performance Monitor that comes with every Windows installation.
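If you just want a quick look without any of these tools, the Windows command-line utility `typeperf` can sample a counter and print it as CSV. The sketch below wraps it from Python. Note the assumptions: the counter path in the usage comment (object `BizTalk:Message Agent`, instance named after the host) is my guess at how the counters appear on a typical installation and should be verified with `typeperf -q`; the helper names are made up for this example.

```python
import csv
import io
import subprocess


def read_counter_csv(text):
    """Parse typeperf-style CSV text into a list of {column: value} rows."""
    # typeperf also prints blank lines and status messages; keep quoted CSV lines only.
    lines = [ln for ln in text.splitlines() if ln.startswith('"')]
    rows = list(csv.reader(io.StringIO("\n".join(lines))))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def sample_counter(counter_path, samples=1):
    """Run typeperf (Windows only) and return the parsed samples."""
    result = subprocess.run(
        ["typeperf", counter_path, "-sc", str(samples)],
        capture_output=True, text=True, check=True,
    )
    return read_counter_csv(result.stdout)


# Usage on a BizTalk machine (counter path is an assumption, verify it first):
# rows = sample_counter(
#     r"\BizTalk:Message Agent(BizTalkServerApplication)\Message delivery incoming rate",
#     samples=5,
# )
```

Values come back as strings keyed by the column header, so convert them with `float()` before comparing them against thresholds.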
I use dynaTrace as it allows me to monitor all different types of data sources out of the box (Windows Performance Counters, Unix System Monitor, SNMP, …). With its plugin concept, dynaTrace can be extended to any type of data source. Thanks again to the great work of our partner company MCG from Denmark, who make BizTalk monitoring easier. After contributing the Apache Monitoring Plugin, MCG has now also created the BizTalk Monitor, which you can download from the dynaTrace Community Portal. This monitor is the first key to managing your BizTalk Environment. It queries all relevant performance counters from a BizTalk Host Instance, such as message delivery rates and delay times. You create one monitor for each Host Instance in your System Profile.
In my BizTalk environment I have two BizTalk Host Instances. The following screenshot shows a dynaTrace Dashboard charting the results of the two configured BizTalk Monitors. The dashboard shows me that one of my hosts was obviously rather busy (BizTalkServerApplication) while the other one was kind of lazy (dynaTraeApplication1):
The dashboard above alerts me that the High Message Delivery Rate flag was raised (this is the value that should be 0 all the time). I also see a delay of up to 40 ms in message delivery at a time when we processed about 20 messages/second. At the moment the delay occurred, there was also a difference between incoming and outgoing messages (Message Delivery Rate). This means that my Orchestration Engine could not handle the number of messages arriving at that time.
The dashboard visually alerts me with the red X. dynaTrace also allows me to define alerting actions such as sending me an email or publishing this alert to SCOM.
More performance counters
In addition to these counters, you should also collect measures for memory consumption, network throughput, I/O, handle and thread count, CPU, … All these counters give a great initial overview of the system and how well it performs. Here is another screenshot that contains some of these counters:
Correlating the values on this dashboard, we see that at the same time we had the message delay in our Orchestration Engine, we also had a spike in the .NET Garbage Collector and a drop in the handle count. With dynaTrace I am also able to see that we had several exceptions, some of them related to networking. We will dig deeper into this data in the next blog post. But as you can see, by looking at the available performance counters, both those from BizTalk and those we get for Windows processes and .NET applications, we can identify the problematic areas in our installation. In my case it seems to be the Orchestration Engine that causes message delays when we have more than 20 messages/second.
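The kind of correlation done here by eye on the dashboard can be approximated in a few lines. This is a minimal sketch, assuming you have exported two counters as timestamp-to-value mappings; the function name, the thresholds and the sample values are all hypothetical and only mirror the situation described above (a delivery delay coinciding with a Garbage Collector spike).

```python
def coinciding_spikes(series_a, series_b, threshold_a, threshold_b):
    """Return the timestamps at which both series exceed their thresholds.

    Each series is a dict of timestamp -> value; only timestamps present
    in both series are compared.
    """
    shared = sorted(set(series_a) & set(series_b))
    return [
        t for t in shared
        if series_a[t] > threshold_a and series_b[t] > threshold_b
    ]


# Illustrative values only: seconds since start -> counter value.
delivery_delay_ms = {0: 2, 10: 5, 20: 40, 30: 3}
gc_time_percent = {0: 1, 10: 2, 20: 35, 30: 2}

print(coinciding_spikes(delivery_delay_ms, gc_time_percent, 20, 10))
```

A coincidence like this is only a hint about where to look next, not proof of a cause.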
There is another great Troubleshooting guide for MessageBox Latency Issues on MSDN. Check it out.
Next Steps …
In the next post I will show how to go beyond performance counters (as they only give you hints about where the problem is) and focus on problems in adapters, pipelines, orchestrations and message endpoints. We will also learn how to analyze send ports and how to trace a single message through the BizTalk environment. Tracing a single message through all the different stages and components is key to getting to the root cause of message processing problems. Stay tuned …