Filling the Citrix Visibility Gap with Application-Aware Network Performance Monitoring – Part 1: Load Balancing
There are many reasons why companies choose Citrix XenApp or XenDesktop. Citrix can lower the costs of desktop management and simplify access to hosted applications from any device. Citrix seems like an ideal solution, but it introduces certain challenges to application performance monitoring (APM). With Citrix we get yet another component in the data center to take care of. In addition, virtualized application delivery becomes a “black box” from the APM perspective. It gets harder to isolate the fault domain to the network, the infrastructure or the applications themselves.
The National Bank of San Borodin (NBSB – name changed for commercial reasons) decided to expand its reach beyond the small island of San Borodin and establish a presence in every city on the Canary Islands. Instead of investing in IT equipment for every banking agent, NBSB decided to implement a Bring Your Own Device (BYOD) strategy. The IT team chose Citrix XenApp as the preferred way to deliver banking applications to the banking agents. Ensuring good quality of service for all banking agents using applications delivered through Citrix XenApp imposed additional challenges on the Operations team.
In this two-part series we will analyze how to discover and handle some performance issues that can occur in a Citrix-enabled environment. We start by looking at how to monitor network performance to discover potential problems with load balancing. In the second part we will look into managing the experience of Citrix users from the perspective of network monitoring.
The Citrix Black-Box Application Performance Management
When implementing application performance management we focus on many aspects of the monitored solution: the infrastructure, the network and the application itself, usually in the context of the end user experience. The more complex and distributed the application is, the bigger the potential impact of the network on end user performance. On the other hand, simply looking at the quality metrics of the network is not enough. To avoid tedious war room scenarios, managing application performance requires a convergence between application-aware network performance monitoring (aaNPM) and network-aware application performance management (naAPM).
Figure 1. Overview architecture of Citrix-enabled applications
Managing performance of applications delivered to the end users through Citrix application delivery virtualization (see Figure 1) poses a number of challenges in addition to managing another key piece of infrastructure (the Citrix farm). We can identify four such challenges for managing performance of Citrix-enabled applications: visibility, capacity and load, application performance and end-to-end user monitoring.
Although it is still possible to monitor performance of the backend applications, it becomes virtually impossible, using only typical NPM, to correlate user experience with the detailed state of the applications and infrastructure, including the Citrix farm.
Compared to other remote desktop protocols, Citrix XenApp/XenDesktop requires less bandwidth to deliver applications. Although that holds true for some applications, not all applications are equally conservative when it comes to bandwidth consumption. Rendering multimedia, printing or accessing local USB storage are just some examples of channels that can consume the lion's share of the bandwidth and seriously affect performance of other, sometimes more important applications. Managing performance of Citrix-enabled applications requires you to decode and correlate the usage of these channels with the actual user sessions.
Citrix Performance From the Network Perspective
The quality of service offered by Citrix XenApp/XenDesktop virtualization relies heavily on the quality of the network: you need a good network infrastructure to take full advantage of what Citrix can offer. Therefore, apart from monitoring the infrastructure for CPU, memory or disk utilization problems, we should ensure that the network is reliable. The following metrics should be monitored for potential network problems that can affect user experience: realized bandwidth, round trip times (RTT), retransmission rate, network throughput, bandwidth usage (per user location, application and channel) and Citrix client-server connectivity issues, e.g., zero windowing events.
Figure 2 shows an example Citrix-enabled architecture with an application-aware network performance monitoring (aaNPM) tool that is capable of analyzing the ICA protocol, understanding user sessions, seeing inside the Citrix farm with the Thin Client Analysis Module (TCAM) and monitoring the actual backend applications.
Figure 2. End to end visibility gained with an aaNPM tool featuring TCAM component and ICA protocol decode
Once we are able to monitor the performance of the Citrix farm and the rest of the infrastructure, including the network, application servers, etc., we can isolate the domain of the problem to either the Citrix infrastructure, the network, or the actual application delivered through Citrix virtualization.
In the following sections, based on examples from the National Bank of San Borodin, we will analyze how to identify problems caused by the load balancer and how they can affect performance of Citrix-enabled applications. We will also show how to identify problems caused by certain Citrix channels or Citrix-enabled applications.
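As a rough illustration of how a few of these metrics could be checked per user location, here is a minimal Python sketch. The `LinkSample` record, the site names, the sample counters and both thresholds are assumptions made up for this example; they are not taken from any aaNPM product or from the NBSB data, and real thresholds should come from your own baseline.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """Counters for one Citrix client-server path over a sampling interval (hypothetical record)."""
    site: str
    packets_sent: int
    packets_retransmitted: int
    rtt_ms: float
    zero_window_events: int


# Illustrative thresholds only (assumptions); tune them against your own baseline.
MAX_RETRANSMISSION_RATE = 0.02
MAX_RTT_MS = 150.0


def retransmission_rate(sample: LinkSample) -> float:
    """Share of sent packets that had to be retransmitted."""
    if sample.packets_sent == 0:
        return 0.0
    return sample.packets_retransmitted / sample.packets_sent


def network_warnings(sample: LinkSample) -> list[str]:
    """Return the names of the metrics that exceed the illustrative thresholds."""
    warnings = []
    if retransmission_rate(sample) > MAX_RETRANSMISSION_RATE:
        warnings.append("retransmission rate")
    if sample.rtt_ms > MAX_RTT_MS:
        warnings.append("round trip time")
    if sample.zero_window_events > 0:
        warnings.append("zero window events")
    return warnings


# Made-up samples for two user locations.
samples = [
    LinkSample("site-a", 10_000, 50, 40.0, 0),
    LinkSample("site-b", 10_000, 600, 210.0, 3),
]

for sample in samples:
    print(sample.site, network_warnings(sample))
```

A check like this only covers the network side; it says nothing yet about the Citrix farm or the backend applications.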
Load Balancing Citrix Infrastructure
To ensure high performance of delivered applications, it is often necessary to distribute the load among multiple servers delivering the same applications. A properly configured load balancer prevents any one application server from becoming overloaded by redirecting traffic to other application servers. Load balancers improve application availability and responsiveness by improving server utilization. Load balancing is therefore the most straightforward method of scaling applications.
Figure 3. Architecture of Citrix-enabled application virtualization with load balancing
The same principle applies to applications virtualized through Citrix XenApp or XenDesktop (see Figure 3), where multiple users connect to a farm of Citrix servers that in turn needs to efficiently deliver applications hosted on the backend application servers. However, proper load balancing of applications delivered with Citrix may pose additional challenges. Commonly used load balancing algorithms, such as source IP hash, URL hash or domain hash, might not be best suited to distributing load among applications and users with widely divergent demands on infrastructure resources.
Moving Over to Citrix
The Operations team of the National Bank of San Borodin decided initially to serve only remote banking agents through Citrix delivery virtualization. Employees working at the main office on the San Borodin Island continued accessing the bank services directly.
The company started to see complaints as soon as the first batch of agents started using services delivered with Citrix. The team’s initial assumption was that the Citrix deployment was the cause of the problem. Detailed analysis, however, could not confirm that the problems could be attributed to Citrix. The complaints were voiced both by users accessing bank services via Citrix and by those accessing these services directly.
Further investigation using the aaNPM tool (see Figure 4) revealed that the load balancer was not reconfigured to match new characteristics of the traffic: the new users using Citrix XenApp were in fact seen by the load balancer as just one user, as they were all coming through the same Citrix XenApp server, i.e., the same IP address.
The XenApp server can serve over 100 users, provided the applications they use can leverage Citrix virtualization delivery (we talk about applications that may decrease that estimate in the next part). However, balancing the load of these users as if they were a single user accessing bank applications directly led to the overload of one of the application servers and eventually degraded the end user experience.
Figure 4. Checking performance per affected site shows that there is one site delivered through Citrix, while the others are just regular users connecting directly to the application server. The Citrix server requires many more resources than a regular single-user (banking agent) site.
Once the load balancer settings were changed from balancing based on the IP address to balancing based on the session, the problem was resolved.
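The effect described above can be reproduced with a small simulation. The sketch below is illustrative only: the backend names, the IP addresses, the session identifiers and the hash-based `pick_backend` function are assumptions invented for this example, and they do not represent the NBSB load balancer configuration or any specific vendor's algorithm.

```python
import hashlib
from collections import Counter

# Hypothetical backend application servers.
BACKENDS = ["app-1", "app-2", "app-3"]


def pick_backend(key: str) -> str:
    """Map a balancing key (source IP or session ID) to a backend with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]


# 100 agents arrive through one XenApp server, so they share its IP address
# (the address is an assumption). 20 office users connect from their own addresses.
CITRIX_IP = "10.0.0.10"
sessions = [(CITRIX_IP, f"citrix-session-{i}") for i in range(100)]
sessions += [(f"192.168.1.{i}", f"office-session-{i}") for i in range(20)]

# Source IP hash: every Citrix-delivered session maps to the same backend.
by_ip = Counter(pick_backend(ip) for ip, _ in sessions)

# Session-based key: Citrix-delivered sessions are spread like any other user.
by_session = Counter(pick_backend(session_id) for _, session_id in sessions)

print("balanced by source IP:", dict(by_ip))
print("balanced by session:  ", dict(by_session))
```

With the source IP as the key, one backend receives at least the 100 Citrix-delivered sessions, whatever the hash function is. Keying on the session removes that concentration, which mirrors the configuration change that resolved the problem at NBSB.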
Balance the Load Not the Number of Users
Citrix can also be set up to distribute the load across XenApp server farms using a NetScaler appliance.
When the Operations team at NBSB learned that some users were experiencing performance problems, they consulted the aaNPM tool. Citrix servers were properly balanced by user count (see Figure 5), and metrics such as network performance or availability did not indicate any problems. The realized bandwidth of server 10.0.0.1, however, was below baseline, which indicated that this server could be heavily overloaded. At the same time, the realized bandwidth for the 10.0.0.2 server was almost 3 times higher than for 10.0.0.1.
Figure 5. Balancing load only by number of users might lead to almost 3x difference in realized bandwidth
This example shows that looking only at network performance or user count might not be enough to properly assess and balance the load for the Citrix servers. Sometimes we might not even see problems when monitoring the ICA protocol. Therefore, it is important to be able to correlate these network metrics with the consumption of physical resources, such as CPU or memory, on the Citrix servers. Figure 6 shows the remaining part of the report shown in Figure 5, this time with a focus on metrics gathered through the Thin Client Analysis Module (TCAM): CPU, memory and disk utilization. For the server 10.0.0.1, which received most of the load, we can see increased CPU and memory utilization (see Figure 6).
Figure 6. Using TCAM metrics to look into consumption of physical resources at the Citrix servers reveals heavy memory and CPU consumption on the 10.0.0.1 server as the cause of the poor user experience seen through the realized bandwidth metric in Figure 5
Citrix XenApp/XenDesktop is a solution used to deliver the same user experience to multiple users who might be using totally different devices, ranging from desktop computers to tablets. There is, however, a catch. Nowadays, when broadband or LTE connections are commodities, we might still accept a web page that loads 200 ms slower than usual, but seeing a cursor track your mouse movements with a 200 ms delay is unacceptable. With Citrix virtualized delivery the Operations team has just gotten a new challenge in application-aware network performance monitoring (aaNPM).
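The correlation between user count, realized bandwidth and physical resources can be sketched in a few lines of Python. Everything in this example is an assumption for illustration: the `ServerStats` record, the two thresholds and all of the numbers are invented (they are not read from Figures 5 and 6), and only the two server addresses come from the text.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Per-server snapshot combining network and resource metrics (hypothetical record)."""
    address: str
    users: int
    realized_bandwidth_kbps: float
    baseline_bandwidth_kbps: float
    cpu_pct: float
    memory_pct: float


# Illustrative thresholds (assumptions), not product defaults.
BANDWIDTH_RATIO_FLOOR = 0.6   # flag when realized bandwidth drops below 60% of baseline
RESOURCE_CEILING_PCT = 85.0   # flag when CPU or memory utilization exceeds 85%


def is_overloaded(stats: ServerStats) -> bool:
    """A server is flagged only when low realized bandwidth coincides with hot CPU or memory."""
    below_baseline = (
        stats.realized_bandwidth_kbps
        < BANDWIDTH_RATIO_FLOOR * stats.baseline_bandwidth_kbps
    )
    resources_hot = (
        stats.cpu_pct > RESOURCE_CEILING_PCT
        or stats.memory_pct > RESOURCE_CEILING_PCT
    )
    return below_baseline and resources_hot


# Made-up values: user counts are nearly equal, yet the servers behave very differently.
servers = [
    ServerStats("10.0.0.1", 48, 350.0, 1000.0, 93.0, 91.0),
    ServerStats("10.0.0.2", 50, 1020.0, 1000.0, 41.0, 55.0),
]

for stats in servers:
    status = "overloaded" if is_overloaded(stats) else "ok"
    print(stats.address, f"users={stats.users}", status)
```

Requiring both conditions is a deliberate choice in this sketch: low realized bandwidth alone could point to a network problem, while high CPU or memory alone might not yet affect users.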
The Operations team at the National Bank of San Borodin had to ensure high network performance, both in the data center and between the Citrix thin client and servers. The team used Compuware APM Application-Aware Network Monitoring to keep an eye on extensive bandwidth usage of some Citrix channels or usage of secondary applications. Since Citrix does not take load from the actual backend application servers, the team had to ensure proper distribution of load over both Citrix servers and the backend servers.
When performance problems that show up as uneven load balancing cannot be easily attributed to an improperly configured load balancer, we should also analyze whether any applications delivered through Citrix affect the overall performance of other applications. We will discuss such problems in the next part of this series.
(This series is based on materials contributed by Pieter Van Heck, Krzysztof Ziemianowicz, Roger Boyd, Suneel Dhingra and Patrik Bohland, based on original customer data. Some screens presented are customized while delivering the same value as out of the box reports.)