Week 2 – The many faces of end-user experience monitoring
Inspired by a comment by Wim Leers on one of our other posts on web performance, I decided to switch plans and write this week about end-user experience monitoring. If you google "end-user experience monitoring" you will find a number of different approaches.
End-user experience – as I think we all agree – is the performance perceived by the actual user at a specific point in time. Sounds simple, but in reality it is not that easy to ascertain. End-user monitoring is also often referred to as the last mile in monitoring (which somehow reminds me of the movie with Tom Hanks). This is mainly because it is really complex and difficult to determine: the closer you get to your user, the harder it gets. Additionally, you suddenly have to monitor thousands and thousands of users. This also means that you can only send a limited amount of data, as otherwise you would not be able to process the resulting volume of information. Furthermore, this information has to be sent over the wire, and sending lots of performance data from your end users won't make them happy either.
Before looking at the different approaches, let’s first discuss what information we really want to track.
Instead of looking at the data, let’s discuss which questions we want to get answered by end-user monitoring. These questions will bring us to the required data anyway. First we need to get monitoring information about general user experience:
- How long did it take to load the page?
- Were there any problems on the page?
- How long did certain actions take on a page?
Secondly, in case of problems we want to get additional information which helps us to understand and diagnose them such as:
- What was the reason for a slow download – the network or the server?
- What did the user do that caused a problem?
- What browser, operating system and connection were used?
The first question can be answered by tracking resource load times. Here we want to measure the following metrics:
- Time of First Byte – When was the first byte of the web page received
- Time to First Visual – When was the page content visible for the first time? This is the first time a drawing operation occurred
- Time to On Load – When was the onLoad event of the page executed
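These three metrics can be derived as simple deltas from raw timestamps. A minimal sketch in TypeScript – the `PageTimings` field names are illustrative, not from any standard browser API:

```typescript
// Hypothetical timing marks (epoch milliseconds) as a browser-side
// monitoring agent might capture them.
interface PageTimings {
  navigationStart: number; // user initiated the navigation
  firstByte: number;       // first byte of the response arrived
  firstPaint: number;      // first drawing operation occurred
  onLoad: number;          // onLoad event fired
}

// Each metric is the elapsed time from the start of navigation.
function pageLoadMetrics(t: PageTimings) {
  return {
    timeToFirstByte: t.firstByte - t.navigationStart,
    timeToFirstVisual: t.firstPaint - t.navigationStart,
    timeToOnLoad: t.onLoad - t.navigationStart,
  };
}

// Example: content becomes visible after 480 ms, onLoad fires at 1.9 s.
const m = pageLoadMetrics({
  navigationStart: 1000,
  firstByte: 1210,
  firstPaint: 1480,
  onLoad: 2900,
});
```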
The network times should be split up into wait time (the delay until a browser connection is available), DNS lookup time, transfer time and server time. This is especially useful in diagnosing network-related problems. Further, we want to see HTTP headers to find improper caching configuration or issues such as HTTPS-related connection problems.
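The network-time breakdown described above can be sketched the same way. Again the phase timestamps are illustrative assumptions, and the server-time figure is an approximation since it inevitably includes one network round trip:

```typescript
// Hypothetical phase timestamps (ms, relative to the page request)
// for a single resource download.
interface RequestPhases {
  enqueued: number;     // request queued, waiting for a free browser connection
  connectStart: number; // a connection became available
  dnsStart: number;
  dnsEnd: number;
  requestSent: number;  // last byte of the request left the browser
  firstByte: number;    // first byte of the response arrived
  responseEnd: number;  // last byte of the response arrived
}

function networkBreakdown(p: RequestPhases) {
  return {
    waitTime: p.connectStart - p.enqueued,
    dnsLookup: p.dnsEnd - p.dnsStart,
    // Approximation: time on the server plus one network round trip.
    serverTime: p.firstByte - p.requestSent,
    transferTime: p.responseEnd - p.firstByte,
  };
}

const b = networkBreakdown({
  enqueued: 0,
  connectStart: 40,
  dnsStart: 40,
  dnsEnd: 65,
  requestSent: 70,
  firstByte: 190,
  responseEnd: 310,
});
```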
Having an understanding of the data we want to collect, let’s now look at the various technologies which are used to collect end-user experience metrics.
Synthetic Transactions are based on the concept of emulating real users. They use pre-recorded scripts defining end-user behavior that are executed at defined intervals. Providers of these tools execute these scripts from as many as a hundred or more locations worldwide. We at dynaTrace provide a script-based monitoring plug-in for the same purpose. While they do not really monitor the perception of real end-users, they constantly monitor the performance of specific pre-defined transactions. Transactions, however, have to be "read-only", as they would otherwise kick off real purchase processes, for example. This limits their usage to a certain subset of your business-critical transactions. If you create a kind of dummy user in your application whose transactions will be filtered out later in the process, you might be able to monitor more transactions.
The key point here is that synthetic transaction monitoring is not about the performance perceived by real users of your application. It rather acts as a reference measurement that will help to show performance degradation, detect networking problems or provide notifications in case of errors.
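The dummy-user idea mentioned above boils down to a simple filter later in the processing pipeline. A minimal sketch, assuming a hypothetical naming convention (a "synthetic-" prefix on the user id) to mark synthetic monitoring users:

```typescript
// A recorded transaction as the monitoring backend might store it.
interface Transaction {
  userId: string;
  name: string;
  durationMs: number;
}

// Hypothetical convention: synthetic monitoring users carry a
// "synthetic-" prefix so their traffic can be excluded later.
const isSynthetic = (t: Transaction): boolean =>
  t.userId.startsWith("synthetic-");

// Keep only transactions from real users for end-user statistics.
function realUserTransactions(all: Transaction[]): Transaction[] {
  return all.filter((t) => !isSynthetic(t));
}

const real = realUserTransactions([
  { userId: "alice", name: "checkout", durationMs: 830 },
  { userId: "synthetic-monitor-1", name: "checkout", durationMs: 640 },
  { userId: "bob", name: "search", durationMs: 210 },
]);
```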
Network Sniffing is the second category of end-user monitoring tools. Unlike synthetic transactions, it relies on real end-user traffic. Special appliances are used which monitor the whole network traffic sent between clients and web servers. These appliances, however, sit within your own network, meaning they are farther away from your end users. On the other hand, you can use them to monitor real end-user traffic. They analyze end-user traffic in real time, verify response times against SLAs and also check content for correctness. These monitoring solutions also provide the actual HTML seen by your end users and enable you to follow the click path of your users.
Instrumentation at Browser Level
The challenge is then to deliver these results back to a centralized monitoring server. Episodes by Steve Souders suggests the use of beacons. Beacons are small web requests with piggybacked monitoring information. Alternatively, XHR requests can be used to communicate with a monitoring server. Both approaches, however, use browser connections which are also required by the application itself. Communication should therefore be kept to a minimum, with small payloads only.
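A common way to implement such a beacon is to encode the metrics into the query string of a tiny image request. A minimal sketch – the endpoint name and metric keys are illustrative:

```typescript
// Encode a metrics object into the query string of a 1x1 image beacon.
// Values are rounded so the payload stays as small as possible.
function beaconUrl(endpoint: string, metrics: Record<string, number>): string {
  const pairs = Object.entries(metrics).map(
    ([key, value]) => `${encodeURIComponent(key)}=${Math.round(value)}`
  );
  return `${endpoint}?${pairs.join("&")}`;
}

const url = beaconUrl("/beacon.gif", { tfb: 210, onload: 1900 });
// In a browser one would then fire the beacon with:
//   new Image().src = url;
```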
On the server side, the challenge is to process this potentially huge amount of data. Thousands and thousands of clients sending small packets still leads to huge amounts of data. The specific requirements of Web 2.0 applications impose another challenge here. Plain page-timing metrics might not be enough for Google Mail and other single-page, highly interactive applications. Monitoring the response times of XHR requests is essential to understand the communication behavior of the application. The ZK framework's performance monitor, for example, offers such capabilities.
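One way to keep the client-to-server data volume down for such XHR-heavy applications is to aggregate response times per endpoint in the browser and only ship summaries. A minimal sketch, not tied to any particular product:

```typescript
// Aggregate XHR response times per endpoint so only compact summaries,
// rather than every single measurement, need to go over the wire.
class XhrStats {
  private samples = new Map<string, number[]>();

  // Record one measured XHR duration for the given endpoint.
  record(url: string, durationMs: number): void {
    const list = this.samples.get(url) ?? [];
    list.push(durationMs);
    this.samples.set(url, list);
  }

  // Summary (count and average) for one endpoint.
  summary(url: string): { count: number; avg: number } {
    const list = this.samples.get(url) ?? [];
    const total = list.reduce((sum, d) => sum + d, 0);
    return { count: list.length, avg: list.length ? total / list.length : 0 };
  }
}

const stats = new XhrStats();
stats.record("/mail/inbox", 120);
stats.record("/mail/inbox", 180);
const s = stats.summary("/mail/inbox");
```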
Browser Plug-Ins and Extensions
The Role of Server-Side Monitoring
Although this post is about end-user experience, I want to include some words on server-side monitoring as well. One could argue that this approach is the furthest away from the end user. This is true; however, application performance monitoring at the server side also provides insight into end-user performance. Very often server-side monitoring is combined with end-user monitoring. The image below shows an example of an integrated monitoring view combining active monitoring with server-side metrics. Problems having their root cause on the server side can only be efficiently diagnosed using server-side monitoring and diagnosis data.
What will the future bring us?
The future will first be about integrated tool chains combining information from many sources in a seamless way. Monitoring at the browser level will be done using ever more sophisticated approaches. Hopefully browser developers will open up the functionality they currently provide only via plug-ins and also allow the development of standardized extensions based on a unified API. A first draft of such a unified interface is the Web Timing Working Draft.
This post is part of our 2010 Application Performance Almanach.