Bon-Ton operates 273 stores in 23 US states and offers online shopping through fifteen active WebSphere application servers hosted in two eCommerce data centers. 85% of its eCommerce content is served through Akamai, which is a smart and necessary choice given the 25 million requests their website receives per day. The challenge was that they were always forced to react to user complaints about poor performance or problems on their eCommerce site instead of proactively preventing those problems. They were stuck in reactive mode because they had no visibility into Akamai and only saw performance from inside their data centers. That view did not reveal performance problems that end users might experience because of an issue in the CDN.
It also took about 4 hours to receive log files from Akamai before they could start triaging a problem. By then, they could already have lost substantial revenue and seen public complaints on Twitter urging shoppers to avoid their website.
In this blog post we discuss how Bon-Ton lowered their problem resolution time from several hours to minutes. They knew that their old server-side-only monitoring approach would no longer work, as only 15% of the overall traffic reached their data centers. For that reason, they evaluated different User Experience Management products. Most of these solutions failed to deliver the data they needed: live visibility into problems affecting their end users, with the ability to pinpoint which content, delivered from which CDN or 3rd party provider, was causing them. With their current implementation they meet all these requirements and can solve problems in minutes instead of waiting 4 hours just to start triaging. They have moved from reactive troubleshooting to proactive problem prevention.
Why Don’t Server-side-only Solutions Work?
Bon-Ton identified three issues when they evaluated server-side-only performance management solutions:
Only Seeing 15% of the Traffic
The most obvious problem was that Bon-Ton was blind to all traffic delivered by Akamai and other 3rd party providers. That left them seeing only 15% of the real traffic and, with that, blind to 85% of potential problem hotspots. To solve this problem they needed a solution that sits in the browser and captures everything an end user actually loads. Periodically pulling data from Akamai would not solve the visibility problem either, as they also rely on other 3rd party providers that don't make their performance data accessible.
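To make the 15% figure concrete, here is a minimal sketch of the gap between a browser-side and a server-side view. It assumes a hypothetical per-request record, captured in the browser, that carries a `reached_origin` flag; the record type, field names and host names are illustrative only, not the format of any particular product.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class BrowserRequest:
    """One request as recorded in the browser (hypothetical record format)."""
    url: str
    reached_origin: bool  # True if the application servers handled this request


def server_side_coverage(requests: list[BrowserRequest]) -> float:
    """Fraction of browser-observed requests that server-side monitoring could see."""
    if not requests:
        return 0.0
    return sum(r.reached_origin for r in requests) / len(requests)


def blind_spot_hosts(requests: list[BrowserRequest]) -> list[str]:
    """Hosts that served at least one request the origin never saw."""
    return sorted({urlsplit(r.url).hostname or "" for r in requests if not r.reached_origin})


if __name__ == "__main__":
    # Hypothetical traffic mix mirroring the 15% / 85% split described above.
    sample = [BrowserRequest(f"https://www.example-store.com/cart/{i}", True) for i in range(15)]
    sample += [BrowserRequest(f"https://images.example-cdn.net/p/{i}.jpg", False) for i in range(85)]
    print(f"{server_side_coverage(sample):.0%}")  # 15%
    print(blind_spot_hosts(sample))               # ['images.example-cdn.net']
```

Only a browser-side record can populate the non-origin rows at all; server logs by definition contain just the `reached_origin=True` subset.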
Incorrect Landing Page Detection
Landing pages are important and are often used for marketing activities such as special offers or product promotions. To track the success of these activities, it is necessary to know on which landing page a visitor arrived. Knowing the landing page lets marketing teams judge how successful their campaigns are by looking at conversion rates per landing page.
Because most of Bon-Ton's landing pages are cached on a CDN for fast content delivery, server-side-only monitoring was blind to them. Instead, the pages it identified as landing pages were typically the first uncached pages a visitor requested, and these were often not connected with adding items to the shopping cart or checking out. As a result, they could not determine a visitor's actual landing page, which distorted their campaign tracking statistics.
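The distortion can be sketched as follows. The code assumes a hypothetical ordered list of page views for one visit, each flagged as served from the CDN cache or not; the paths are made up for illustration. A browser-side view simply takes the first page, while a server-side view can only take the first page that reached the origin.

```python
from typing import NamedTuple, Optional


class PageView(NamedTuple):
    path: str
    served_from_cdn: bool  # True if the CDN answered without contacting the origin


def landing_page_browser(visit: list[PageView]) -> Optional[str]:
    """Browser-side agent: the landing page is simply the first page of the visit."""
    return visit[0].path if visit else None


def landing_page_server_side(visit: list[PageView]) -> Optional[str]:
    """Server-side only: the first page that actually reached the origin."""
    for view in visit:
        if not view.served_from_cdn:
            return view.path
    return None  # the whole visit was served from the CDN and is invisible


if __name__ == "__main__":
    # Hypothetical visit: a cached campaign page, a cached category page, then a search.
    visit = [
        PageView("/promo/spring-sale", True),
        PageView("/category/boots", True),
        PageView("/search?q=boots", False),
    ]
    print(landing_page_browser(visit))      # /promo/spring-sale
    print(landing_page_server_side(visit))  # /search?q=boots
```

The campaign page, which is exactly what marketing wants to attribute conversions to, never appears in the server-side result.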
Incorrect Visit Detection
Bon-Ton is also interested in the number of visits to their online stores. Not seeing all traffic already makes counting visits difficult when relying on server-side data only. A further challenge arises when a typical visitor's click path goes through multiple pages, some of them cached and some not. Some pages might normally be cached but have expired on Akamai, so they need to be refreshed from the origin. In that case, solutions that try to “string together” the visitor's click path based on referrer headers end up splitting a single visit into multiple visits, because they can't tell when a visitor moves from one cached page to another.
Once again, conversion and usability statistics are distorted. The answer is a solution that sits in the browser: you need to see every interaction to understand what each individual visitor did and to account for it accurately.
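The visit-splitting effect can be reproduced with a small sketch. It contrasts a naive referrer-chaining heuristic, run over the hits that reached the origin, with a browser-side count that sees every hit and only ends a visit after inactivity. The `Hit` record, the path-only referrers and the 30-minute timeout are all assumptions chosen for illustration, not how any specific product stitches visits.

```python
from typing import NamedTuple, Optional

VISIT_TIMEOUT_S = 30 * 60  # assumed inactivity timeout, a common convention


class Hit(NamedTuple):
    ts: int                  # seconds since the first hit
    url: str                 # path only, to keep the example short
    referrer: Optional[str]  # path of the referring page, if any
    cached: bool             # served by the CDN without reaching the origin


def visits_by_referrer_chain(server_log: list[Hit]) -> int:
    """Naive server-side stitching: a hit continues the visit only if its referrer
    is a page this visitor was already seen requesting at the origin."""
    seen: set[str] = set()
    visits = 0
    for hit in server_log:
        if hit.referrer not in seen:
            visits += 1  # unknown referrer, so assume a new visit started
        seen.add(hit.url)
    return visits


def visits_by_browser_session(hits: list[Hit]) -> int:
    """Browser-side: every hit is observed, so a visit ends only after inactivity."""
    visits = 0
    last_ts: Optional[int] = None
    for hit in hits:
        if last_ts is None or hit.ts - last_ts > VISIT_TIMEOUT_S:
            visits += 1
        last_ts = hit.ts
    return visits


if __name__ == "__main__":
    # One visitor, one visit: cached and uncached pages interleaved.
    hits = [
        Hit(0, "/home", None, True),
        Hit(30, "/category/boots", "/home", False),
        Hit(60, "/product/123", "/category/boots", True),
        Hit(90, "/product/456", "/product/123", False),
        Hit(120, "/cart", "/product/456", False),
    ]
    server_log = [h for h in hits if not h.cached]
    print(visits_by_referrer_chain(server_log))  # 2
    print(visits_by_browser_session(hits))       # 1
```

The cached `/product/123` page breaks the referrer chain, so the server-side heuristic counts the single visit twice.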
Getting Insight into What Really Gets Served by Akamai
The following screenshot shows the user interactions of a single visitor on their website. Each line represents an end user action such as loading a page, clicking on a link or submitting a form (through HTTP POST or XHR). All but 3 actions were served entirely by Akamai. You can tell by looking at the Server-Time column, which shows that the application server was not involved in handling those requests. Also interesting are the load times of the pages that were fully served by Akamai, such as the landing page, which took 6.5s to load:
This type of visibility alone solves all 3 shortcomings of the server-side-only solutions they evaluated. They see all traffic, not just 15% of it. They can correctly identify their important landing pages, and they see each unique visitor, which gives them more accurate data about the number of visits, the number of interactions per visitor, and so on.
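Reading the Server-Time column programmatically might look like the sketch below. It assumes a hypothetical export of user actions in which a missing or zero server time means the application server was not involved; the field names and sample values are invented for illustration (only the 6.5s landing page mirrors the text).

```python
from typing import NamedTuple, Optional


class UserAction(NamedTuple):
    name: str
    load_time_ms: int
    server_time_ms: Optional[int]  # None or 0: no application server involvement


def served_entirely_by_cdn(action: UserAction) -> bool:
    """True if the application server played no part in this action."""
    return not action.server_time_ms


def cdn_only_actions(actions: list[UserAction]) -> list[UserAction]:
    return [a for a in actions if served_entirely_by_cdn(a)]


if __name__ == "__main__":
    # Hypothetical click path for one visitor.
    actions = [
        UserAction("landing page", 6500, None),
        UserAction("category page", 2100, 0),
        UserAction("product page", 1800, None),
        UserAction("add to cart (XHR)", 450, 120),
    ]
    cdn_only = cdn_only_actions(actions)
    print(len(cdn_only), "of", len(actions), "actions never reached the app servers")
    slowest = max(cdn_only, key=lambda a: a.load_time_ms)
    print(slowest.name, slowest.load_time_ms)  # landing page 6500
```

None of the CDN-only actions, including the slowest one, would appear in server-side response-time data at all.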
Identifying Akamai-related Performance Problems
Compuware dynaTrace UEM automatically detects content delivered by 3rd party domains, including CDNs such as Akamai. Out-of-the-box dashboards also show whether 3rd party content is impacting page load performance. The following dashboard from Bon-Ton shows a clear spike in image download times from Akamai, giving them a heads-up that something outside of their data centers was impacting end user experience. It also shows that this spike was not related to unusually high load, as the bottom chart, which represents the number of actions executed by site visitors at that time, stays flat:
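The reasoning behind that dashboard, a CDN slowdown that is not explained by visitor load, can be expressed as a simple rule. This is a minimal sketch, not dynaTrace's detection logic: it assumes per-minute aggregates of CDN image download time and visitor actions, and the `factor` and `load_factor` thresholds are arbitrary illustrative values.

```python
from statistics import median
from typing import NamedTuple


class Minute(NamedTuple):
    minute: int
    avg_cdn_image_ms: float  # mean image download time from the CDN domain
    user_actions: int        # actions executed by visitors in that minute


def flag_cdn_spikes(series: list[Minute], factor: float = 3.0,
                    load_factor: float = 1.5) -> list[int]:
    """Minutes where CDN image time spikes while visitor load stays normal.

    Medians serve as a robust baseline; both thresholds are illustrative.
    """
    base_img = median(m.avg_cdn_image_ms for m in series)
    base_load = median(m.user_actions for m in series)
    return [
        m.minute
        for m in series
        if m.avg_cdn_image_ms > factor * base_img       # images unusually slow
        and m.user_actions <= load_factor * base_load   # but load is not unusual
    ]


if __name__ == "__main__":
    imgs = [200, 210, 190, 205, 195, 200, 1500, 200, 900, 210]
    acts = [1000, 980, 1010, 990, 1005, 1000, 1020, 1000, 3000, 995]
    series = [Minute(i, float(x), y) for i, (x, y) in enumerate(zip(imgs, acts))]
    print(flag_cdn_spikes(series))  # [6]
```

Minute 8 is also slow but coincides with a load surge, so it is not attributed to the CDN; minute 6 matches the pattern on Bon-Ton's dashboard.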
Drilling into that timeframe and checking the slow requests makes it easy to identify the problematic request and the problematic images. First, we look at the Transaction Flow, which shows which components and services have the biggest impact on one of the requests affected by this spike. It clearly shows that all of this time can be attributed to a single CDN domain:
The next question is which content or images caused this spike. Because Compuware dynaTrace UEM captures these details, it is easy to identify the actual image files that took unusually long to load:
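The two drill-down steps, attributing time to a domain and then finding the slowest files on it, can be approximated with per-resource timings such as those a browser exposes. This sketch assumes a hypothetical list of `(url, duration_ms)` records for the affected requests; it is a rough stand-in for a transaction flow, not a reimplementation of one.

```python
from collections import defaultdict
from typing import NamedTuple
from urllib.parse import urlsplit


class Resource(NamedTuple):
    url: str
    duration_ms: float  # time the browser spent loading this resource


def time_by_domain(resources: list[Resource]) -> dict[str, float]:
    """Sum load time per host, largest first."""
    totals = defaultdict(float)
    for r in resources:
        totals[urlsplit(r.url).hostname or ""] += r.duration_ms
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def slowest_on_domain(resources: list[Resource], domain: str, n: int = 3) -> list[Resource]:
    """The n slowest resources served from the given host."""
    on_domain = [r for r in resources if urlsplit(r.url).hostname == domain]
    return sorted(on_domain, key=lambda r: r.duration_ms, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical timings for one slow page load.
    resources = [
        Resource("https://www.example-store.com/index.html", 300.0),
        Resource("https://images.example-cdn.net/a.jpg", 4000.0),
        Resource("https://images.example-cdn.net/b.jpg", 150.0),
        Resource("https://images.example-cdn.net/c.jpg", 2500.0),
    ]
    worst_domain = next(iter(time_by_domain(resources)))
    print(worst_domain)  # images.example-cdn.net
    for r in slowest_on_domain(resources, worst_domain, n=2):
        print(r.url, r.duration_ms)
```

Summed durations overcount resources that download in parallel, so this ranks domains by their share of the work rather than measuring their exact wall-clock impact on the page.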
Conclusion: Make sure you get visibility into your CDN
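Just like other web companies, Bon-Ton uses a CDN to speed up their content delivery. With 85% of their content served by Akamai, server-side monitoring alone left most potential problem hotspots out of view; visibility into what the CDN actually delivers to the browser is what let them start triaging in minutes instead of waiting hours for log files. If you are interested in more of their lessons learned about performance management, you can listen to their recently recorded webinar.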