About the Author: Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi

Real Life Ajax Troubleshooting Guide

One of our clients occasionally runs into the following problem with their web app. They host their B2B web application in their East Coast data center, and their clients access the app from all around the United States. Occasionally clients complain about bad page load times or that certain features just don’t work in their browsers. When the problem can’t be reproduced in-house and all of the “usual suspects” (problem with the internet connection, faulty proxy, user error, …) are ruled out, they actually have to fly an engineer out to the client to analyze the problem on-site. That’s a lot of time and money spent to troubleshoot a problem.

Capturing data from the End User

In a recent engagement we had to deal with one of their clients on the West Coast who complained that they could no longer log in to the application. After the user entered username and password and clicked the login button, the progress indicator shown while the credentials are validated never went away. The login worked fine when we tried it in-house. It also worked for other users in the same geographical region using the same browser version. The client runs dynaTrace on their application servers, which allowed us to analyze the requests that came from that specific user. No problems could be detected on the server side, so we ruled out all potential problems that we could identify from within the data center.

Instead of flying somebody to the West Coast we decided to use a different approach: we asked the user on the West Coast to install the dynaTrace Browser Agent. The Browser Agent captures data similar to what the dynaTrace Ajax Edition captures. The advantage of the agent is that it automatically ties into the backend, so requests by the browser that execute logic on the app server can be traced end-to-end, from the browser all the way to the database.

dynaTrace Timeline showing Browser (JavaScript, Rendering, Network) and Server-Side Activity (Method Executions and Database Statements)

The Timeline View shown above gives us a good understanding of what is going on in the browser when the user interacts with a page. Drilling into the details lets us see where time is spent, which methods are executed, and where we might have a problem or exception:

End to End PurePath that shows what really happens when clicking on a button on the web page

Why the Progress Indicator didn’t stop

To figure out why the progress indicator never stopped spinning, and therefore kept blocking the UI for this particular user, we compared the data of the user who experienced the problem with the data from a user who had no problems. We started at a high level by comparing the two Timeline Views.

Identifying the general difference by comparing the two Timeline Views

Both Timelines show the mouse click, which ultimately results in sending two XHR requests. In the successful case we can see a long-running JavaScript block that processes the returned XHR response. In the failing case this processing block is very short (a few milliseconds). We could also see that in the failing case the progress indicator was not stopped, as we can still observe the rendering activity that updates the rotating progress indicator.

In the next step we drilled into the response handler of the second XHR request, as that’s where we saw the difference. It turned out that the XHR response was an XML document and that the JavaScript handler used an XML DOM parser to parse the response and then iterated through the nodes that matched a certain XPath query:

JavaScript loads the XML Response and iterates through the DOM Nodes using an XPath expression

The progress indicator itself was hidden after this loop. In the successful case we saw the hideProgressIndicator() method being called; in the failing one it wasn’t. That brought us to the conclusion that something in the load function shown above caused the JavaScript to fail.
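
The original handler is only visible in the screenshot, so the following is a minimal sketch of the failure pattern rather than the client's actual code. Only `hideProgressIndicator` comes from the case; `handleResponse`, `parseXml`, `fakeParseXml`, and the fake `ui` object are hypothetical stand-ins so the snippet runs outside a browser.

```javascript
// Sketch of the fragile pattern: the indicator is hidden only on the happy path.
// handleResponse, parseXml and the ui object are hypothetical stand-ins.
function handleResponse(xmlText, parseXml, ui) {
  const nodes = parseXml(xmlText); // assumed to throw on a malformed document
  for (const node of nodes) {
    ui.processNode(node);
  }
  ui.hideProgressIndicator(); // never reached if anything above throws
}

// Fake UI that records whether the indicator is still visible.
function createFakeUi() {
  return {
    indicatorVisible: true,
    processed: [],
    processNode(node) { this.processed.push(node); },
    hideProgressIndicator() { this.indicatorVisible = false; },
  };
}

// Stand-in parser: treats U+FFFD (the "undecodable byte" marker) as a parse
// error, otherwise returns the text of every <item> element.
function fakeParseXml(text) {
  if (text.includes("\uFFFD")) {
    throw new Error("XML parse error: invalid character");
  }
  return Array.from(text.matchAll(/<item>(.*?)<\/item>/g), (m) => m[1]);
}

const goodUi = createFakeUi();
handleResponse("<r><item>a</item><item>b</item></r>", fakeParseXml, goodUi);

const badUi = createFakeUi();
try {
  handleResponse("<r><item>M\uFFFDller</item></r>", fakeParseXml, badUi);
} catch (err) {
  // Nothing reports the error, mirroring the silent failure in the case above.
}
```

After both calls, `goodUi.indicatorVisible` is `false` while `badUi.indicatorVisible` is still `true`, which matches the difference we saw between the two users.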

Wrong XML Encoding caused Problem

dynaTrace not only captures JavaScript Execution but also captures Network traffic. We looked at the two XML Responses that came back in the successful and failing case. Both XML Documents were about 350k in size with very similar content. Loading the two documents in an XML Editor highlighted the problem. In the problematic XML Document certain special language characters – such as German Umlauts – were not encoded correctly. This caused the dom.loadXML function to fail and exit the method without stopping the progress indicator.

Incorrect encoding of umlauts and other special characters caused the problem in the XML Parser
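
The case does not state which encodings were involved. One common way to end up with such a document is that text encoded as ISO-8859-1 is sent in a response that the parser treats as UTF-8. The sketch below assumes exactly that scenario and uses the standard `TextEncoder` and `TextDecoder` APIs to show why a strict decoder rejects such bytes; `isValidUtf8` and the sample name "Müller" are illustrative only.

```javascript
// Assumption for illustration: the server sent ISO-8859-1 bytes in a document
// that the parser treated as UTF-8. The case study does not name the encodings.
const latin1Bytes = new Uint8Array([0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72]); // "Müller", ü = 0xFC
const utf8Bytes = new TextEncoder().encode("M\u00FCller"); // ü = 0xC3 0xBC

// Returns true when the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false; // fatal mode throws a TypeError on malformed input
  }
}

console.log(isValidUtf8(utf8Bytes));   // true
console.log(isValidUtf8(latin1Bytes)); // false
```

A lone `0xFC` byte is not a valid UTF-8 sequence, so a strict XML parser has to give up on the whole document, no matter how small the affected part is.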

As there was no proper error handling in place, this problem never surfaced in the form of an error message.
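
A defensive version of such a handler could look like the sketch below. It is one possible approach, not the fix the client applied: `handleResponseSafely`, `showError`, `createRecordingUi`, and `throwingParser` are hypothetical names, and only `hideProgressIndicator` is taken from the case. The `finally` block guarantees that the indicator is hidden whether parsing succeeds or fails.

```javascript
// Sketch: report parse failures and always hide the indicator.
// All names except hideProgressIndicator are hypothetical.
function handleResponseSafely(xmlText, parseXml, ui) {
  try {
    const nodes = parseXml(xmlText);
    for (const node of nodes) {
      ui.processNode(node);
    }
  } catch (err) {
    ui.showError("Could not process the server response: " + err.message);
  } finally {
    ui.hideProgressIndicator(); // runs on success and on failure
  }
}

// Fake UI that records what happened so the behavior can be inspected.
function createRecordingUi() {
  return {
    indicatorVisible: true,
    processed: [],
    errors: [],
    processNode(node) { this.processed.push(node); },
    showError(message) { this.errors.push(message); },
    hideProgressIndicator() { this.indicatorVisible = false; },
  };
}

// Stand-in for a parser that fails on a badly encoded document.
function throwingParser() {
  throw new Error("invalid character in document");
}

const recordingUi = createRecordingUi();
handleResponseSafely("<broken/>", throwingParser, recordingUi);
console.log(recordingUi.indicatorVisible); // false
console.log(recordingUi.errors.length);    // 1
```

With this shape, the user on the West Coast would have seen an error message instead of an endlessly spinning indicator, and the problem would have been reported much earlier.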

Conclusion

To troubleshoot problems, it is important to have as much information at hand as possible. Deep-dive diagnostics, as we saw in this use case, are ideal because they make it easy to spot the problem and therefore allow us to fix problems faster.

Comments

  1. what a nightmare ;-)

  2. Any idea how did the XML characters not get encoded properly for one of the users. I wonder what that has to do with the other user being in a different geographic region?

  3. I wonder why on earth did u need to fly an engineer to client site to debug this whole damn thing. couldnt it be done by remote access to client machine?
    This is the pity about Software Engineering.

  4. @Himanshu: the encoding wasn't done properly for all users. The problem with that user was that they recently updated their company name in the system to reflect the correct spelling (including umlauts)
    @Amit: flying out was sometimes necessary because of security restrictions
