About the Author

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor to the Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi.

Ensuring Web Site Performance – Why, What and How to Measure Automated and Accurately

It is a fact that end user response time is critical for business success. The faster web pages are perceived, the longer users tend to stay on the page and therefore spend more money and drive business. In order to ensure that end user response times are acceptable at all times, it is necessary to measure time the way the end user perceives performance. Measuring and monitoring your live system is important to identify problems early on, before they affect too many end users. In order to make sure that web pages are fast from the start, it is important to constantly and continuously measure web page performance throughout development and testing. There are two questions that need to be answered:

  • What time does the user actually perceive as web response time?
  • How can it be measured accurately and in an automated way?

What time to measure? Technical Response Time vs. Perceived Response Time

Technically, the response time of a web page is the time from the first byte sent by the browser to request the initial document until the last byte of all embedded objects (images, JavaScript files, style sheets, …) has been received. Using network analysis tools like HttpWatch or Fiddler, one can visualize the individual downloads in a timeline view. The following illustration shows the network timeline when accessing Google Maps (http://maps.google.com) with an empty browser cache, captured with Fiddler:

Network Timeline showing Network Requests but no Browser Activities

The initial document request returned after 1.6s. Embedded objects are downloaded after the initial document has been retrieved. It turns out there are 2 additional HTML documents, a list of images and some JavaScript files. After 5 seconds (when main.js was downloaded) we see a small gap before the remaining requests are downloaded. We can assume that the gap represents JavaScript execution time that delayed loading the remaining objects – but we cannot be sure about that.

From this analysis alone it is hard to tell what the perceived end user response time really is. Is it 1.6 seconds, because that is the time when the browser could start rendering the initial content of the HTML document? Is it roughly 5 seconds, when the first batch of embedded objects was fully downloaded? Or might it be 8 seconds – the time until the last request completed? Or is the truth somewhere in between?
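For what it is worth, newer browsers can answer part of this question themselves. The following is a minimal sketch, assuming a browser that implements the W3C Web Timing proposal mentioned in the comments below (http://dev.w3.org/2006/webapi/WebTiming/) – it illustrates the candidate milestones, it is not a replacement for a full timeline analysis:

    // Sketch: reading candidate response-time milestones in the browser,
    // assuming it implements the (draft) W3C Web Timing proposal.
    window.onload = function () {
      // loadEventEnd is only set after the load event completes,
      // so defer the readout with a zero-delay timer.
      setTimeout(function () {
        var t = window.performance && window.performance.timing;
        if (!t) return; // browser does not expose timing data
        console.log('First byte of initial document: ' +
                    (t.responseStart - t.navigationStart) + 'ms');
        console.log('onLoad complete: ' +
                    (t.loadEventEnd - t.navigationStart) + 'ms');
      }, 0);
    };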

There is more than meets the “HTTP Traffic” Eye

The browser does much more than just download resources from the server. The DOM (Document Object Model) is built and maintained for the downloaded document. Styles are applied to DOM elements based on the definitions in the style sheets. JavaScript gets executed at different points in time, triggered by certain events, e.g. onload or onclick. Finally, the DOM and all the images it contains are rendered to the screen.
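To make that concrete, here is a sketch of hand-timing just one of these activities – the JavaScript executed by a click handler. Both the element id 'menu' and expandMenu() are hypothetical stand-ins for your own page:

    // Sketch: timing the JavaScript execution triggered by an onclick event.
    // 'menu' and expandMenu() are hypothetical stand-ins.
    document.getElementById('menu').onclick = function () {
      var start = new Date().getTime();
      expandMenu(); // JavaScript execution plus DOM manipulation
      var elapsed = new Date().getTime() - start;
      console.log('onclick handler took ' + elapsed + 'ms');
    };

Such hand-made timers only capture JavaScript execution; the rendering and style recalculation the browser performs afterwards remain invisible.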

Using a tool like the dynaTrace AJAX Edition we get all this additional activity information, showing us where and when additional time is spent in the browser for JavaScript execution, rendering, or waiting for asynchronous network requests. We also see page events like onLoad or onError:

Timeline of all Browser Activities

Looking at this timeline view of the same Google Maps request as before tells us that the browser started rendering the initial HTML document after 2 seconds. Throughout the download process of the embedded objects the browser rendered additional content. The onLoad event was triggered after 4.8 seconds. This is the time when the browser completed building the initial DOM of the web page, including all referenced objects (images, css, …). The execution of main.js – the last JavaScript file to be downloaded – caused roughly 2 seconds of JavaScript execution time, high CPU usage in the browser, additional network downloads and DOM manipulations. The high CPU utilization indicates that the browser was not very responsive to user input via mouse or keyboard, as JavaScript almost exclusively consumed the processor. The DOM manipulations executed by JavaScript were rendered after JavaScript execution completed (at 7.5s and 8s).

So what is the perceived end user performance?

I believe there are different stages of perceived performance and perceived response time.

The First Impression of speed is the time it takes to see something in the browser's window (Time To First Visual). We can measure that by looking at the first rendering (drawing) activity. A detailed description of browser rendering and the inner workings of the rendering engine can be found in Alois's blog entry about Understanding Browser Rendering.

The Second Impression is when the initial page is fully loaded (Time To OnLoad). This can be measured by looking at the onLoad event, which is triggered by the browser when the DOM is fully loaded, meaning that the initial document and all embedded objects have been loaded.
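Time To OnLoad is also the one stage that can be approximated without any tool, by taking a timestamp as early as possible in the document – the same basic idea behind Steve Souders' Episodes approach discussed in the comments below. A minimal sketch:

    // Place this statement in an inline script as early as possible in <head>:
    var pageStart = new Date().getTime();

    // Then, anywhere later in the page:
    window.onload = function () {
      var timeToOnLoad = new Date().getTime() - pageStart;
      console.log('Time To OnLoad (Second Impression): ' + timeToOnLoad + 'ms');
    };

Keep in mind that this misses everything that happens before the first script line executes (DNS lookup, connecting, requesting the document), so it is only a lower bound.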

The Third Impression is when the web site actually becomes interactive for the user (Time To Interactivity). Heavy JavaScript execution that manipulates the DOM causes the web page to become non-interactive for the end user. This can very often be seen when expensive CSS selector lookups are used (check out the blog posts about jQuery and Prototype CSS Selector Performance) or when using dynamic elements like JavaScript menus (check out the blog post about dynamic JavaScript menus).
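To illustrate why selector lookups matter for interactivity, here is a sketch that times two jQuery lookups, assuming jQuery is loaded on the page; the selectors are hypothetical. While such a lookup runs, the page cannot respond to mouse or keyboard input:

    // Sketch: timing CSS selector lookups with jQuery (assumed to be loaded).
    function timeSelector(selector) {
      var start = new Date().getTime();
      var matches = $(selector);
      console.log(selector + ': ' + matches.length + ' matches in ' +
                  (new Date().getTime() - start) + 'ms');
    }

    timeSelector('div div span a'); // broad selector: walks much of the DOM
    timeSelector('#nav a');         // scoped by id: much cheaper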

Let’s look at a second example and identify the different impression stages. The following image shows a page request to a product page on a very popular online retail store:

3 Impression Phases

The initial page content is downloaded rather quickly and rendered to the screen within the first second (First Impression). It takes a total of about 3 seconds to load some of the initial images that make up the page's initial content (Second Impression). Heavy JavaScript that manipulates the DOM then causes the page to be unresponsive for about 10 seconds, also delaying the onLoad event, after which the page delay-loads most of its images. In this case the user sees some of the content early on (mostly text from the initial HTML) – but then needs to wait another 10 seconds until the remaining images are delay-loaded and rendered by the browser (Third Impression). Due to the high CPU usage and DOM manipulations the page is also not very interactive, causing a bad end user perception of the page's performance.

How to measure? Stopwatch Measuring vs. Tool-Supported Measuring

The idea for this blog post came from talking with performance testing engineers at one of our clients. I introduced them to the dynaTrace AJAX Edition and was wondering about a little gadget they had on their table: a stopwatch.

Their task was to measure end-user response time for every build of their new web site in order to verify that the times were within defined performance thresholds and to identify regressions from build to build. They used the stopwatch to measure the time it took to load each individual page and the time until the page became responsive. The manually measured numbers were put into a spreadsheet, which allowed them to verify their performance values.

Do you see the problems in this approach?

This method of measuring time is not only very inaccurate – especially when we are talking about timings precise to tenths of a second. Every performance engineer also has a slightly different perception of what it means for the site to be interactive. And it involves additional manual effort, as the timings can only be taken during manual tests.

Automate measuring and measure accurately

The solution to this problem is rather easy. With tools like the dynaTrace AJAX Edition we capture performance measures like JavaScript execution, rendering time, CPU utilization, asynchronous requests and network requests. This is possible not only for manual tests but also in an automated test environment. Letting a tool do the job eliminates the inaccuracy of manual timekeeping and the subjective perception of performance.

When using the dynaTrace AJAX Edition as seen in the examples above, all performance-relevant browser activities are automatically captured, enabling us to determine the time of the 3 Impression Stages. The blog article "Automate Testing with Watir" shows how to use the dynaTrace AJAX Edition in combination with automated testing tools. The tool also provides the ability to export captured data to XML or spreadsheet applications like Excel – supporting the use case of automated regression analysis across different web site versions/builds.
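To sketch what an automated regression check on top of such an export could look like – all metric names, threshold values and build names below are hypothetical:

    // Sketch: checking exported impression timings (in ms) against
    // per-metric thresholds. All names and numbers are hypothetical.
    var thresholds = { firstVisual: 1000, onLoad: 3000, interactive: 5000 };

    function checkBuild(build, measured) {
      for (var metric in thresholds) {
        if (measured[metric] > thresholds[metric]) {
          console.log(build + ': ' + metric + ' regression – ' +
                      measured[metric] + 'ms exceeds ' +
                      thresholds[metric] + 'ms');
        }
      }
    }

    checkBuild('build-42', { firstVisual: 900, onLoad: 3400, interactive: 4800 });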

Conclusion

Using tools like the dynaTrace AJAX Edition for Internet Explorer, YSlow or PageSpeed for Firefox, or the DevTools for Chrome enables automated web site performance measurement in manual and automated test environments. Continuously measuring web site performance in the browser allows you to always focus on end user performance, which in the end determines how successful your web site will be.


Comments


  1. Beautiful post.
    How ironic – I am with a customer right now trying to solve performance issues caused by JS that runs during the page rendering and causes such gaps. Guess what tool I use? Your AJAX profiler ;)
    Thank you for a great tool and fantastic content.

  2. Alois Reitbauer says:

    Alik,

    It would be great if you used the dynaTrace AJAX Edition. If you find some interesting problems – which I guess you will ;-) – keep us informed. We are always looking for nice use cases to post.

    Good luck!

  3. While useful for *controlled* performance tests in your development environment, this is unfortunately fairly useless in the real world, for two reasons:
    - it only supports Internet Explorer, which indeed has the largest market share, but Firefox also has a significant share and Safari/Chrome also have meaningful shares
    - it doesn’t monitor the real-world page loading performance perceived by your *actual* visitors around the world (i.e. with many different browser/network connection/OS/hardware/… combinations).

    For that, you need something like Episodes. But right now, there is no software available to analyze the results of Episodes. I’m going to write that software (i.e. something like Google Analytics, but for page loading performance instead of page loads, by feeding it data collected by Episodes) as my master thesis.

    See that last link for more details :)

  4. @Wim: thanks for sharing that link to Steve's Episodes approach – great thesis.
    I thought I pointed out in the beginning of the blog that end user monitoring of your live system is critical – and that's what gets addressed with things like Google Analytics and other monitoring solutions.
    The focus of this blog was to show how measuring and monitoring can be done prior to putting your web site up on the web. The blog was really inspired by the meeting I had with this client, where this approach is a huge time saver and more accurate than their current method.

  5. You're absolutely right, I read over the sentence where you point out that this is for development and not for live system monitoring. Sorry about that.

    I think that it's good to optimize heavily during development if you've got the resources to do so. But in general, it would be more economical to first pinpoint the problem, and then start using this advanced analysis tool.

    Also, Google Analytics is unable to measure page loading performance.

    Most importantly, I cannot fully agree with the implications of this sentence:

    In order to make sure that web pages are fast from the start

    Here, you assume or at least insinuate that if your local (development server) page loading performance is good, it will also be at least fairly good in the real world. As you already know, that's unfortunately not true.
    "All" that dynaTrace allows is to make sure that the code and the CSS/JS/image loading sequence are optimal in one specific browser: Internet Explorer. And don't get me wrong, that already is a lot. More than we used to be able to do.
    But Firefox can behave very differently from Internet Explorer, and WebKit can behave completely differently again (check out Cuzillion for examples of that). So in fact an optimization for IE might be a performance regression for other browsers.
    And then there still is the network thing of course. The user may be loading the site via 56k, EDGE, 3G, ADSL …, from within a 100 km radius of the server, or from 10,000 km away. None of that is accounted for (at least not sufficiently) using this approach.

    And finally, the title of this article suggests that it is about overall “web site performance”, in an automated fashion:

    Ensuring Web Site Performance – Why, What and How to Measure Automated and Accurately

    But in fact, this article is only about web site performance in a controlled environment, limited to a single browser and automated within those constraints. For me (but it could of course be me), that title suggests that you're able to track web site performance of live web sites in an automated and accurate manner. And that's also the reason I commented relatively fiercely :)

    I don’t want to diminish what you’ve done here. It’s nothing short of amazing and incredibly helpful. I just want to point out that you’re setting the wrong expectations. But again, that could just be me :)

  6. @Wim: you are right – there are different aspects of end-user performance. The "last mile" effect is a huge one, where different factors add to the actual end-user performance.
    I will write more about actual end-user performance monitoring in the future, but really wanted to focus on a problem that can already be solved early on – which is identifying application problems in a controlled environment. Most of the problems I've seen when working with our users are problems that could have been identified early on and that are the same in almost all browsers. It is wrong usage of JavaScript frameworks – expensive CSS selectors (I grant that here we see more problems on IE than in other browsers) – and unnecessary roundtrips to the server. Most of these problems can be identified in a controlled environment using dynaTrace, YSlow, FireBug, the DevTools for Chrome, …
    It was interesting for me to see which methods are used to measure and analyze performance in pre-prod. Using a stopwatch was something I didn't expect, as there are lots of tools out there that can automate it and do it in a more accurate way. I really hope that based on this blog more people start thinking about what they can do early on to build faster web sites.
    But again – you are totally right with your comments that you won't be able to find all problems, and especially not those that have to do with the last mile. Also thanks for letting me know that the title was misleading for you – I will be more considerate in the future.
    Great to have an audience like you – all of us benefit from active community participation.
    cheers :-)

  7. Yep, there are few of us, but if we post constructive criticism on each other's writings, then we can make progress :)
    But again, great post ;) :)

  8. Alois Reitbauer says:

    Wim,

    great link and I like the approach. As Andi said, our context here was from a diagnostics point of view.
    Real end user monitoring is a challenge on its own. You definitely cannot use a plug-in based approach here.

    The Episodes approach of Steve looks very interesting – well, what else would one expect from him ;-) . I am just wondering how you get all the measurements into the code. Maybe I have missed something here.

    Some events can for sure be measured by header injection. Everybody who has used Google Analytics will be able to do that, which actually means nearly everybody.

    There was also a nice article on TheServerSide by the ZK framework guys (http://www.theserverside.com/news/thread.tss?thread_id=59105). Especially for rich internet applications this information is vital.

    Are you aware of any plans to make an open source project out of Episodes?

  9. @Alois Reitbauer
    I am using the dynaTrace AJAX Edition ;)
    Since I discovered the tool it has quickly become part of my tools of the trade. Before that – I was in the dark ;)

  10. This is a wicked article. I will definitely be using some of this advice to speed up my websites. Thanks!

  11. @Alois: Episodes already *is* open source, but will be given a more prominent and permanent home in the next few months — at least that was Steve's plan many weeks ago. There is a small bit of JS that has to load at the top of the page. Then, when each section in the page gets loaded, the time to get to that part of the page is recorded.
    I’d strongly recommend you read the white paper, it answers most questions: http://stevesouders.com/episodes/paper.php.

    This is not perfect because it implies a slight overhead for loading Episodes’ JS itself. Which sounds silly, but is totally worth it.
    For my bachelor thesis, I integrated Episodes with Drupal (among other things) and wrote a very, very basic analytics suite to analyze the results. It’s a proof-of-concept, it’s dog slow, extremely limited, but it shows the potential. If you’re interested in this, see the “Test case” section in my bachelor thesis text, which starts at page 74, and in particular the charts on page 80 and 83.

    The goal is of course to get the timing logic of Episodes implemented in browsers themselves, so that the overhead disappears and the timings become more accurate. There already is a W3C proposal for this by another Google employee, Zhiheng Wang. You can find it over here: http://dev.w3.org/2006/webapi/WebTiming/.

  12. Interesting post, thanks!

  13. "The tool also provides the ability to export captured data to XML or spreadsheet applications like Excel"
    Do you mean the dynaTrace AJAX Edition? I've just downloaded the latest version, and when I export session data, I cannot choose XML or Excel.

  14. @mavadaes: yes – I am referring to the dynaTrace AJAX Edition. Every view is exportable via a copy/paste feature. Simply select one or multiple rows in any table/tree view and copy the content to the clipboard. The format is XML and can be pasted into Excel, which understands the XML format and makes a spreadsheet out of it. Also – check out the dynaTrace Community Portal – it has an entry about that feature.

  15. Hi

    In our client-side heavy app we can open some complex forms in tabs.

    I want to profile the performance of getElementById(), which I believe decreases as the DOM size expands.

    Using dynaTrace, if I log into our app and open 5 of those tabs, then close them, dynaTrace will show me how getElementById has performed.

    I believe dynaTrace considers it slow enough to be shown in the hot spots panel.

    But if I profile only the opening of the last tab (which should be the slowest), dynaTrace will not give me information about getElementById.

    As a result I am left with data about a long scenario, but not the specific action I want to profile.

    Is there/Will there be a way to tell dynaTrace the precise things I want to profile before the run? Or do you have another workaround solution for me?

    thanks

    Olivier

  16. @olivvv: You can drill down to the Hot Spots view for a specific URL. Open the Summary view – pick the URL for your last tab – right-click and select "DrillDown – Hot Spots".
    Also – if you have identified the PurePath (the JavaScript execution that is slow on your last tab), the Contributors list on the lower left shows you which methods contribute the most to the performance of that PurePath.

  17. Andreas, thanks for your answer.

    We have only one URL in our app, it is a single-load client-side application.

    When I profile the following actions:
    connect – login – tab 1 – tab 2 – tab 3 – tab 4 – tab 5

    dynaTrace gives me the performance of getElementById.

    When I profile only the opening of tab 5, I don't get the performance of getElementById.

    dynaTrace is good at telling me where to start optimizing the application, but it is not very helpful when I need more detailed data.

    It's good to know the hotspots, but I'd also like to be able to choose the spots to watch.

  18. @olivvv: While you are recording your steps you can use the "Marker" feature in the dynaTrace IE toolbar. You can set a marker before every step, naming it "connect", "login", …
    After you are done recording, go to the Timeline. The markers will show up. Zoom into the area after your Tab 5 marker. Now do a "DrillDown – TimeFrame". You will end up in the PurePath view with all activities of that timeframe. Now you can click through the individual JavaScript PurePaths and use the Contributors tab on the bottom left to see your hotspots for each individual PurePath. You can also use the filter in the PurePath tree (just start typing) to search for e.g. getElementById.

  19. Andreas, thanks! This is just as if I had discovered right-click on my PC…
