Zooming out and improving page load times on the web

This post serves as a log of past and recent work where I was tasked with improving page load times on mobile and desktop for various e-commerce sites. I thought it would be good to write down the process I followed and my findings so I can refer back to them for my next project. Hopefully it has some use in the public domain.

I won’t be talking numbers directly or mention n transactions or n requests per second because those numbers don’t serve much purpose outside their original context. What is important is that there’s a piece of software which serves a certain number of customers, and those customers did not get response times within the goals expected by the business. What I’m focusing on is real-world, customer-impacting performance problems.

This article will leave you with more questions than answers, but that’s my intention. I do give tips and general guidelines, but each system is wired uniquely, so concrete solutions are impossible. It will, however, give you a starting point and some best practices which almost always make sense, e.g. HTTP caching headers.

Before you begin

Assuming a problem has been identified and is known to be performance-related, you can then begin measuring and get a more granular picture of what is going on. A good source of data which is consistent, doesn’t change and is as close to production as possible gives repeatable results and some confidence when numbers change between test runs. This could mean bloating your database with test data, loading images or throttling your network connection to get a realistic representation of production. In one instance, an application I worked on contained no sensitive information or user accounts in the database and was under a terabyte, so I took a complete backup.

Don’t get too hung up on test data though, because performance can be measured as relative improvements, e.g. improving page load times by 25%. If your site takes 10 seconds to load, you’d like it to load in 5, and you don’t have an environment identical to production, then aim for a 50% improvement with the data you have.

There are tools out there which can generate mock data and give you something to work with. If you’re lucky enough to work on a system which doesn’t contain any personally identifiable information then you could save some time and take a copy of production.

Also be aware of build-time configuration which might add scripts or functionality that aren’t there in the development environment. For example, you might only include the Google Analytics tracking script in a production build of your application.

Measure at different layers in the stack

Spending hours optimising JavaScript query selectors and yielding a 10 millisecond gain in page load performance probably isn’t worth it, although it’s a best practice you should follow as you develop or else you’ll suffer death by 1000 paper cuts.

  • Measure front-end (browser) load times with a tool such as Chrome Developer Tools.
  • Measure above-the-fold load times.
  • Set up load testing and get a baseline. My go-to tool is JMeter.
  • Measure application level performance with code profiling tools and take measurements under load. I usually focus on CPU sampling. This gives me clues as to where the application is waiting/processing the most.
  • Look at optimising database queries using a query plan analyser.

Set a baseline

Define the bare minimum set of requirements for customers whom you want to support. When put that way it makes me want to support all of them, but it’s not possible unless you have Google’s budget. Define the lowest common denominator of device configuration and base all work from those constraints.

Set your baseline to something agreed upon by the business and do your browser performance testing as experienced by a real user. A real device is great for testing once the optimizations have been completed, but I find it more productive to emulate devices first because of the shorter feedback loop.

You should:

  • Set a baseline for network conditions, e.g. throttle to 3G. There are also more sophisticated tools such as clumsy which can simulate flaky network connections and dropped packets.
    • Tip: When choosing a network connection, be careful not to base your choice on what Google Analytics tells you about the majority of your users’ connection types; users with poor page load times are less likely to come back to your site, so they’re under-represented in that data. It’s also not a good idea to assume everyone who visits a site connects with the latest and greatest devices.
  • Disable all caching in the browser when testing initial page load times. When caching is re-enabled it can only get better from there.
  • Use static data for testing so the before and after measurements will be accurate. If you’re using a development machine ensure the data doesn’t change or recreate it before each test run.

In Chrome Developer Tools it’s possible to throttle both network and CPU (Firefox supports network throttling too). When you enable these you’ll get a more realistic view of how customers experience the site.

Also take an incremental approach to improvement by using a tool such as Google Analytics or web server logs to determine which pages have the highest usage. Then start with one or two pages and slowly build up reusable bits of code and processes which you can use as time goes on.

Zooming Out

There are many moving parts in most applications, and when they all dance together it presents a different picture than when they are tested in isolation. To get an overall picture I prefer to begin at the browser layer of the stack: it gives me an indication of client-side performance as the customer experiences it, and it also shows how long the network takes to respond through TTFB (Time To First Byte).
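
If you want to see TTFB as a real customer experiences it, the browser can report it directly. Here’s a small sketch using the standard Navigation Timing API; run it in the console after the page has loaded, or ship it as a tiny snippet to collect real-user numbers.

    // Read Time To First Byte (and a few related marks) via the
    // Navigation Timing Level 2 API. Run after the load event so the
    // later values are populated.
    const [nav] = performance.getEntriesByType('navigation');

    if (nav) {
      console.table({
        'DNS lookup (ms)': nav.domainLookupEnd - nav.domainLookupStart,
        'TCP connect (ms)': nav.connectEnd - nav.connectStart,
        'TTFB (ms)': nav.responseStart,              // relative to navigation start
        'DOM content loaded (ms)': nav.domContentLoadedEventEnd,
        'Full page load (ms)': nav.loadEventEnd
      });
    }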

I’ve often seen performance work narrowed in on one particular part of the system. On one occasion I saw a few developers agree that if the “biggest” stored procedures execute quickly then the website will perform quickly. I also see JMeter results used as the final decider on how performant a system will be and whether it will handle the expected traffic. While JMeter is great at measuring the throughput of your application, it won’t tell the whole story. If JMeter reports a throughput of 1000 completed page requests a second or 500 transactions per minute, some other questions worth asking are:

  • What’s the CPU and RAM utilization for the web server under such load?
  • What’s the CPU and RAM utilization for the database under such load?
  • Do the results change if resources (jpg, js, css etc…) are downloaded in full for each request?
    • Is that slowdown due to pressure on the web server or has the bandwidth been saturated?
    • Is a third party CDN part of the problem? For example, Amazon’s CDN is spread out over 12 regions whereas CloudFlare has 114 edge nodes, meaning there’s a good chance there’s a node closer to our customers.
  • Is the server able to handle n requests with poor network connections?
    • Are those calls asynchronous in nature?

Here’s a diagram I made to help me visualise the application as part of a larger ecosystem.

When zooming out on performance and taking a look at each layer, these are the categories I focus on.

1. Web Application (server side)

Are there inefficient database queries, inefficient processing of query results, memory fragmentation, writes to the file system, or perhaps a 3rd party API endpoint which contributes to the whole story? My recommendation here is to profile your code both as a single user and under some moderate load, with JMeter pointing at your local environment. What helps me the most is CPU sampling; it gives me a general idea of where code is waiting or slow in processing.

If you’re running on Mac or Linux I’m sure there’s an equivalent tool to Performance Counters, which come with all server and some consumer versions of Windows. They list hundreds of instrumented values throughout the Windows stack, and you can also create your own. Here’s a screenshot of a very small list of some of the things available under the .NET Memory category.

The categories I usually focus on are Web Service, ASP.NET, ASP.NET Applications, .NET CLR LocksAndThreads and .NET CLR Memory. There are others for SQL and the networking stack which might be useful. This is a great way to measure performance and performance-related problems in production without installing any additional software and introducing risk, and it’s the first place I look when quantifying.

Microsoft has a suite of command line profiling tools for the Windows stack and no doubt there will be tools for whatever stack you’re working with. Just be judicious when profiling production applications with command line tools as attaching to a running process can bring it down in some circumstances.

You should:

  • Measure CPU performance relative to the number of incoming requests. Remember: correlation, not causation.
  • Use Brotli compression if your web server supports it (see the sketch after this list).
  • Use JMeter to gauge throughput.
  • Use a code profiling tool to determine the cause of pauses or slowdowns.
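
My stack is IIS, where Brotli is a server configuration concern, but to illustrate the idea here’s a minimal sketch of serving a Brotli-compressed response with Node’s built-in zlib module (Node 11.7+). In practice you’d normally leave this to the web server or a middleware rather than hand-rolling it.

    // Minimal sketch: compress the response with Brotli when the client
    // advertises support via Accept-Encoding. Assumes Node.js 11.7+.
    const http = require('http');
    const zlib = require('zlib');

    http.createServer((req, res) => {
      const body = '<html><body>Hello, Brotli</body></html>';
      const acceptsBrotli = /\bbr\b/.test(req.headers['accept-encoding'] || '');

      res.setHeader('Content-Type', 'text/html');

      if (acceptsBrotli) {
        zlib.brotliCompress(Buffer.from(body), (err, compressed) => {
          if (err) { res.end(body); return; }      // fall back to an uncompressed response
          res.setHeader('Content-Encoding', 'br');
          res.end(compressed);
        });
      } else {
        res.end(body);
      }
    }).listen(3000);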

I personally don’t have any experience with HTTP/2 yet, as the version of IIS I’m running doesn’t support it.

2. Database

Database queries are one source of pain, but another could simply be the number of queries the database has to deal with. Take my earlier example where JMeter reports a throughput of 1000 page requests a second: one important thing to look at is how the database is performing. If it’s close to 90% CPU utilisation, scaling beyond that point will become a problem. What to do? In these scenarios you’ll want to find the queries which are executed the most and look for problems with indexes. I also use caching strategies at the application layer, looking for data which seldom changes (think zip codes or slow-moving product listings).

Sometimes I see Redis used as a bandaid solution to a much larger problem, but it all depends on your circumstances. For example, if you’re using a 3rd party (closed source) CMS where the underlying data store is out of your control, and making changes might break the upgrade path at a later date, then you might have no other choice. As always, it depends.

You should:

  • Find the most frequently executed database queries
  • Use a query plan analyzer to look for optimisations
  • Use The Index, Luke!
  • Look at in-memory caching strategies (see the sketch after this list)
  • Use Redis or similar solutions
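
As a toy illustration of the application-layer caching point, here’s a minimal in-memory cache with a time-to-live. fetchZipCodes and the TTL value are placeholders for whatever your data layer actually looks like.

    // Minimal in-memory cache with a TTL, illustrating the "cache data that
    // seldom changes" idea. fetchZipCodes() stands in for a real database call.
    const cache = new Map();

    async function cached(key, ttlMs, loader) {
      const hit = cache.get(key);
      if (hit && hit.expires > Date.now()) {
        return hit.value;                       // serve from memory, skip the database
      }
      const value = await loader();             // cache miss: hit the database once
      cache.set(key, { value, expires: Date.now() + ttlMs });
      return value;
    }

    // Usage: zip codes change rarely, so an hour-long TTL is plenty.
    // const zips = await cached('zip-codes', 60 * 60 * 1000, fetchZipCodes);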

3. Network

Some of the things to look at here are:

  • Outgoing bandwidth from your servers, and in particular, saturation.
  • Third party services you integrate with, e.g. server-to-server communication and payment gateways
  • Response time or TTFB (Time To First Byte)
  • Response times from customers’ browsers (write instrumentation code to measure response times to different services, e.g. CDNs)

In a performance audit, this is the area I spend the least time in, not because it’s unimportant, but because it’s rarely the bottleneck in a corporate or cloud-based network.

4. Web Browser

Here’s where things get interesting, mainly because the browser is its own execution platform running on a variety of hosts (desktop, mobile, tablet, TV) and under very different conditions: screen sizes, network latency and bandwidth, data downloaded from many sources, rendering time, script blocking and script execution. There are many things to look at, so let’s begin.

To get the most out of the time given to you by your boss, you’ll want to look at low-hanging fruit first. Firefox has a fantastic feature which lets you visualise the resources on a page broken down into a pie chart.

Finding this chart is a little tricky, so first go to the networking tab in Firefox developer tools. Click on the icon highlighted in red. As an example I’ve opened up cnn.com.

You’ll then get a breakdown of the page weight with and without a primed cache.

4.1 Caching

One thing which is immediately obvious is that they aren’t utilizing HTTP caching at all! Caching headers and policies are important not only for returning visitors but also for proxies which cache responses (for free) as they pass through on their way to the browser.

You should add ETag, Expires and Cache-Control headers where appropriate.
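
The exact mechanics depend on your web server (on IIS/ASP.NET this usually lives in web.config), but to make the headers concrete, here’s roughly what they look like when set by hand in a small Node.js handler. The max-age value is an example, not a recommendation.

    // Illustrative only: setting caching headers by hand in a Node.js handler.
    const http = require('http');
    const crypto = require('crypto');

    http.createServer((req, res) => {
      const body = JSON.stringify({ products: [] });          // placeholder payload
      const etag = crypto.createHash('md5').update(body).digest('hex');

      // Let browsers and intermediate proxies cache for a day, then revalidate.
      res.setHeader('Cache-Control', 'public, max-age=86400');
      res.setHeader('ETag', etag);

      if (req.headers['if-none-match'] === etag) {
        res.statusCode = 304;                                 // the client's copy is still fresh
        res.end();
        return;
      }

      res.setHeader('Content-Type', 'application/json');
      res.end(body);
    }).listen(3000);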

4.2 JavaScript

The amount of JavaScript being downloaded is insane. I’m sure I can find sites with larger amounts, but for what this site actually does, which is deliver textual content and serve video, I find this to be very poor practice. That much JavaScript significantly delays page load due to parsing and execution (which blocks rendering).

On a positive note, most of these resources are coming from a CDN, which allows the browser to download more content in parallel; browsers limit connections per host, on average to between 6 and 8, so spreading resources across hosts helps.

You should:

  • Break up JavaScript into per-page bundles
  • Use the async and defer attributes where it makes sense. Be careful with defer and make sure you test.
  • Load resources (JS, CSS) from a CDN (even your own, not just jQuery and Bootstrap)
  • Load resources when needed or after a certain period of time, e.g. don’t load the dhtmlxgrid component until you actually need to display a grid (see screenshot above)
  • See what you can remove, e.g. most modern browsers have jQuery’s selector behaviour built in (document.querySelector) and many other newer APIs, so you might be able to remove it altogether (see the sketch below)
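
On that last point, a quick before/after. The selectors and endpoint below are made up for illustration, but most jQuery usage on a typical page maps directly onto native DOM APIs.

    // Hypothetical selectors and endpoint, purely for illustration.
    const handler = () => { /* add to cart, etc. */ };

    // $('.product-card').hide();
    document.querySelectorAll('.product-card')
      .forEach(card => { card.style.display = 'none'; });

    // $('#buy-button').on('click', handler);
    document.querySelector('#buy-button').addEventListener('click', handler);

    // $.getJSON('/api/cart');
    fetch('/api/cart').then(response => response.json());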

4.3 Images

Most of the time, for the sites I work on, images are the main culprit, so I have to mention them even though CNN doesn’t have this problem. Images are the easiest and most beneficial resource to optimise, as they offer the biggest savings in terms of bandwidth. In Chrome developer tools, filter by images and sort descending on size and you’ll know where to begin. Chrome will even show totals at the bottom for the filtered view.

You should:

  • Lazy load images until they are in view. The library I’ve settled on for this is lazysizes (a vanilla sketch of the idea follows this list).
  • Reduce the size of your images. For JPEGs, I’ve found the sweet spot to be 85% quality, which can make a huge difference. For PNGs you could use a service such as TinyPNG.
  • Remove all image metadata. It’s small but it’s easy to do.
  • Serve images from a CDN.
  • Serve WebP for browsers which support it.
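
lazysizes handles the awkward edge cases for you, but conceptually the technique boils down to something like this simplified IntersectionObserver sketch. The data-src convention mirrors lazysizes; the rest is illustrative, not the library itself.

    // Simplified lazy loading: images start with only a data-src attribute
    // and get a real src once they scroll near the viewport.
    const lazyImages = document.querySelectorAll('img[data-src]');

    const observer = new IntersectionObserver((entries, obs) => {
      entries.forEach(entry => {
        if (!entry.isIntersecting) return;
        const img = entry.target;
        img.src = img.dataset.src;        // trigger the actual download
        img.removeAttribute('data-src');
        obs.unobserve(img);               // each image only needs this once
      });
    }, { rootMargin: '200px' });          // start loading a little before it's visible

    lazyImages.forEach(img => observer.observe(img));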

I find the best approach is for the web server to handle these tasks for you. In ASP.NET there’s a library called imageprocessor which processes images on the fly. It will handle image reduction, PNG optimisation (with a plugin) and can even cache resources on a CDN from your local copies. There’s a small performance hit when the site first starts up, but once the cache is primed it will be worth it. Provided you don’t remove the cache, the next time the web application starts up, it won’t need to build the cache again. I’m sure there’s a library for whatever platform you work on.
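
As an example of that last sentence: on Node, a rough equivalent of ImageProcessor is the sharp library, and a sketch of resizing, recompressing at the 85% sweet spot and emitting a WebP variant might look like the following. File names and dimensions are placeholders.

    // Rough sketch using the sharp library (npm install sharp).
    // sharp strips image metadata by default unless you call withMetadata().
    const sharp = require('sharp');

    async function processProductImage(inputPath) {
      // Resize and recompress the JPEG at ~85% quality.
      await sharp(inputPath)
        .resize({ width: 1200, withoutEnlargement: true })
        .jpeg({ quality: 85 })
        .toFile('out/product.jpg');

      // Emit a WebP variant for browsers which support it.
      await sharp(inputPath)
        .webp({ quality: 80 })
        .toFile('out/product.webp');
    }

    processProductImage('in/product.jpg').catch(console.error);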

4.4 Limit 3rd Party Trackers

This one might be a hard sell, but see how many analytics and tracking scripts you can remove. They not only affect page load times (especially on mobile) but have a habit of downloading more scripts. One particular provider I dealt with recently (Gigya) required the developer to place “a small tag” in the head of the page. That small tag then proceeded to download over 28 scripts and delayed the page by several seconds.

Measure script load times alongside all of the other requests; don’t just copy the script tag and measure it on its own, as you’ll get very different results. The difference can be due to resource contention on a mobile device (not enough RAM), too many requests in flight or other factors.

You should:

  • Delay loading of scripts by placing them at the end of the <body> tag; they don’t always need to be in the <head>, even though the vendor will tell you otherwise.
  • Talk to the business to see what they’re using. Perhaps they forgot about script X and don’t need it anymore.
  • Measure the load times of all third party analytics and tracking scripts combined and show the impact they have on page load times (see the sketch below).
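
One way to put a number on that last point is the Resource Timing API, which can total up the time spent on third party scripts. The domain list here is only an example; substitute whatever trackers you actually run.

    // Sum up how much time goes into third party scripts on the current page.
    const trackerHosts = ['google-analytics.com', 'gigya.com', 'doubleclick.net'];

    const trackerEntries = performance.getEntriesByType('resource')
      .filter(entry => entry.initiatorType === 'script')
      .filter(entry => trackerHosts.some(host => entry.name.includes(host)));

    const totalMs = trackerEntries.reduce((sum, entry) => sum + entry.duration, 0);

    console.log(`${trackerEntries.length} tracker scripts, ${Math.round(totalMs)} ms in total`);
    trackerEntries.forEach(entry =>
      console.log(`${Math.round(entry.duration)} ms  ${entry.name}`));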

4.5 Delay loading of features

Some features may not be needed until some time after the page has loaded. A first-time visitor, for example, probably isn’t looking to create an account any time soon, so you can delay signup.js or, as in my Gigya example further up, delay the identity scripts until they’re necessary. This requires a lot more effort, as scripts will need to be broken up by feature rather than by page; for an existing application it’s significantly more work.
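
Here’s a sketch of what that can look like with dynamic import(): nothing from the signup feature is downloaded or parsed until the visitor actually asks for it. The ./signup.js module and its initSignup export are hypothetical names.

    // Don't download or parse the signup feature until someone clicks the button.
    const signupButton = document.querySelector('#signup');

    signupButton.addEventListener('click', async () => {
      // Fetched on the first click; later clicks reuse the cached module.
      const { initSignup } = await import('./signup.js');
      initSignup();
    });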

Final Words

There are an endless number of posts online about improving page load times, but I wanted this post to emphasise the big picture, as I find most posts are very browser-centric.

Application performance and optimisation matter, not only to give your customers a great experience but also for your bottom line. Fast page loads have been shown to improve conversion rates and the overall usability of your product. People’s time is important, so treat them with respect and value the little time they give you.

One last thing, don’t get stuck in the trap of believing the cloud will save you from scalability issues. While that’s somewhat true, you’ll feel the inefficiency when the invoice arrives.

Resources

A list of resources mentioned throughout the post.

Measuring and Quantifying

Image / Server Side Processing

Test Data