Think a little about the readers of your web site

This is a translation of a post by Ricardo Galli about some of the lessons he has learned on Menéame, a Spanish-language social news website similar to Digg. I wanted to share some of the concepts with my co-workers, but I thought it could be interesting to translate the complete work and share it with the whole world ;-) Any English errors are my own. I would also like to thank David Brodigan for helping me review the English version.

Bored of having to wait more than 5 seconds for a blog’s page to display? Annoyed with those sites with dozens of widgets, gadgets, AJAX effects and mashups that take hours to load? Troubled because you have developed a very efficient program for the latest hot framework and “it’s slow”? Me too, and that worries me a lot. These sites are incredibly crappy pieces of work that ignore the basics of usability and human interface: the response time perceived by the user.

You can criticize everything else about Menéame, but not its speed, nor can you say that we have neglected this very important aspect. That’s why I’m sharing the few rules we have been following very strictly. We already knew some of them, but we have learnt many more during these past five years of development.

There are a lot of parameters to take into account when developing “fast” websites. It’s not only the server speed or the time the server takes to generate dynamic HTML; other parameters directly affect the browser and the user’s perception.

In July 2001 I wrote an article at Bulma [in Spanish] where I explained, based on measurements made during the development of the first sites of Diari de Balears and Última Hora (during the years 1997-1998), the fundamental technical parameters to measure and take into account: response time, return time, download time and “display time”. That last parameter, display time, is the one that has the most impact on the user. The user expects a quick response, and that is mostly perceived as the time it takes for the page to start to display in the browser.

In that article I showed, with examples, how to achieve a display time shorter than the total time needed to render and download the whole HTML by optimizing how the page is generated. A page rendered without keeping those parameters in mind takes about six seconds to display.

Tdis = Tdownload = 0.5 sec + 40 KB / 7.5 KB/sec = 0.5 sec + 5.33 sec ≈ 6 seconds

On the other hand, the same page, properly formatted, can reduce the display time to just one second.

Tdis = 0.5 sec + 4 KB / 7.5 KB/sec = 0.5 sec + 0.53 sec ≈ 1 second
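
Both estimates follow the same simple model: a fixed start-up latency plus the amount of HTML that must arrive before the browser can paint anything, divided by the bandwidth. A minimal Python sketch of that arithmetic, assuming the 0.5 sec latency and 7.5 KB/sec bandwidth used in the formulas (the function and parameter names are only illustrative):

```python
def display_time(first_paint_kb, latency_s=0.5, bandwidth_kb_s=7.5):
    """Seconds until the browser can start displaying the page.

    first_paint_kb is the amount of HTML that must arrive before anything
    can be painted: the whole page in the first formula, only the first
    block in the second one.
    """
    return latency_s + first_paint_kb / bandwidth_kb_s


# The whole 40 KB page is needed before display: about six seconds.
print(round(display_time(40), 2))  # 5.83
# Only the first 4 KB are needed: about one second.
print(round(display_time(4), 2))   # 1.03
```

The bandwidth does not change between the two cases; only the amount of data the browser has to wait for does.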

Those benchmarks were made during the reign of HTML tables, when CSS and JavaScript were hardly used yet. Now the situation has gotten worse, even with the spectacular increase in bandwidth (the best available in 2001 was ISDN at 64 kbps and a few 128 or 256 kbps connections)*, better browsers and increased processor power. Today, there are a lot of popular sites that take more than six seconds to display.

* [of course, he is talking about the situation in Spain at the time; connections may have been better in other countries]

The basic problems are still the same, just increased in complexity by the presence of CSS, JavaScript, advertising, libraries, effects, and so on. It would be difficult to write a recipe for all situations, but here are some rules we strictly follow on Menéame:

1. Minimize the inclusion of CSS and JavaScript

CSS and the majority of JavaScript files are included at the top of the HTML and they are blocking, meaning they make the browser stop rendering until the download is completed. Including a lot of CSS and JavaScript files on a website should be carefully planned; most of the time it just annoys the users. A good rule of thumb is to keep the number of those files to a minimum, not to include unused libraries, and to reduce the number of images referenced from CSS, as they generate additional connections.

It’s very common for blogs and websites to include every kind of external JavaScript (widgets) without taking into account the penalty on display time. Those scripts usually block rendering, especially if they weren’t carefully developed to avoid it. A good script is usually just a few lines of code that define a fixed-size iframe and afterwards load the rest of the code (using the proper requests). An example of good code in this respect is the Google AdSense script.

So, if external JavaScript code has to be included, take a closer look at how it works, and discard scripts that block rendering while they query external servers. If there is no choice but to use that kind of slow, blocking script, some strategies can be used to work around the problem. That is what Menéame does for its two advertising blocks.

That advertising is generated by JavaScript libraries served from the Social Media S.L. server. That server first checks whether there is any specific advertising for the site in its own and third-party databases. If not, it redirects to load AdSense. This process is slow and the server API is not optimized. The solution that Menéame uses can be adapted for any site with similar problems.

The JavaScript is not loaded directly. Instead, a fixed-size iframe is defined which loads an independent HTML document, and that document is the one that loads the advertising scripts. Loading the iframe content does not block the rendering of the page and has a lower priority. That is why, when you arrive at Menéame, the advertising is the last thing to be displayed, after the complete page has rendered.
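
As a rough illustration of the iframe approach, here is a small Python helper of the kind a server-side template could use to emit the placeholder. The function name, the `/ads/top.html` path and the 728x90 size are hypothetical; the post does not show Menéame's actual markup.

```python
from html import escape


def ad_iframe(src, width, height):
    """Return a fixed-size iframe tag that loads the ad page separately.

    Because width and height are set in the tag, the browser can lay out
    the surrounding page without waiting for the iframe document, which
    is the one that pulls in the slow advertising scripts.
    """
    return (
        '<iframe src="{src}" width="{w}" height="{h}" '
        'scrolling="no" frameborder="0"></iframe>'
    ).format(src=escape(src, quote=True), w=int(width), h=int(height))


# Hypothetical URL and size, for illustration only.
print(ad_iframe("/ads/top.html", 728, 90))
```

The document behind the `src` URL would contain nothing but the advertising `<script>` tags, so any delay they cause stays inside the iframe.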

So, if you have to deal with those slow widgets, users will appreciate it if they are loaded this way, avoiding those annoying waiting times. Even more important, the same techniques can be applied when developing widgets or plugins. Here you have an example of how we do it for the Menéame “votes counter” check_url.js or this similar but more elaborate example.

Note: You can also use the defer attribute of the script tag, but be careful: we had lots of problems when trying to use it.

2. Delay the load of JavaScript files when they’re not needed to display the page

As mentioned previously, the number of script files included at the top of the page should be reduced to a minimum. When additional files are loaded, the best approach is to include them at the end of HTML, just before the </body> tag.

That is not always easy, as the code may be needed to render or display content on the same page. It can still be solved, as Menéame does for news with geolocation, where a Google Maps map is displayed in the sidebar. The main idea is to delay the load of the additional JavaScript as much as possible.

The code can be seen at the end of the HTML, just before the </body> tag (the first script includes the Google API, the second Menéame’s own functions):

[the two script tags of this listing were not preserved in the source]

The moment the browser interprets the CSS of those three divs, it can calculate the sizes needed to render quickly: the width of the visible part, the height of the top header, the height of the left menu and the width of the right column (as well as the width of the central main column). The part that takes most of the CPU time in a Menéame page is querying the links or comments. To reduce the display time even more, the first block sent to the browser is the content of the right column (the #sidebar). As its width is already specified, it can be rendered even when the browser hasn’t received the HTML of the main column. That’s why on slow connections (or slow computers), the first element to be displayed is the right column.

4. Use different domains for the static content

Browsers (at least most of them) parallelise downloads from different sites, so it is better to download static files from a different domain.


...

...
<script src="..."></script>
...

...
<img src="http://aws.mnmstatic.net/img/common/search-left-04.png" alt="" width="6" height="22" />
...
<img src="http://aws.mnmstatic.net/cache/00/01/1-1280021590-20.jpg" alt="gallir" width="20" height="20" />

Line 3 needs some comment. There are some libraries, like jQuery, that are used in lots of places. Even though it’s the same code, each site includes its own copy, which means the browser downloads the same file over and over. To avoid this situation, Google created the Google APIs servers, where a copy of those libraries is maintained for use by any site. If every major site used them, users would save a lot of time (also, Google’s servers are distributed and optimized, so the download is quite fast).

5. Use different domains, not subdomains, for static content

This point could have been included as part of the previous one, but it is stated separately because it is a nice trick for complex sites that use cookies and serve a lot of static files or images. As seen in the previous point, a subdomain of meneame.net (e.g. static.meneame.net) is not used to serve static content; a completely different domain is used instead. The objective is to avoid the additional traffic caused by cookies. Each cookie defined for a domain is sent with every request to that domain, even when downloading small images of just a few bytes. Also, upload bandwidth is usually smaller than download bandwidth. The situation is even worse with cookies defined by statistics tools like Google Analytics, which tend to be big. The way to avoid all that useless traffic is to use a completely different domain. According to some benchmarks made after implementing this solution on Menéame, about 14 KB was saved on each request for a browser with an empty cache.
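
To see how the cookie overhead adds up, here is a back-of-the-envelope sketch in Python. The request count and cookie size are invented for illustration; they are not Menéame's measurements.

```python
def cookie_overhead_bytes(static_requests, cookie_bytes):
    """Upload bytes spent resending cookies with every static request."""
    return static_requests * cookie_bytes


# Hypothetical page: 30 static files, 500 bytes of cookies per request.
wasted = cookie_overhead_bytes(static_requests=30, cookie_bytes=500)
print(wasted)  # 15000
```

Serving those files from a domain that never sets cookies brings this overhead to zero, since the browser has nothing to attach to the requests.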

6. Define a timeout for the static content

To prevent the browser from connecting every time to check whether a static file has been modified, define a timeout. That way the browser won’t come back to check the file until that time has expired. In Menéame the timeout is set to 10 days, in nginx:

location ~* \.(gif|jpg|png|js|css|html|ico)$ {
  expires   10d;
  ...
}
For example, for an image it will generate the following HTTP headers:
Cache-Control: max-age=864000
Expires: Mon, 06 Sep 2010 16:12:01 GMT
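
The two headers carry the same information in different forms: max-age is the lifetime in seconds, and Expires is the absolute date at which that lifetime ends. A small Python sketch of the relationship, assuming a request made on 27 Aug 2010 at 16:12:01 GMT (that request time is back-computed from the Expires value above, it is not stated in the post):

```python
import calendar
from email.utils import formatdate


def expiry_headers(now_epoch, days):
    """Build Cache-Control and Expires values for a lifetime in days."""
    max_age = days * 24 * 60 * 60
    return {
        "Cache-Control": "max-age={}".format(max_age),
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now_epoch + max_age, usegmt=True),
    }


# Assumed request time: 27 Aug 2010 16:12:01 GMT.
request_time = calendar.timegm((2010, 8, 27, 16, 12, 1))
headers = expiry_headers(request_time, days=10)
print(headers["Cache-Control"])  # max-age=864000
print(headers["Expires"])        # Mon, 06 Sep 2010 16:12:01 GMT
```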

7. Compress the text content

If a site is reasonably well programmed, it will spend most of its time sending HTML over the network, and browsers can accept a compressed version. With current processors, it pays to compress the data before sending it. Web servers can do it on the fly if the browser supports it. For example, this is the nginx configuration Menéame uses to compress with gzip:

gzip  on;
gzip_comp_level 4;
gzip_proxied any;
gzip_min_length 512;
gzip_disable "MSIE [1-6]\.(?!.*SV1)";
gzip_http_version 1.0;
gzip_vary on;
gzip_types text/css text/xml application/x-javascript application/atom+xml text/plain application/json application/xml+rss text/javascript;
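
To get a feel for what gzip does to HTML, the following Python sketch compresses some made-up, repetitive markup at level 4, the same level as gzip_comp_level above. The markup is hypothetical and much more repetitive than a real page, so the ratio it prints is more optimistic than what a production site would see.

```python
import gzip

# Hypothetical, repetitive markup standing in for a page of news items.
html = ('<div class="news"><h2>Title</h2><p>Some text</p></div>\n' * 500).encode("utf-8")

# Level 4 trades a little compression for less CPU time per request.
compressed = gzip.compress(html, compresslevel=4)

print(len(html), len(compressed))
```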

There are more rules and tricks to minimize times, like putting all the CSS images in a single file and treating them as sprites [*], but with just these seven rules the speed improvement on any site will be noticeable.

[*] I don’t particularly like this option for sites where the design is dynamic, like Menéame. Each time an icon is changed or added, a new image has to be generated, which either has to be coordinate-compatible with the previous one or has to be redefined in the CSS. It’s a lot of “admin” work and quite error prone. But I think it’s a good strategy for plugins or widgets that include icons and are going to be used in lots of places (so stability matters more). I hold the same opinion about minifying the JavaScript code.

You don’t need faster servers! You must interleave

This section is geekier and is the one that motivated me to write the whole article, but for various reasons it comes last and is shorter than I would like.

The trend in web programming is to use more and more complex and sophisticated templates. A typical example is to run all the DB queries first and then generate the HTML results in a FOR loop inside a template. More sophisticated examples include one template inside another or use inheritance (blocks). Something similar happens in simpler cases, like using the PHP output buffer.

Computing folklore says that the best way to use TCP/IP is to reduce the number of packets sent. That is probably true for simple sites that can be generated very fast, but the truth is that current sites generate pages of several tens of KB (the main HTML in Menéame averages 50-60 KB, and some news with lots of comments can easily reach 200 KB).

Some weeks ago we released the code of Haanga, the Menéame template system. Even though it allows inheritance, we don’t use that feature, nor the PHP output buffer. We also don’t use big FOR loops; instead the content is generated progressively for each news item, comment or note. This progressive generation of HTML is used for all parts of the page: header, footer, etc. That way, the response and return times are lower.
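
The idea of progressive generation can be sketched with a Python generator. This is only an illustration of the pattern, not Haanga's API or Menéame's PHP code; `render_page`, `fetch_items` and the markup are hypothetical, and HTML escaping is omitted for brevity.

```python
def render_page(fetch_items):
    """Yield the page in chunks so each one can be sent as soon as it exists.

    fetch_items is a hypothetical callable returning an iterable of
    (title, body) pairs; it stands in for the database queries.
    """
    # Static head first: the browser can start fetching CSS and JavaScript
    # before any query has run.
    yield "<html><head><link rel='stylesheet' href='/css/site.css'></head><body>\n"
    for title, body in fetch_items():
        # One small fragment per item instead of one big template at the end.
        yield "<div class='news'><h2>{}</h2><p>{}</p></div>\n".format(title, body)
    yield "</body></html>\n"


events = []


def fake_db():
    """Stand-in for the real queries; records when it starts working."""
    events.append("query started")
    yield ("First", "Hello")
    yield ("Second", "World")


chunks = render_page(fake_db)
first = next(chunks)           # the header is already available...
events_before = list(events)   # ...and no query has run yet
rest = list(chunks)
print(events_before, len(rest))
```

A server that writes each yielded chunk to the socket as it arrives gets the behaviour described above: the header leaves the server before the first query has even started.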

A web application has several different parts: database queries, logic, HTML generation, network transmission. To simplify, they can be named as:

  • Q: Database queries and logic
  • H: HTML generation
  • N: Network transmission
  • A: Additional files (CSS, JavaScript)

So, this is a typical pattern when the PHP output buffer is used, or when an HTML template is sent at the end of the logic (as a reference, the browser begins to display content to the user once the additional files, A, have been downloaded):

QQQQQQQQQQHHHHHNNAAANNNNNNNNNNNNNNNNNN

In this example:

  • Ten time units (let’s call them “ticks”) for the logic and database queries.
  • Five ticks for HTML template rendering
  • Twenty ticks for compression (if enabled) and net download
  • Three ticks to download additional files

Of course, that’s only an approximation. If more complex templates are used, the HTML rendering could take more time than the logic and queries, or the opposite if the HTML is simple but the queries are complex. In the example, the total time is 38 ticks, and the user can start to see something after 20 ticks.

If the website is very slow, most programmers will decide that the best approach is to upgrade the database server or web server to a faster one, lowering the CPU load and reducing query and HTML generation time. But that is not easy, especially because access to the DB server involves (at least on big websites) network connections. Even if the processor speed is increased, the latency of the network stays the same.

Even so, let’s assume that it is possible to change the servers and network for faster ones and halve the time [*] for querying and rendering the HTML:

QQQQQHHHNNAAANNNNNNNNNNNNNNNNNN

After that investment to achieve such an improvement, plus all the extra work of migrating to the new infrastructure, the total time is reduced to 31 ticks (an improvement of 18%) and the display time to 13 ticks (an improvement of 35%).

[*] Usually web programming is single threaded, so getting more CPUs or cores doesn’t speed up the process. The core clock can also be increased, but doubling the MHz is not going to double the speed (not at all) and the monetary cost can be very high.

But the programmer is forgetting something important that he learned during his studies and that is fundamental in IT, operating systems and multiprogramming in general: interleaving.

So, taking that into account, the strategy is to get to the HTML generation as fast as possible and make it progressive, so that compression and transmission can start while the rest of the queries are made. Applying that strategy to the original schema, with the original servers:

NNQQQQQQQQQQHHHHH
  AAANNNNNNNNNNNNNNNNNN
So the static content (or the content needing very few queries) is sent immediately, and the browser can start to display it. With those numbers, the total time is 23 ticks (an improvement of 40%) and the display starts after 5 ticks (an improvement of 75%). The results are much better without having to change any equipment, just by moving some template evaluation up front.

Of course, this spectacular improvement depends heavily on the particular application. Menéame is an extreme case with regard to its times: logic and DB queries are highly optimized and consume a relatively small amount of time. These are the results of 100 measurements on the production servers at a peak hour (Friday between 12:00 and 13:00):

  1. Query time plus logic (Q): average 0.03348 sec (standard deviation 0.02155)
  2. Queries + logic (Q) + HTML rendering (H): average 0.04663 sec (standard deviation 0.02903)
  3. From #1 and #2 we get that the average H is 0.01315 sec
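
Going back to the tick diagrams, the totals quoted for the three scenarios can be reproduced with a very simplified model. The Python sketch below assumes, as the diagrams do, that two network ticks are enough to deliver the start of the page, that display begins once the additional files have arrived, and that in the interleaved case the server and the network work in parallel after those first two ticks. For the faster servers it takes H as 3 ticks, which is what the quoted totals of 31 and 13 imply. The function names are illustrative.

```python
def sequential(q, h, n, a, n_first=2):
    """Everything is generated first, then sent (the buffered pattern)."""
    total = q + h + n + a
    # Display starts once the first network ticks and the additional
    # files (CSS, JavaScript) have arrived.
    display = q + h + n_first + a
    return total, display


def interleaved(q, h, n, a, n_first=2):
    """The static start of the page is sent before the queries run."""
    # After the first network ticks two tracks run in parallel: the server
    # keeps working (Q + H) while the browser downloads A and the rest of N.
    total = n_first + max(q + h, a + (n - n_first))
    display = n_first + a
    return total, display


print(sequential(10, 5, 20, 3))   # (38, 20)  original servers, buffered
print(sequential(5, 3, 20, 3))    # (31, 13)  faster servers, buffered
print(interleaved(10, 5, 20, 3))  # (23, 5)   original servers, interleaved
```

In this model the interleaved total is bounded by the network track, which is why speeding up the queries alone would not improve it further.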

As it would be a lot of work to redo programs and templates, I ran some benchmarks enabling and disabling buffering and compression. The next table shows the average total time (in seconds, including the time to create a new TCP/IP connection) it takes to load the front page HTML (about 14 KB compressed, 57 KB uncompressed):

Connection     Uncompressed  Uncompressed  Compressed   Compressed
               unbuffered    buffered      unbuffered   buffered
ONO 12 Mbps    0.52          0.61          0.32         0.34
3G Vodafone    1.09          1.12          0.86         0.89

Without buffering it is a little faster, but the difference is very small. Compression is more noticeable: it can reduce the time by up to 38%.

Conclusions

The output buffer shouldn’t be used unless you’re absolutely sure about what you’re doing and/or have measured the impact. In general the site will do better without it and will consume fewer server resources (basically RAM). If an application with templates is complex and the time spent processing and querying is relatively large, generating one big, final template should be avoided. That will save RAM and CPU time.

If serializing the generation of each object can’t be achieved, at least an independent template with the HTML header can be generated and sent as fast as possible, so the browser can download the CSS and JavaScript files in parallel.

Final comment about mobile devices and iPads

A lot of traffic comes from cell phones and iPads. Users should be happy when they browse your site with their gadgets; that’s why they spent a lot of money on them! There are some easy tricks, like not showing secondary content, not loading heavy APIs, and using the @media rule in CSS to adjust sizes and margins and to hide complex sections.

Menéame has had its own optimized mobile site for a long time. Very few users knew this, so an automatic redirection to the mobile version was implemented for mobile browsers (we had over 10,000 daily visits). A lot of people complained about it, so the redirection was restricted to visits arriving directly from external sites, or through a short link like http://m.menea.me/m5za (used, for example, on automated tweets).

To make browsing the main site more lightweight for those users, some differences from the non-mobile version have been implemented:

  • The right side bar and the top advertising are not generated.
  • Font size and some margins are changed, and some content is hidden (display: none) using the @media selector on CSS files:
/* Definitions for mobile, iPad and TVs */
@media print,tv,handheld,all and (max-device-width: 780px) {
  body {
    font-size: medium;
  }

  .sneaker {
    font-size: small;
  }

  #wrap {
    min-width: 320px;
  }

  #sidebar {
    display: none;
  }

  #singlewrap, #newswrap {
    margin: 5px 5px 0px 5px;
  }

  #footwrap {
    margin: 5em 5px 0 5px;
    clear: both;
  }

  .banner-top {
    width: 470px;
  }
}
  • The maps and API from Google Maps are not loaded.
  • The megabanner is replaced by a smaller AdSense ad.
