I make apps for other people

Scalability Secrets: custom content that scales

Posted by Chris Jones
On July 15th, 2012 at 11:35


Posted in General

Caveat: this isn’t about failover, security, or cloud computing.

Imagine you have 65 million registered users and you need to provide custom content for each (beyond “Hello, Bob”), say real time subscription content from tens of thousands of sources containing millions of posts.

Don’t try to show everything at once

You probably won’t be able to show everything to the user at once. Depending on your load and back-end systems, you may have to settle for simply letting the registered user know that they have something to see.

Most of your users won’t be logged in

Most page views (at least of a homepage) won’t come from logged-in users: many will be first-time visitors, or users who don’t visit very often. Only a fraction of total visits will hit your back-end servers — measure that ratio early and use it as a baseline for how many servers you need as you scale.
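That ratio turns capacity planning into back-of-the-envelope arithmetic. The traffic numbers and the helper below are purely hypothetical, just to show the shape of the estimate:

```python
# Hypothetical capacity estimate: these traffic figures are invented for
# illustration, not measurements from any real system.

def backend_servers_needed(total_rps, logged_in_ratio, rps_per_server):
    """Estimate back-end servers from the fraction of visits that
    actually reach the personalized services."""
    backend_rps = total_rps * logged_in_ratio
    # Round up: a fractional server still means a whole machine.
    return int(-(-backend_rps // rps_per_server))

# Say 10,000 requests/sec hit the homepage, 15% come from logged-in
# users, and one back-end server sustains 300 personalized requests/sec.
print(backend_servers_needed(10_000, 0.15, 300))  # 5
```

The point is that the logged-in ratio, not raw traffic, is what sizes the personalized back end.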

Pick a good server architecture

Some servers and frameworks, like Seam, store state in memory and require conversations and sticky sessions between the user and the HTTP server. Others allow session-less connections, with applications written to store state in another repository (such as a database or a shared store like JNDI). Pick a framework or architecture that is lightweight, places few requirements on your application, and lets you write decoupled services that are independently deployable and testable.
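The session-less style can be sketched in a few lines. Here a plain dict stands in for the external store, and the handler names are illustrative, not a real framework API:

```python
# Sketch of a stateless request handler: session state lives in a shared
# store, so any service instance can serve any request and no sticky
# sessions are needed.

shared_store = {}  # stands in for a database or distributed cache

def handle_request(session_id, increment):
    # Read state from the shared store, never from instance memory.
    state = shared_store.get(session_id, {"views": 0})
    state["views"] += increment
    shared_store[session_id] = state
    return state["views"]

# Two "instances" (plain calls here) can interleave freely, because
# neither keeps any state of its own between requests.
print(handle_request("bob", 1))  # 1
print(handle_request("bob", 1))  # 2
```

Because nothing lives in process memory between requests, instances can be added, removed, or restarted without draining sessions.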


Use memcached where it fits

Memcached is an excellent way to store chunks of key-value data in memory on a per-service-instance basis. You’ll get your best results when you need to fetch common content for a relatively small number of keys, i.e., content that changes every few hours across a few hundred thousand keys. Memcached loses efficiency when you have a lot of random hits across a very large keyspace (larger than can be completely cached) and when hits are spread between instances such that the behavior is essentially random.
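The access pattern memcached serves is the cache-aside pattern. Here is a minimal in-process sketch of it — a dict with a TTL standing in for memcached, and `fetch_from_backend` standing in for a slow data source:

```python
import time

# Cache-aside sketch: check the cache, fall back to the expensive
# fetch on a miss, and store the result with a TTL.

cache = {}  # key -> (value, expires_at); stands in for memcached

def fetch_from_backend(key):
    return f"content for {key}"  # imagine a slow database call here

def get(key, ttl_seconds=3600):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]               # cache hit: no back-end work
    value = fetch_from_backend(key)   # cache miss: do the slow fetch
    cache[key] = (value, time.time() + ttl_seconds)
    return value

print(get("homepage:headlines"))
print(get("homepage:headlines"))  # second call is served from the cache
```

This is exactly the workload where a small, hot keyspace wins: hit rates stay high because the same few keys keep coming back before their TTL expires.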

Pre-warm your caches

For databases like Oracle, you can “warm” the cache by fetching rows from indexes or tables that will be needed in subsequent calls. For instance, imagine that you have a simple widget on your homepage that tells a logged-in user that he has new content available. The call to the back-end service should be fast, potentially cached in memory, but should have the side-effect of also performing a key query that warms the database cache and pre-fetches content for a memory cache.

The user may never visit the content page but the content will be available quickly if they do.
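A sketch of that warm-on-widget-call idea, with invented names — the cheap “anything new?” check also pulls the content itself into a cache as a side effect:

```python
# The homepage widget call answers "is there new content?" but also
# pre-fetches the content into a cache, so the content page is fast
# if the user clicks through. All names here are illustrative.

content_cache = {}

def fetch_new_posts(user_id):
    # Stand-in for the key query that warms the database cache.
    return [f"post-{user_id}-1", f"post-{user_id}-2"]

def has_new_content(user_id):
    posts = fetch_new_posts(user_id)   # side effect: warm the cache
    content_cache[user_id] = posts
    return len(posts) > 0

# Homepage widget call...
print(has_new_content(42))   # True
# ...later, the content page reads from the warmed cache.
print(content_cache[42])
```

If the user never visits the content page, the only cost was a fetch you could afford; if they do, the page is already hot.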

Use a reporting database

When trying to return content quickly, consider organizing your database in a reporting form. A well-designed third normal form (3NF) relational database is going to require joins, subselects, or other multi-table operations that may slow down your reads. To improve reads and reduce your index requirements, consider instead joining several tables into a materialized view, or creating an extract reporting database used for service reads alongside a master 3NF database for writes.
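The split can be shown with an in-memory SQLite database. The schema here is invented; in production the reporting side might be a materialized view or a separate replica, refreshed on write rather than queried with a join on every read:

```python
import sqlite3

# Normalized master schema plus a denormalized reporting table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    -- Reporting table: one flat row per post, no join at read time.
    CREATE TABLE post_report (post_id INTEGER, title TEXT, author_name TEXT);
""")
db.execute("INSERT INTO authors VALUES (1, 'Ada')")
db.execute("INSERT INTO posts VALUES (10, 1, 'Scaling notes')")

# "Refresh" the reporting table from the 3NF master: the join cost is
# paid once at write/refresh time, not on every service read.
db.execute("""
    INSERT INTO post_report
    SELECT p.id, p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
""")

row = db.execute("SELECT title, author_name FROM post_report").fetchone()
print(row)  # ('Scaling notes', 'Ada')
```

Reads against `post_report` need no joins and only one narrow index, which is the whole point of the reporting form.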

Summarize your content

When designing your database, create fields that will hold summarized versions of your content and only fetch the full content on demand. This lets you format summary content so it fits cleanly in a headline/synopsis view without broken formatting tags, and it gives you a smaller chunk of data to fetch from the database and push to the user.
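One way to build such a summary field — the function below is a naive sketch that strips markup first (so truncation can never leave a broken tag) and then cuts at a word boundary; a real system would run it at write time and store the result:

```python
import re

def summarize(html, max_len=60):
    """Produce a plain-text synopsis safe for a headline view."""
    text = re.sub(r"<[^>]+>", "", html)   # drop tags entirely
    text = " ".join(text.split())         # collapse whitespace
    if len(text) <= max_len:
        return text
    # Truncate at a word boundary and mark the cut.
    return text[:max_len].rsplit(" ", 1)[0] + "…"

full = ("<p>Imagine you have <b>65 million</b> registered users and you "
        "need custom content for each of them.</p>")
print(summarize(full))
```

Storing the output alongside the full content means list pages never touch the large body column at all.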

Don’t show too much on pages sensitive to latency

On pages like the homepage of a website, latency matters tremendously. Only show the minimum content that can be retrieved in a timely manner: don’t pull full content that slows the request down, and don’t block the page render on any content-service request. It’s better to load content asynchronously, after the rest of the page has rendered, than to block the time to glass.

Likewise, cache static content on the browser when possible (such as CSS and image files), and place common image content into sprites to reduce the total page hits and individual HTTP requests made against the page.

Split your page render across services

Consider that what the user sees first is the “above the fold” or first full screen of information. This should be rendered very quickly and your pages should be designed such that:

  • the first screen content is delivered with the initial TCP response,
  • CSS and JavaScript are only loaded after the first screen of content is available to the browser,
  • and the rest of the content is delivered only after what’s needed for the first screen is complete.

Break your page rendering into two parts: a first-screen rendering host that returns the content needed to reduce time to glass, and a “rest of the page” rendering host that can work on slower, more feature-filled content. Place both of these behind a web server that proxies the rendering requests, then assembles the page and delivers content to the browser. In the era of dynamic HTML, there’s no reason a basic framework can’t be passed to the browser and the content filled in as it becomes available.
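The two-phase idea can be sketched with a generator standing in for a streaming HTTP response; the render functions here are hypothetical:

```python
# Split render sketch: flush the above-the-fold shell immediately,
# then follow with the slower below-the-fold content.

def render_first_screen():
    # Fast path: minimal markup for the first full screen.
    return "<html><body><div id='fold'>headlines</div>"

def render_rest_of_page():
    # Slow path: comments, widgets, recommendations, etc.
    return "<div id='rest'>comments, widgets</div></body></html>"

def stream_page():
    yield render_first_screen()   # this chunk goes to the browser first
    yield render_rest_of_page()   # time-to-glass never waits on this

chunks = list(stream_page())
print(chunks[0])
```

The assembling proxy plays the role of `stream_page` here: it forwards the first chunk as soon as the fast host answers, without waiting for the slow host.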

Cache your static pages

As a corollary to the point that most of your users won’t be logged in, you can save significant server load by sniffing for login cookies or header information and returning static, pre-rendered content to non-logged-in users. Additionally, by optimizing that content (see the next point) you can reduce the bandwidth spent on static pages.
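The cookie sniff is a one-line branch. The cookie name and render functions below are invented for the sketch:

```python
# Serve a pre-rendered static page to anonymous visitors; only invoke
# the expensive dynamic path when a login cookie is present.

STATIC_HOMEPAGE = "<html>cached anonymous homepage</html>"

def render_personalized(user_id):
    return f"<html>hello user {user_id}</html>"  # expensive path

def serve_homepage(cookies):
    user_id = cookies.get("session_user")
    if user_id is None:
        return STATIC_HOMEPAGE            # no back-end work at all
    return render_personalized(user_id)

print(serve_homepage({}))                      # anonymous: static copy
print(serve_homepage({"session_user": "42"}))  # logged in: dynamic
```

If 85% of homepage hits take the static branch, the personalized back end only ever sees the remaining 15%.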

To strip or not to strip HTML?

In general, stripping unnecessary whitespace from HTML (such as indents or extra newlines) saves bandwidth but costs CPU time. If you’re pre-rendering static content pages, it’s a win. If you’re rendering dynamic content on every request, it’s probably not.
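A deliberately naive sketch of the idea — run once at pre-render time so the CPU cost is paid when the static page is generated, not per request (a real minifier would need to spare `<pre>` and `<textarea>` content, which this regex does not):

```python
import re

def minify(html):
    """Naive whitespace stripper for pre-rendered static pages."""
    html = re.sub(r">\s+<", "><", html)  # drop whitespace between tags
    return html.strip()

page = """
<ul>
    <li>one</li>
    <li>two</li>
</ul>
"""
print(minify(page))  # <ul><li>one</li><li>two</li></ul>
```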

HTTP Compression and SPDY

You’re already using SSL, right? If so, SPDY should be a no-brainer. Likewise, HTTP Compression should be honored whenever possible.
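A quick illustration of why honoring compression matters: HTML markup is highly repetitive, so gzip typically shrinks it dramatically. The page body here is synthetic:

```python
import gzip

# Repetitive markup, like a list of posts, compresses extremely well.
body = ("<div class='post'><h2>title</h2><p>some text</p></div>" * 200).encode()
compressed = gzip.compress(body)

print(len(body), len(compressed))  # compressed is a small fraction of original
```

The exact ratio depends on the content, but the bandwidth saved on every response usually dwarfs the compression cost, especially when static pages are compressed once and cached.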

Front your content servers

You must front your content servers with some kind of proxy. Optimize this proxy for HTTP compression/SPDY and, if possible, for making multiple calls to a page-rendering farm or to services that retrieve page content. Don’t stick Apache, Tomcat, Tornado, or NodeJS with servicing user requests directly; instead, proxy requests through a combination of load balancers, caches, and hardened HTTP servers (like nginx) to help achieve your scalability goals while reducing your outside-facing risk.
