Lately I’ve read a lot of articles proclaiming that scalability is no longer an issue thanks primarily to AWS. As a devops engineer who lives and breathes this stuff, I’d like to point out that there are oodles of other technology advances that are more critical for scalability than simply being able to spin up virtual servers on demand.
The simplest possible example of why more servers != scalability is that of a MySQL query. If you run an unindexed query on a large table, you can add more slaves all day long, but every one of them still has to scan the whole table, so no individual request gets serviced any faster. Add an index, and suddenly you can service hundreds or thousands of similar queries with the same amount of resources as it took to run a single unindexed query.
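To make the index point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the `users` table, the `idx_users_email` index, and the sample data are all made up for illustration. It asks the database how it plans to run the same query before and after the index exists. The plan wording differs between SQLite versions, and MySQL's `EXPLAIN` output looks different again, but the shift from a full scan to an index lookup is the same idea.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    ((f"user{i}@example.com", f"User {i}") for i in range(10_000)),
)

QUERY = "SELECT name FROM users WHERE email = ?"


def query_plan(connection):
    """Return SQLite's description of how it would execute QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("user42@example.com",))
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index, the only option is to read every row.
plan_without_index = query_plan(conn)

# Hypothetical index name; any index on the filtered column would do.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = query_plan(conn)

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

The first plan should report a scan of the table and the second a search using the new index. No amount of extra hardware changes the first plan into the second.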
I’d argue that the prime enablers of web scalability are:
- Cheap RAM. Serving a dataset from memory avoids disk seeks, which means more queries can be served in less time. Servers with 80 GB of RAM (or more) are not only possible, but common.
- Non-relational databases. Everything gets measured these days, but relational dbs struggle with storing large amounts of analytics data. Of course there have always been non-relational dbs, but the proliferation of Cassandra, MongoDB, and assorted key/value stores lets us store massive amounts of data while still getting single-digit millisecond response times.
- Better libraries & documentation. Resources like Cal Henderson’s Building Scalable Web Sites brought modern concepts like memcache and asynchronous queues to the masses. Now documentation is ubiquitous, and every major web language offers multiple excellent libraries for building scalable web applications.
You might argue that AWS offers many of the services I’ve described above. It’s true. But AWS was not the first to offer them, nor is AWS the only (or even cheapest) option today.