A Guide to Autoscaling

Autoscaling is the process of automatically scaling an application's computing resources to meet consumer demand. Here’s how to set it up.

If your application doesn't have enough computing capacity to meet user demand, your business will grind to a halt. That's why so many application development teams adopt autoscaling. But autoscaling can be challenging when you're deploying onto multiple customer cloud environments.

In this article, I'll discuss what autoscaling is, the different types of autoscaling, and the challenges of autoscaling an application across multiple cloud providers.

What is autoscaling?

Any given software application requires computing power to run. Your application code, automation scripts, data storage, in-memory caches, networking, monitoring utilities, etc. all require servers and storage. Every computing resource has an upper limit to how much work it can perform.

Autoscaling is the process of increasing or decreasing available computing capacity without human intervention.

In pre-cloud days, companies created their own data centers or purchased space in someone else's. The amount of computing capacity available at any point was fixed. If you expected a huge spike in demand, you had to secure the physical hardware that the spike would require. When the spike ended, you turned the hardware off and it lay dormant until you next needed it.

The cloud opened up a new realm of possibilities. Cloud providers like AWS maintain excess computing and storage capacity that they lease to companies on demand.

Autoscaling takes advantage of this flexibility. With autoscaling, developers can define a set of signals they use to trigger a scaling event. This can be as simple as a time of day or as complex as a suite of metrics measuring current system activity. Using these signals, teams increase or decrease capacity as needed.

Horizontal vs. vertical scaling

How do you autoscale? There are two general approaches:

With vertical scaling, you increase the capacity of the servers on which your code is running. On AWS, for example, this might mean upgrading a virtual machine from an m6g.medium (1 vCPU, 4 GiB memory) to an m6g.2xlarge (8 vCPU, 32 GiB memory).

With horizontal scaling, you run your code on more than one machine. Servers run behind a common endpoint - such as an Application Load Balancer - that routes traffic based on a routing policy.
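
To illustrate the "common endpoint" idea, here's a minimal sketch using Python and boto3; the target group ARN and instance IDs are hypothetical placeholders. In a full autoscaling setup you would attach the target group to an Auto Scaling group so new instances register themselves automatically, but the routing idea is the same.

```python
# A minimal sketch (Python + boto3) of putting two servers behind a shared
# Application Load Balancer target group. The ARN and instance IDs below
# are hypothetical placeholders.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

elbv2.register_targets(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123",
    Targets=[
        {"Id": "i-0aaaaaaaaaaaaaaaa", "Port": 80},  # first web server
        {"Id": "i-0bbbbbbbbbbbbbbbb", "Port": 80},  # second web server
    ],
)
```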

Both approaches have their pros and their limitations. I'll go into those details a little later on in this article.

Why would you want to autoscale?

Autoscaling isn't the only way to scale on the cloud. You can also scale manually. A site reliability engineer could use your cloud provider's console or scripts to add or remove capacity as demand dictates. This approach works fine when your application's usage patterns are well-known.

So, why autoscale?

The primary reason is to handle the unexpected.

Let's say your application has a known average user rate - for example, 10,000 users/hour. One day, without warning, an influencer hypes your app. Suddenly, your usage spikes from 10,000 to 100,000 users an hour.

Without autoscaling, you likely wouldn't notice the spike until it was too late. By the time you manually scaled your app, you'd have lost tens of thousands of potential customers.

With autoscaling, you could observe metrics such as CPU utilization, memory utilization, or average network connections across servers. When they exceeded a certain threshold, you could increase the number of servers available to handle incoming requests.
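
As a concrete sketch, the snippet below (Python and boto3, with a hypothetical Auto Scaling group named "web-asg" and an illustrative 60% target) creates a target-tracking policy on average CPU utilization; AWS then adds or removes instances automatically to stay near the target.

```python
# A sketch (Python + boto3) of a target-tracking policy that keeps average CPU
# utilization near 60% for a hypothetical Auto Scaling group named "web-asg".
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",      # hypothetical group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 60.0,             # scale out above, scale in below ~60% CPU
    },
)
```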

However, your application can also benefit from autoscaling for known events. Let's say you know you'll have a traffic spike around 8am every day. Rather than rely on a human operator to scale out, you could schedule a script - a cron job, for example - to scale up before the spike and scale back down afterward.
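
Most cloud providers also offer scheduled scaling natively, so you don't have to maintain the cron script yourself. Here's a minimal sketch using Python and boto3 that creates scheduled actions for a hypothetical Auto Scaling group named "web-asg"; the times and capacities are illustrative.

```python
# A sketch (Python + boto3) of scheduled scaling actions for a hypothetical
# Auto Scaling group "web-asg": scale out before the known 8am spike, then
# scale back in later in the morning. Recurrence uses cron syntax (UTC).
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Scale out to 10 instances at 7:30 every day, ahead of the 8am spike.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="morning-scale-out",
    Recurrence="30 7 * * *",
    MinSize=10,
    DesiredCapacity=10,
)

# Scale back in to 2 instances at 11:00 once the spike has passed.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="late-morning-scale-in",
    Recurrence="0 11 * * *",
    MinSize=2,
    DesiredCapacity=2,
)
```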

Autoscaling challenges

There are a lot of reasons to add autoscaling to your app. For many applications, it's indispensable to maintaining application uptime and availability.

That doesn't mean autoscaling is without its challenges. Those challenges compound if you deploy your app to multiple customer clouds - especially if you're managing deployments across multiple cloud providers.

Defining the right metrics

One challenge is figuring out exactly when to scale. For some applications, that's straightforward. For others, like large enterprise apps, it can take some work to define exactly when a scaling event should occur.

Scaling everything

Scaling isn't just about scaling front-end servers. Databases, in-memory caches, middleware servers, back-end workers, and other services may all need to scale in concert to meet demand. Establishing the right parameters across all of these application tiers can be daunting.

Scaling in time

One issue particular to horizontal scaling is the time it can take to spin up new computing capacity. Depending on how much configuration a new server or service requires, it can take anywhere from 3 to 20 minutes for a new instance to register as healthy.

Coding for horizontal scaling

If you're porting a legacy application to the cloud, it may not be able to leverage horizontal scaling out of the box. Apps that depend on storing user data in local storage or memory, for example, won't function correctly when a user can be directed to one of several servers with every single request.

Scaling across cloud providers

Finally, one of the most significant challenges comes with supporting multiple customer deployments across multiple different clouds.

Most cloud vendors provide direct support for autoscaling in their platforms. But that becomes harder to utilize when you have to support customer deployments in AWS, Azure, GCP, and other cloud providers simultaneously.

Some teams work around this by writing pluggable providers that can configure autoscaling correctly for each platform. Others use cloud-agnostic solutions - like Codavel, which supports its multi-cloud Kubernetes deployments using Rancher.

Strategies for autoscaling

Choosing whether to autoscale is one thing. Deciding how to autoscale is quite another. Here are some strategies to consider as you craft your autoscaling story.

Scheduled vs. dynamic scaling

Scheduled scaling is the most straightforward approach to scaling. If your traffic patterns are well-known, you can set scaling events to occur a half-hour or hour before you expect traffic to increase.

What if you don't know your traffic patterns yet? Some cloud providers, such as AWS, also offer so-called predictive scaling. This uses machine learning to gauge traffic patterns from your application metrics. It then schedules scaling activities in accordance with the results.
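
As a rough sketch of what this looks like on AWS, the same put_scaling_policy call used earlier can create a predictive policy; the group name and target value below are hypothetical.

```python
# A sketch (Python + boto3) of AWS predictive scaling on the same hypothetical
# "web-asg" group. The ML model forecasts load from historical CPU data and
# schedules capacity ahead of predicted spikes.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="predictive-cpu",
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [
            {
                "TargetValue": 50.0,  # aim for ~50% average CPU
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization",
                },
            }
        ],
        # Start with "ForecastOnly" to review forecasts before letting it act.
        "Mode": "ForecastAndScale",
    },
)
```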

Dynamic scaling is trickier. With dynamic scaling, you scale up or down based on observed system behavior. Dynamic scaling is best used for public-facing apps whose usage patterns may fluctuate from day to day.

Determining which metrics to monitor for dynamic scaling requires consideration and testing. For Web applications, simple metrics such as CPU utilization and the number of active connections per server might suffice. For applications with backend workers or apps that process requests asynchronously from a queue, it may be better to scale on a metric such as queue depth.
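
To sketch the queue-depth approach, the snippet below (Python and boto3, with hypothetical queue, namespace, and worker-count values) publishes a "backlog per worker" number as a custom CloudWatch metric; a target-tracking policy on that metric would then follow the same pattern as the CPU example earlier.

```python
# A sketch (Python + boto3) of publishing queue depth as a custom CloudWatch
# metric that a scaling policy can track. The queue URL, namespace, and
# worker count below are hypothetical.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"
attrs = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=["ApproximateNumberOfMessages"],
)
backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
worker_count = 4  # in practice, read this from the Auto Scaling group

# Publish "messages waiting per worker"; a target-tracking policy on this
# metric would add or remove workers to keep the backlog manageable.
cloudwatch.put_metric_data(
    Namespace="MyApp/Workers",
    MetricData=[
        {
            "MetricName": "BacklogPerWorker",
            "Value": backlog / worker_count,
            "Unit": "Count",
        }
    ],
)
```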

The time required to start resources is critical in dynamic scaling. If you use dynamic autoscaling, ensure your server configuration procedures are as streamlined as possible.

For virtual machines, using pre-baked virtual machine images, such as custom Amazon Machine Images (AMIs) on AWS, can help speed up start times. You may also consider moving your workloads to containers, which have significantly faster startup times than VMs.
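
For example, here's a minimal sketch of a launch template that boots from a pre-baked image, so new instances come up with the application already installed; the AMI ID and names are placeholders.

```python
# A sketch (Python + boto3) of a launch template that boots from a pre-baked
# AMI, cutting the configuration work a new instance has to do at startup.
# The AMI ID and template name are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_launch_template(
    LaunchTemplateName="web-prebaked",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",  # custom AMI with the app pre-installed
        "InstanceType": "m6g.large",
    },
)
```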

Horizontal vs. vertical

Vertical scaling is the simpler of the two approaches. As described above, you increase the size of the virtual server or the resources (virtual CPUs, memory) dedicated to a container.

Vertical scaling works best for legacy "monolithic" apps or small applications running on a single server or container. Compared to horizontal scaling, it's relatively easy to implement.
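
To make that concrete, here's a rough sketch of resizing a single EC2 instance in place with Python and boto3; the instance ID is a placeholder, and note that the instance is stopped while its type changes.

```python
# A sketch (Python + boto3) of vertical scaling an EC2 instance in place:
# stop it, change its instance type, and start it again. The instance ID is
# a hypothetical placeholder; the app is briefly offline while stopped.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0aaaaaaaaaaaaaaaa"

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Move from m6g.medium (1 vCPU, 4 GiB) up to m6g.2xlarge (8 vCPU, 32 GiB).
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m6g.2xlarge"},
)

ec2.start_instances(InstanceIds=[instance_id])
```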

That said, vertical scaling has its drawbacks. There is a hard limit: you'll eventually find yourself running the largest instance or beefiest container your cloud platform supports.

Additionally, running your app on a single device means you have a single point of failure. If that resource goes down, your application goes offline.

Horizontal scaling involves adding additional servers or containers that can process requests. It's recommended for medium- to large-scale applications that have either outgrown vertical scaling or that can't afford downtime.

Done properly, horizontal scaling can support a practically unlimited number of users. It's how social media services like Facebook and Twitter handle millions of requests daily. The downside is that it takes considerable effort and engineering investment to achieve.

Successful horizontal scaling depends on a couple of factors. First, your app should run correctly in a multi-server environment. This means implementing a stateless architecture and potentially even moving towards microservices.

Second, you have to centralize logging and monitoring. All of the nodes running your app (and all supporting services) should emit metrics and logs to a single dashboard where you can monitor service health.

Note that vertical scaling and horizontal scaling are by no means mutually exclusive. For example, you may decide to use horizontal scaling for your front-end application and vertical scaling for your database. Or you can use vertical scaling first and only switch to horizontal scaling when you've exceeded vertical scaling's limitations.

Kubernetes vs. other deployment methods

How you autoscale is determined largely by how you package and deploy your application.

One of the most popular packaging and runtime environments for applications is a Docker container. Docker containers package a complete runtime environment - all of the dependencies (files, configuration, libraries, executables) your app requires to run - without the overhead of a full virtual machine.

If you're deploying onto multiple customer clouds, Docker is a great deployment mechanism. Using Docker insulates you from supporting custom packaging and deployment mechanisms on each cloud provider.

Sadly, while every major cloud provider supports Docker containers, each offers its own incompatible container hosting services. In some cases, they offer several! On AWS alone, you can run Docker containers using Amazon Elastic Container Service (ECS), Elastic Beanstalk, and even AWS Lambda.

One solution to this is to use Kubernetes. Originally created at Google, Kubernetes is a container orchestration engine that supports deploying, managing, and - yes! - scaling containers.

You can run Kubernetes using a cloud provider's Kubernetes offering, such as AWS's Elastic Kubernetes Service (EKS). Alternatively, you can deploy Kubernetes yourself on the cloud providers of your choosing. Kubernetes' built-in cloud controller manager layer integrates cluster operations with the APIs of all major cloud providers. If a cloud provider you're targeting isn't directly supported, you can author your own cloud controller.
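
As a quick sketch of Kubernetes-native autoscaling, the snippet below uses the official Kubernetes Python client to create a Horizontal Pod Autoscaler for a hypothetical Deployment named "web"; the replica bounds and CPU target are illustrative.

```python
# A sketch using the official Kubernetes Python client to create a Horizontal
# Pod Autoscaler for a hypothetical Deployment named "web". Kubernetes keeps
# between 2 and 10 replicas, targeting ~60% average CPU utilization.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="web",
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=60,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa,
)
```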

How TinyStacks can help

Supporting multiple clouds is tough. TinyStacks makes it easy. Use TinyStacks to deploy your application into your customers' clouds across multiple cloud providers and manage everything through a single pane of glass. Request a demo today to see how we can help you!