A Guide to Heroku Autoscaling
Autoscaling in Heroku enables developers to scale app performance more efficiently and consistently. Yet autoscaling in Heroku has its limitations.
Many users of cloud services like AWS complain that the abundance and complexity of features create a steep learning curve for cloud application deployment. Heroku is a Platform as a Service (PaaS) provider that aims to address that problem.
In contrast to regular cloud providers, Heroku offers a more limited set of easier-to-use features. The goal is to enable teams to deploy their applications with a smaller learning curve.
Many teams use Heroku when they're just starting out in the cloud. But what happens when your app grows in usage and complexity? Does Heroku scale with your business?
In this article, I'll look at the autoscaling features of Heroku that enable your application to grow in response to demand. I'll talk about what works with Heroku autoscaling - as well as some of its inherent limitations.
What is autoscaling?
Serving Web application traffic requires compute capacity, usually in the form of Web servers. A single server can only handle so many simultaneous users. To serve more users - i.e., to scale - you typically need to run fleets of dozens, hundreds, or even thousands of servers.
In a traditional data center setup, how much you can scale is limited by the number of physical servers you own. You need to have as much physical compute on hand as you predict you'll need for your busiest periods of traffic.
In this model, scaling is manual. Companies need to buy additional server rack capacity and install and configure it. If that capacity isn't ready, then users who attempt to access a maxed-out application will receive timeouts and errors.
But the cloud is different. Cloud providers such as AWS and Google maintain large data centers from which customers can rent compute capacity on demand. Even better, application developers can request additional capacity programmatically. Gone are the days of buying a bunch of computers that sit dormant in a data center 10 months out of the year. With the cloud, companies can request exactly what they need, when they need it.
Autoscaling is the process of scaling an application's compute resources to meet consumer demand. With autoscaling, application developers can define a set of conditions - or scaling events - that will trigger automatic calls to their cloud provider to add more compute. Scaling events can include such criteria as average CPU or memory capacity across a server fleet.
There are two basic ways developers can autoscale an application:
- With vertical scaling, developers move their applications onto servers with more processors, more RAM, or greater I/O throughput. With vertical scaling, the total size of the server fleet remains the same, but the compute capacity of individual nodes in the fleet increases.
- By contrast, with horizontal scaling, developers increase the size of the fleet by adding more servers.
Vertical scaling is typically easier to program. However, most projects find they ultimately need the additional scale and redundancy that horizontal scaling provides. But no matter which scaling strategy you employ, autoscaling is a critical feature that enables teams to avoid downtime that could translate into a loss of business.
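To make the idea concrete, here is a minimal, provider-agnostic sketch of the decision loop an autoscaler might run. The metric query and fleet-resize calls are hypothetical stand-ins (not any real cloud API), and the thresholds are arbitrary examples.

```python
# A minimal sketch of threshold-based horizontal autoscaling logic.
# average_cpu() and set_fleet_size() are hypothetical stand-ins for a
# real metrics source and a real provider scaling call.
import random
import time

MIN_SERVERS = 2
MAX_SERVERS = 20
CPU_SCALE_UP = 0.75    # add capacity above 75% average CPU
CPU_SCALE_DOWN = 0.30  # remove capacity below 30% average CPU

def average_cpu(fleet_size: int) -> float:
    """Stand-in for a real metrics query (fleet-wide CPU average)."""
    return random.uniform(0.1, 0.9)

def set_fleet_size(new_size: int) -> None:
    """Stand-in for a provider call that adds or removes servers."""
    print(f"scaling fleet to {new_size} servers")

def autoscale(current_size: int) -> int:
    cpu = average_cpu(current_size)
    if cpu > CPU_SCALE_UP and current_size < MAX_SERVERS:
        current_size += 1          # scale out: add a server
    elif cpu < CPU_SCALE_DOWN and current_size > MIN_SERVERS:
        current_size -= 1          # scale in: remove a server
    set_fleet_size(current_size)
    return current_size

if __name__ == "__main__":
    size = MIN_SERVERS
    for _ in range(5):             # one evaluation per monitoring interval
        size = autoscale(size)
        time.sleep(1)
```

Real autoscalers add refinements such as cooldown periods and step sizes, but the core loop is this simple: measure, compare against a threshold, adjust the fleet.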
Heroku autoscaling logic
As I discussed earlier, Heroku operates on a slightly more abstracted model than a cloud provider like AWS. On Heroku, applications are run on a construct called a dyno.
Dynos are an abstraction above more primitive cloud concepts such as virtual machines, private networks, and cloud storage. A dyno provides all of the resources required to run an application. On the back end, a dyno is essentially a Linux container. (If you're unfamiliar with containers, check out our overview here.) Dynos run in isolated environments on shared computing resources.
Anatomy of a dyno
Dynos can exist in one of three configurations: web (web application), worker (back-end processing logic), and one-off (administrative task processing). You can build complex app architectures on Heroku by combining multiple dynos with different configurations into a dyno formation.
Each dyno in a formation has a dyno type that dictates how much memory, CPU power, and other resources it receives. Heroku supports a free tier that provides 512MB of memory and a slice of CPU processing time. At the other end of the performance spectrum are performance dynos, which receive a 100% CPU share. Heroku also supports private spaces - dynos that run on dedicated virtual machines.
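For illustration, here is a short Python sketch that inspects an app's dyno formation through the Heroku Platform API. The app name and token handling are placeholders I've assumed for the example; consult Heroku's Platform API reference for the authoritative endpoint details.

```python
# A hedged sketch that lists an app's dyno formation via the Heroku
# Platform API. APP_NAME is a hypothetical app; the token is assumed
# to be available in the environment (e.g., from `heroku auth:token`).
import os
import requests

APP_NAME = "my-heroku-app"               # placeholder app name
TOKEN = os.environ["HEROKU_API_KEY"]     # placeholder credential source

resp = requests.get(
    f"https://api.heroku.com/apps/{APP_NAME}/formation",
    headers={
        "Accept": "application/vnd.heroku+json; version=3",
        "Authorization": f"Bearer {TOKEN}",
    },
    timeout=10,
)
resp.raise_for_status()

# Each entry describes one process type in the formation: its name
# (web, worker, ...), how many dynos are running, and the dyno size.
for process in resp.json():
    print(process["type"], process["quantity"], process["size"])
```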
Manual scaling and autoscaling on Heroku
Scaling an application on Heroku means scaling your dyno formation. In keeping with the vertical/horizontal distinction above, there are two ways to scale your dynos, as the sketch after this list illustrates:
- Increase the size of the dyno (vertical scaling).
- Increase the number of dynos (horizontal scaling). Each dyno in a dyno formation can be scaled individually to your desired settings.
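As a rough illustration of both knobs, here is a hedged Python sketch that scales the web process type through the Heroku Platform API - roughly what `heroku ps:scale web=4:performance-m` does on the CLI. The app name, token handling, and chosen quantity/size are placeholders, not recommendations.

```python
# A hedged sketch of manual scaling through the Heroku Platform API.
# Both horizontal (quantity) and vertical (size) scaling are shown.
import os
import requests

APP_NAME = "my-heroku-app"             # placeholder app name
TOKEN = os.environ["HEROKU_API_KEY"]   # placeholder credential source

resp = requests.patch(
    f"https://api.heroku.com/apps/{APP_NAME}/formation/web",
    headers={
        "Accept": "application/vnd.heroku+json; version=3",
        "Authorization": f"Bearer {TOKEN}",
    },
    json={
        "quantity": 4,            # horizontal scaling: run 4 web dynos
        "size": "performance-m",  # vertical scaling: use a larger dyno type
    },
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```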
App developers can perform manual scaling on Heroku regardless of dyno type. Autoscaling, however, is only available for certain dyno types: you must be running Performance-tier dynos or private space dynos to use the feature. Once autoscaling is enabled, you specify a range representing the minimum and maximum number of dynos you want, and Heroku will show you the potential cost at both ends of that range.
With Heroku autoscaling, you also specify a desired p95 response time. This defines the response time, in milliseconds, within which you expect 95% of user requests to be served; Heroku scales the web dyno count to keep measured response times under that threshold.
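To make the definition concrete, here is a tiny, self-contained example of computing a p95 value from a set of made-up response times - 95% of requests complete at or below the resulting number.

```python
# Computing a p95 response time from sample data (values are made up).
response_times_ms = [120, 95, 210, 180, 160, 140, 900, 130, 110, 150,
                     170, 125, 135, 145, 155, 165, 175, 185, 195, 205]

sorted_times = sorted(response_times_ms)
index = int(0.95 * (len(sorted_times) - 1))   # nearest-rank style index
p95 = sorted_times[index]
print(f"p95 response time: {p95} ms")         # the single 900ms outlier barely moves it
```

Note how the one slow 900ms request has little effect on the p95, which is exactly why percentiles are a better scaling signal than averages.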
Why do devs autoscale in Heroku?
Heroku aims to simplify cloud deployments. But no matter how you deploy, teams on Heroku face the same scaling challenges as teams on AWS, Azure, or GCP. By using autoscaling, devs can avoid downtime caused by sudden traffic spikes.
Downtime can kill business revenue. One firm estimates that the cost of downtime for the top five e-commerce sites in the United States is around $3.5 million per hour.
Your product may not pull that type of revenue. But that doesn't mean downtime isn't costly. Every minute your Web application is down is a minute of lost sales.
Yes, developers on Heroku (and other platforms) can scale manually. But manual action is often too little, too late. No one can predict when a viral tweet or TikTok will send hordes of new visitors to their virtual front door.
Even if your application isn't completely unresponsive, a slowdown can still cost your company money. Slow-loading Web sites drive users away: by some estimates, Web sites lose 1% of their visitors for every 100ms delay in responsiveness.
By enabling autoscaling, developers using Heroku gain a certain peace of mind. They know that their application will automatically scale out as soon as Heroku detects higher than usual traffic. Such responsiveness can save their companies tens of thousands - or even millions - in revenue.
Use cases for autoscaling in Heroku
So when would you autoscale in Heroku?
You'll likely want to autoscale in Heroku if you're developing a critical line-of-business application that serves end users. This means that:
- Your app must deliver a response to users quickly; and
- Your app must always be available to new and existing users.
Heroku users who need such high levels of responsiveness are likely already on Heroku's Performance plan or using private spaces - a necessary prerequisite for using autoscaling on Heroku.
The limitations of autoscaling on Heroku
Heroku autoscaling is a saving grace for Heroku developers who need to react fast to traffic spikes. However, it does have its limitations.
First off, dynos themselves have built-in vertical scaling limits. For example, Heroku's largest dyno type (Performance-L) caps out at 14GB of RAM. Compare that to Amazon Web Services, where even the low-cost T2 EC2 instance family can scale up to 32GB. (AWS's C5 instances can scale up to 192GB!)
Second, there are limits on horizontal scalability as well. No process type can run more than 100 dynos, and performance dynos are capped at 10 unless you contact support.
These limits may be fine for small- to medium-sized applications. But if your application grows to serve hundreds of thousands or millions of users a day, they'll eventually bite you.
There's also the cost factor associated with Heroku autoscaling. As a PaaS, Heroku is intrinsically more expensive than building on a more foundational cloud service like AWS. Many customers report they can get 4x the performance of Heroku's dynos on AWS for about half the cost. As you scale on Heroku, this price differential only becomes more pronounced.
How TinyStacks can help
Many companies start with Heroku because it's simple. Heroku enables them to launch applications quickly. Contrast this with using AWS, which requires learning a dozen or more services in order to launch an app in production.
At TinyStacks, we thought there had to be a better way. So we built a solution that combines the ease of use of Heroku with the cost and performance benefits of AWS.
Using TinyStacks, you can launch your application in a matter of minutes - no previous AWS experience required. We take care of launching all of the back-end infrastructure you need. What's more, we stand up this infrastructure on an AWS account that you own! This means that the costs are fully transparent to you. It also gives you the power to scale from thousands to millions of users within a matter of minutes.
Ask us for a demo!
If you're on Heroku and are looking for room to grow, or have considered it for its ease of use, ask us how TinyStacks can help. Contact us for a demo today!