Faster and Cheaper: How Hudl Saved 50% and Doubled Performance

We took time to optimize our EC2 instance types. By finding the maximum load a server could handle we were able to run a quarter as many app servers. Our hourly spend dropped by 50%. Despite the huge cost savings, we also saw a 2x improvement in response times! This came about by moving to a newer instance family.

Hudl has been running on Amazon Web Services (AWS) for years and we rarely take the opportunity to optimize our instance types. Recently we began moving to Virtual Private Cloud (VPC), which caused us to re-examine each instance type we use. Choosing the right instance type from the start is challenging. It's tough to choose the optimal instance type until you have customers and established traffic patterns. AWS innovates rapidly, so today there are a lot of choices. Balancing compute and memory, durability and performance of storage, and the right type of networking attributes is important. Choose wrongly, and you've wasted money, incurred downtime, and/or hurt performance. Spend too much time optimizing and you may have traded time better spent on product improvements and revenue in exchange for a relatively small amount of money.

In this blog post I describe how we shaved 50% off our AWS spend for web servers and doubled performance. We also learned how different instance types perform relative to each other.

Goals

  1. We wanted to understand how much traffic one server could handle.
  2. Once we understood max load, we could make a better apples-to-apples comparison of different EC2 instance types and figure out the optimal one for our usage.
  3. Thinking about auto-scaling, we wanted to understand an appropriate metric (CPU, requests per second, something else?) to trigger scaling events (a rough sketch of one option follows below).
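
We hadn't settled on a trigger ahead of time, but to make goal #3 concrete, here's a minimal boto3 sketch of what a CPU-based target-tracking policy could look like. The group name and target value are illustrative placeholders, not our production configuration.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical Auto Scaling group and CPU target -- placeholders only.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="app-servers-example",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        # Keep average CPU low enough to absorb an AZ failure (see takeaways).
        "TargetValue": 23.0,
    },
)
```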

Approach

A common challenge when conducting load tests is coming up with accurate test data. It can be time-consuming to generate the test data, and you still only get an approximation of reality. You don't want to optimize for a traffic pattern that won't actually occur in production. Rather than simulate production traffic, our tooling allowed us to safely use production traffic for these tests. In addition to more accurate results, we saved a lot of time. The entire effort took just one day of work.

To use production traffic for this test, we relied on a characteristic of the Elastic Load Balancer (ELB) service and some of our own internal tooling. All of our web traffic initially flows into our ELB. The ELB divvies traffic up evenly across our nginx instances. Nginx is aware of our various services and will choose the appropriate app instance for that request.

Because we run in AWS and we care about high availability, we run servers in triplicate across multiple availability zones (AZs). Side note: if you aren't familiar with the idea behind availability zones, watch this (~6 min); it's pretty cool stuff. To maximize performance and isolate problems, we like to keep traffic within the same AZ. We think of an AZ as a separate data center, so it makes sense not to hop back and forth between data centers while servicing each request.

We reduced the number of servers available in one AZ, and that AZ became our Test set. Because the ELB would continue to divvy up traffic evenly across all three zones, the other two were completely unaffected and became our Control set. By reducing the number of servers in the Test set we could gradually increase the amount of traffic handled by each server. We monitored performance for signs of degradation. Once we began to observe degraded service, bingo, we knew the maximum load.
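
Our internal traffic-routing tooling isn't shown here, but the mechanics are easy to approximate. As a rough sketch (not our actual scripts), assuming the app servers sit in an Auto Scaling group and carry an identifying tag, something like this boto3 snippet would pull a couple of Test-AZ instances out of rotation so the remaining servers absorb more traffic:

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

TEST_AZ = "us-east-1a"            # hypothetical Test availability zone
ASG_NAME = "app-servers-example"  # hypothetical Auto Scaling group

# Find running app servers in the Test AZ (the tag name is illustrative).
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "availability-zone", "Values": [TEST_AZ]},
        {"Name": "tag:role", "Values": ["app-server"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

# Detach two instances so the remaining Test servers handle a larger share.
autoscaling.detach_instances(
    InstanceIds=instance_ids[:2],
    AutoScalingGroupName=ASG_NAME,
    ShouldDecrementDesiredCapacity=True,
)
```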

The top dashboard shows average response times and the bottom shows 90th percentile (p90) response times. Yellow is the Control set and purple is Test. The red lines show two separate scaling events. Performance didn't seem to deviate too much, though it does show several high peaks after the second downscaling event.

As we continued to shed servers in the Test set, we also kept an eye on CPU utilization. By incrementally ratcheting up traffic, we could observe the CPU characteristics at maximum load.

You can see the impact to CPU after we increased the amount of traffic (the two green lines) vs the CPU of a server in the control AZ. Instance IDs blurred to protect the innocent.
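
We watched CPU on our normal dashboards, but the same per-instance numbers are easy to pull straight from CloudWatch. A minimal sketch (the instance IDs are placeholders):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder IDs: one server in the Test AZ and one in a Control AZ.
for instance_id in ["i-0testplaceholder0", "i-0ctrlplaceholder0"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=datetime.utcnow() - timedelta(hours=2),
        EndTime=datetime.utcnow(),
        Period=300,              # 5-minute datapoints
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(instance_id, point["Timestamp"], round(point["Average"], 1))
```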

We repeated this same test with a few different instance types to find the sweet spot for us. The service under test was the oldest in our infrastructure and was running on m1.large instances. We finally landed on c4.xlarge and found that not only could we cut our hourly spend in half, but performance actually improved by 2x! The performance improvement was an unexpected bonus.

Takeaways

  1. After testing a few different instance types and finding the maximum load a server could handle, we were able to run a quarter as many app servers. Our hourly (non-reserved) spend dropped by 50%.
  2. Despite the huge cost savings, we also saw a 2x improvement in response times! This came about by getting onto the newer instance family. In our case, this was a move from the m1 to the c4 family.
  3. Something we observed (and it would be sweet if Amazon made it clearer) is that compute, or Cores, are not apples-to-apples across instance families. Within a family, the 2x, 4x, 8x instances are apples-to-apples: the m4.4xlarge is pretty much twice as fast as the m4.2xlarge. But the two cores on an m1.large are much slower than the two cores on a c4.large. Some of these instance families are pretty old (the M1 family was released in 2007), so a good default is to always choose the most recent generation.
  4. Amazon has excellent details about instance types online, but they make it nearly impossible to easily compare them. Luckily, there are a number of sites available for just this purpose. I enjoy ec2instances.info.
  5. While testing another service, we observed a single c4.xlarge instance (16 ECU) handle the same load as 21 m3.medium instances (3 ECU each). And it was running at around 12% CPU utilization vs the 25–30% on the m3.mediums! Oh, and response times went down, once again, by half!
  6. We found that performance began to suffer at around 40–50% average CPU utilization. At Hudl, we want to be able to lose an entire AZ (one third of our capacity) at any time without degrading performance. Assuming 35% utilization is our safe maximum, we need to actually aim for 35% * ⅔, or around 23% (see the short calculation after this list). That way, in the event of an entire AZ failure, we can absorb that traffic into the other two and still maintain performance.
  7. Having the tooling and infrastructure in place to quickly route traffic made it easy to conduct this experiment with minimal risk to our users. We invest a lot of time and effort in our foundation. This is one of the many ways that investment pays off.
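
For the curious, the headroom target in takeaway #6 is just a matter of leaving enough slack to absorb a lost AZ; a quick sketch of the arithmetic:

```python
# If performance degrades above ~35% average CPU, and losing one of three AZs
# pushes each surviving server's load up by 3/2, then the steady-state target
# has to be 2/3 of that ceiling.
ceiling = 35.0   # % CPU we consider the safe maximum
az_count = 3
target = ceiling * (az_count - 1) / az_count
print(f"steady-state CPU target: {target:.1f}%")  # -> 23.3%
```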

Interested in working on problems like this? We should talk.