How We Stay Sane with a Large AWS Infrastructure
We’ve been running hudl.com in AWS since 2009 and have grown to running hundreds, and at times even thousands, of servers. As our business grew, we developed a few standards that help us make sense of our large AWS infrastructure.
Measuring Availability: Instead of Nines, Let’s Count Minutes
It’s hard to find detailed explanations about how companies go about computing and tracking their availability, particularly for complex SaaS websites. Here’s how we do it for our primary web application, hudl.com.
Migrating Millions of Users in Broad Daylight
In August we migrated our core user data (around 5.5MM user records) from SQL Server to MongoDB. We moved the data during the daytime while still taking full production traffic, maintaining nearly 100% availability for reads and writes during the course of the migration. Our CPO fittingly described it as akin to “swapping out a couple of the plane’s engines while it’s flying at 10,000 feet.” I’d like to share our approach to the migration and some of the code we used to do it.
Faster and Cheaper: How Hudl Saved 50% and Doubled Performance
We took time to optimize our EC2 instance types. By finding the maximum load a server could handle, we were able to run a quarter as many app servers, and our hourly spend dropped by 50%. Despite the huge cost savings, we also saw a 2x improvement in response times! This came about by moving to a newer instance family.
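As a rough sanity check on those two numbers, here is a minimal sketch of the arithmetic. It assumes hourly spend is simply server count times per-instance hourly price; the function names are illustrative and not taken from the post. Under that assumption, running a quarter as many servers for half the spend implies each newer instance costs about twice as much per hour as the one it replaced.

```python
def relative_spend(server_ratio: float, price_ratio: float) -> float:
    """New hourly spend as a fraction of the old spend.

    Assumes spend = server count * per-instance hourly price.
    """
    return server_ratio * price_ratio


def implied_price_ratio(server_ratio: float, spend_ratio: float) -> float:
    """Per-instance price ratio (new / old) implied by the other two ratios."""
    if server_ratio <= 0:
        raise ValueError("server_ratio must be positive")
    return spend_ratio / server_ratio


# A quarter as many servers, half the hourly spend.
price_ratio = implied_price_ratio(server_ratio=0.25, spend_ratio=0.5)
print(price_ratio)                        # 2.0
print(relative_spend(0.25, price_ratio))  # 0.5
```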
How Our Product Team Works
One part of Hudl I frequently have to explain to people outside the company is the structure of our product team. Fellow developers at other companies, friends I graduated with, and plenty of people in between want to know how Hudl works — and as it turns out, there’s a lot to talk about. We’re constantly evolving and learning more about how to keep our heads on straight, and as we do, we want to get the lessons learned on the table.
Exploring the Skunkworks Genius
Speed, innovation, and creativity… the key components of creative genius. Skunkworks is specifically designed to unleash the creative wrath of our product team. At Hudl, we use Skunkworks to explore new technologies and tools that make us better at what we do.
Deploying in the Multiverse
At Hudl, we like to move quickly. We are constantly fixing issues, building new features, and improving the experience for our coaches and athletes. We put a lot of thought into how we work and dedicate a lot of time to making sure we are working as efficiently as we can. So, when we began to run into major bottlenecks in our deployment process, we realized we needed a major change. We came up with a plan to break our monolithic application into smaller components, and thus The Multiverse was born.
Deploying Our Monolith Application
As a company, we understand that one of our key competitive edges is moving quickly. We develop and ship new features continuously. Before we started moving toward our microapplication architecture, we were deploying our monolithic application ten times a day. Even though the number of Monolith deployments is trending down as we break it out into multiple, smaller applications, we still rely on our deployment infrastructure to deliver multiple payloads daily.
Queuing Up Heavy Tasks to an Autoscaling Worker Farm
An autoscaling farm of AWS EC2 instances sits behind our front-facing web application, working on heavy, long-running tasks like video transcoding, thumbnail generation, and computer vision processing. It’s a battle-tested combination of queues, worker instances, and an orchestration service called Lifeguard that easily hammers through thousands of these CPU-bound jobs per minute.
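To give a feel for the kind of decision an orchestrator for a queue-backed worker farm has to make, here is a minimal sketch of sizing the farm from queue depth. It is not Lifeguard's actual logic. The function name, the parameters, and the "drain the backlog within a target time" rule are all assumptions made for illustration.

```python
import math


def desired_workers(
    queued_jobs: int,
    jobs_per_worker_per_minute: float,
    target_drain_minutes: float,
    min_workers: int = 1,
    max_workers: int = 100,
) -> int:
    """Hypothetical sizing rule: enough workers to drain the current
    backlog within the target time, clamped to a min/max farm size."""
    if jobs_per_worker_per_minute <= 0 or target_drain_minutes <= 0:
        raise ValueError("throughput and drain target must be positive")
    # Jobs a single worker can finish within the drain window.
    capacity_per_worker = jobs_per_worker_per_minute * target_drain_minutes
    needed = math.ceil(queued_jobs / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# 3,000 queued jobs, 10 jobs/minute per worker, drain within 5 minutes.
print(desired_workers(3000, 10, 5))  # 60
```

The clamp keeps at least one worker warm when the queue is empty and caps spend when a burst arrives. A real system would likely also smooth the signal over time so the farm does not thrash.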