Measuring Availability: Instead of Nines, Let’s Count Minutes
It’s hard to find detailed explanations of how companies compute and track their availability, particularly for complex SaaS websites. Here’s how we do it for our primary web application, hudl.com.
Jan 24, 2016
Migrating Millions of Users in Broad Daylight
In August, we migrated our core user data (around 5.5 million user records) from SQL Server to MongoDB. We moved the data during the daytime while still taking full production traffic, maintaining nearly 100% availability for reads and writes throughout the migration. Our CPO fittingly described it as akin to “swapping out a couple of the plane’s engines while it’s flying at 10,000 feet.” I’d like to share our approach to the migration and some of the code we used to do it.
Oct 23, 2015
Queuing Up Heavy Tasks to an Autoscaling Worker Farm
An autoscaling farm of AWS EC2 instances sits behind our front-facing web application, working on heavy, long-running tasks like video transcoding, thumbnail generation, and computer vision processing. It’s a battle-tested combination of queues, worker instances, and an orchestration service called Lifeguard that easily hammers through thousands of these CPU-bound jobs per minute.
Nov 12, 2014