The Low-Hanging Fruit of Redshift Performance
This post is a case study of how we fixed some Redshift performance issues that we started running into as more and more people began using it. A lot of the information I present is documented in Redshift’s best practices, but some of it isn’t easy to find or wasn’t obvious to us when we were just getting started. I hope that reading this post will save you some time if you are new to Redshift and want to avoid some of the pitfalls that we ran into!
Populating Fulla with SQL Data and Application Logs
This is the second in a series of posts on Fulla, Hudl’s data warehouse. This post discusses our methods to update Fulla daily with data from our production SQL databases and our application logs.
Over the last year, the Data Engineering squad has been building a data warehouse called Fulla. Recently, the squad rethought our entire data warehouse stack. We’ve now released Fulla v2, and Hudlies are querying data like never before, giving us a better understanding of our customers and our product.
Data Science on Firesquads: Classifying Emails with Naïve Bayes
At Hudl, each squad on the product team takes two weeks each year to help out the coach relations team in an ongoing rotation known as Firesquads. This year, for Firesquads, the data science squad built a Naïve Bayes classifier to automate the task of categorizing emails.
Using Deep Learning to Find Basketball Highlights
Hudl stores petabytes of video. In that video there are a lot of awesome plays. Figuring out which plays are the most interesting and sifting through the uninteresting footage is a huge challenge. To solve this problem, we leveraged deep learning, Amazon Mechanical Turk, and crowd noise. The result: basketball highlights!