Logging All the Things
Logging in your application is super important. Early on it’ll be fine to peruse your logs manually. As traffic increases, you’ll soon need to aggregate all that data into one place so it can be easily searched. At Hudl, we chose Splunk. There are a lot of competing products, but it works well for us. Either way, my advice is clear: Log. All the things. You won’t regret it.
Here are the things we log. You should too.
Log: Errors
Error logging is an obvious one. When an error happens in Hudl, we log all exception data, the HTTP request headers and path, the user ID (for logged-in requests), and the source IP. Pro tip: don’t forget to scrub sensitive values like passwords and credit card numbers before logging.
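To make the scrubbing tip concrete, here is a minimal sketch in Python. It is not Hudl's actual code: the key names in SENSITIVE_KEYS, the card-number pattern, and the scrub function are all assumptions for illustration, and a real system would need a list tuned to its own fields.

```python
import re

# Hypothetical set of field names whose values should never reach the logs.
SENSITIVE_KEYS = {"password", "passwd", "card_number", "cvv", "authorization", "cookie"}

# Rough pattern for 13 to 16 digit card numbers, with optional spaces or dashes.
# This is an assumption for illustration, not a full card-number validator.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

REDACTED = "[REDACTED]"


def scrub(fields):
    """Return a copy of `fields` with sensitive values masked."""
    clean = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            # Mask the whole value when the field name itself is sensitive.
            clean[key] = REDACTED
        elif isinstance(value, str):
            # Mask anything inside free text that looks like a card number.
            clean[key] = CARD_PATTERN.sub(REDACTED, value)
        else:
            clean[key] = value
    return clean
```

Running every request header and form value through something like scrub before it is attached to the error record keeps the masking in one place, so nobody has to remember it at each call site.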
Log: Audit or Analytics Data
Every user-initiated action in Hudl gets logged. I’m not talking about a simple request for style.css; I’m talking about actions. Clicks, page loads, etc. These follow a consistent format, but each message can include some contextual information to help us see exactly what actions were taken, by whom, and at what time:
2013-07-11 21:13:48,367 p-web-a-4yrce4 [INFO ] [Audit] [request_id=c523a1e3] App=Hudl,Func=View,Op=Clip,Ip=6.10.16.23,AuthUser=123456,Team=555,Attributes=[Clip=4944337215,TimeMs=10]
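A small helper can keep that format consistent across the codebase. The sketch below is an assumption about how one might build the message portion of the line above. The field names come from the sample, but the format_audit function is hypothetical, and the timestamp, host, level, and request_id prefix are assumed to be added by the logging framework.

```python
def format_audit(app, func, op, ip, auth_user, team, **attributes):
    """Build the key=value audit message shown in the sample line.

    Extra keyword arguments become the contextual Attributes=[...] list,
    in the order they were passed.
    """
    attrs = ",".join(f"{key}={value}" for key, value in attributes.items())
    return (
        f"App={app},Func={func},Op={op},Ip={ip},"
        f"AuthUser={auth_user},Team={team},Attributes=[{attrs}]"
    )


# Example: the message part of the sample audit line.
message = format_audit(
    "Hudl", "View", "Clip", "6.10.16.23", 123456, 555,
    Clip=4944337215, TimeMs=10,
)
```

Keeping the fixed fields in a fixed order and pushing everything else into Attributes is what makes the lines easy to search once they land in an aggregator.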
Log: Performance Samples
We log the response time for a random sampling of our requests. The sampling lets us keep the log volume manageable and still get accurate performance data.
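Here is one way the sampling could look, as a rough Python sketch. The 5% rate, the "perf" logger name, and the timed wrapper are all assumptions for illustration. The post does not say what rate Hudl uses or how the timing is wired in.

```python
import logging
import random
import time

logger = logging.getLogger("perf")

SAMPLE_RATE = 0.05  # assumed value: time roughly 1 request in 20


def should_sample(rate=SAMPLE_RATE, rng=random.random):
    """Return True for roughly `rate` of all calls."""
    return rng() < rate


def timed(handler, rate=SAMPLE_RATE, rng=random.random):
    """Wrap a request handler so a random sample of calls logs its duration."""
    def wrapper(*args, **kwargs):
        if not should_sample(rate, rng):
            # Not sampled: run the handler with no timing overhead.
            return handler(*args, **kwargs)
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        finally:
            # Log the elapsed time even if the handler raised.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("Func=%s,TimeMs=%.0f", handler.__name__, elapsed_ms)
    return wrapper
```

Because each request is picked independently at random, the sampled response times should follow the same distribution as the full traffic, which is why a small sample can still give a trustworthy picture.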
Log: Detailed Performance Metrics on Key Features
For Hudl, keeping video fast and accessible is our #1 priority. For that reason, we log all video-related requests. That gives us detailed insight into Hudl’s most critical function. My advice is simple here: sample everything, but log the crap out of the key functions of your system.
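One simple way to express "sample everything, but log every request to the key features" is a per-route rate. The Python sketch below assumes a "/video" URL prefix and a 5% default rate. Both are hypothetical values chosen for illustration, not Hudl's real routes or settings.

```python
import random

DEFAULT_RATE = 0.05                 # assumed sampling rate for ordinary requests
KEY_FEATURE_PREFIXES = ("/video",)  # hypothetical prefixes for critical features


def sample_rate_for(path):
    """Key features are always logged; everything else is sampled."""
    if path.startswith(KEY_FEATURE_PREFIXES):
        return 1.0
    return DEFAULT_RATE


def should_log(path, rng=random.random):
    """Decide whether this request's performance data gets logged."""
    return rng() < sample_rate_for(path)
```

Adding another critical feature is then a one-line change to the prefix tuple.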
Some Helpful Tips
Over time, we’ve learned a few lessons that have helped us immensely.
- Log all user actions. Don’t argue, just do it. When a paying customer calls in, sure that your system lost their data, it is pretty cool to be able to solve the case in under a minute by proving that it was actually Bob Smith (at exactly 9:23 a.m. last Tuesday) who deleted that document. Your customers will maintain trust in your product, and you’ll avoid spending hours tracking through database backups because you weren’t sure if it was your fault.
- Create request identifiers. Each request gets a (somewhat) unique “requestId.” That ID is automatically included in any and all log messages written during the service of the request. This allows us to correlate them and get a full understanding of how the request flowed through Hudl. Early on this is less important, but as your application grows in complexity these requestIds become indispensable.
- Create a fingerprint, or signature, for each unique error. It’s important that the signature disregard any contextual information so that we can correlate the same error even if two separate users run into it. That signature is useful from an operational standpoint to find patterns and prioritize bug fixes.
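The last two tips can be sketched in a few lines of Python. This is an illustration under assumptions, not Hudl's implementation: the names start_request, RequestIdFilter, and error_signature are hypothetical, the 8-character ID length simply mirrors the request_id in the sample audit line, and hashing the exception type plus the file and function names from the stack trace is just one possible way to build a signature.

```python
import contextvars
import hashlib
import logging
import traceback
import uuid

# Holds the ID of the request currently being served ("-" outside a request).
_request_id = contextvars.ContextVar("request_id", default="-")


def start_request():
    """Assign a short random ID to the current request and return it."""
    rid = uuid.uuid4().hex[:8]
    _request_id.set(rid)
    return rid


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record that passes through."""

    def filter(self, record):
        record.request_id = _request_id.get()
        return True


def error_signature(exc):
    """Hash the exception type and code locations, ignoring the message.

    The message is left out on purpose because it often contains user IDs
    or other context that would split one bug into many signatures.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__]
    parts += [f"{frame.filename}:{frame.name}" for frame in frames]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:12]
```

With the filter added to a handler, a format string containing %(request_id)s puts the ID on every line written during the request. Line numbers are deliberately left out of the signature so it survives unrelated edits to the same file. The trade-off is that two different failures raised from the same function with the same exception type will share a signature.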
It’s All About the Tooling
In the early days of Hudl, we needed to stay laser-focused on building an awesome product. Rolling our own log aggregation and search tool would have been too much of a distraction.
There are a number of commercial products out there, and I would encourage you to check them out. Loggly, PaperTrail, Splunk Storm, and Sumo Logic are all viable options. If you are willing to put in some work, Kibana looks like a pretty cool open source option. Despite the cost, we’ve been happy with Splunk. A few hours and one payment later, our logging problems were solved.
The bottom line is this: if you can’t quickly access your log data to make decisions, why do you even have it?