At my last job before Smartling – a big global ad agency with a 50-person developer team – I built a time-scheduling application that allowed project managers to schedule the efforts of us code monkeys. One of my great satisfactions was walking around the open floor of the department and seeing the application I had built up on many screens at any given time. It was a kind of poor man’s Google Analytics. If you want to track the use of your product, just hover creepily behind your user’s Aeron and see what he’s doing!
But what about the case where you can’t be in the same room as your users? Event tracking and analytics are not exactly new-fangled. Google Analytics, Omniture and their ilk have been around for many years. But one thing that has always frustrated me about these solutions is that while they give a good statistical big picture of your application’s use, a bird’s-eye view, it’s not really possible to drill down into the specifics of each solitary tracked event. As in: give me all the click events and associated parameters that occurred for a specific user during a one-minute stretch of time last Friday night. What language was he translating? If he saved a translation, what did the translation look like before he saved it? How long did the AJAX calls he invoked take to complete? Any errors? How many cans of beer did he have in him at the time?
Logging: Not Just for Lumberjacks
That’s not analytics though. It’s logging. Yes, you want big picture analytics too (total counts for a day, etc.), but being able to rewind time and see the discrete data, line by line, as it rolls in is powerful.
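To make that concrete, here’s a minimal sketch of what capturing those discrete, line-by-line events might look like in the browser. The helper names (`recordEvent`, `trackClick`) and the in-memory `eventLog` are my own illustrative assumptions; in a real app the record would be shipped off to your log collector instead of pushed onto an array.

```javascript
// Sketch: treat each click as a discrete, timestamped record
// rather than a counter bump. All names here are illustrative.
var eventLog = [];

function recordEvent(record) {
  // In a real app you'd POST this to your logging endpoint.
  eventLog.push(record);
}

function trackClick(e) {
  recordEvent({
    type: "click",
    // Prefer the element id if it has one, else fall back to the tag name.
    target: e.target && e.target.id ? e.target.id : e.target.tagName,
    timestamp: new Date().toISOString()
  });
}

// Guarded so the sketch also runs outside a browser.
if (typeof document !== "undefined") {
  document.addEventListener("click", trackClick);
}
```

Each record is one line in the log, so "rewinding time" is just reading those lines back in order.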
If you send your logging service your data in the form of JSON, you can then write queries against it as structured data! I like to send the first name of the user so that when our customer support team comes to me saying that a translator named Jorge filed a support ticket last night, I can write a quick query: search json.firstName:"Jorge" from NOW-1DAY. See what that dude was up to! I always send the user’s current URL (at time of event) as a param so that I can instantly bring it up in a browser and try to reproduce the reported error. You don’t have to rely on Jorge to carefully document what went down at 2012-05-11 10:09:49.236.
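A rough sketch of what building and shipping one of those JSON events might look like. The field names (`firstName`, `url`, `params`) and the endpoint parameter are my own assumptions for illustration, not any particular service’s API; with Loggly you would POST to your account’s HTTP input URL.

```javascript
// Sketch: build a structured JSON log event that can later be
// queried by field (e.g. json.firstName:"Jorge"). Field names are
// illustrative assumptions, not a real service's schema.
function buildLogEvent(eventName, user, params) {
  return {
    eventName: eventName,
    firstName: user.firstName, // lets support search logs by name later
    // In the browser, capture the current URL so errors can be reproduced.
    url: (typeof window !== "undefined") ? window.location.href : params.url,
    timestamp: new Date().toISOString(),
    params: params
  };
}

// Fire-and-forget send; logging must never break the app itself.
function sendLogEvent(evt, endpoint) {
  try {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", endpoint, true);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(JSON.stringify(evt));
  } catch (err) {
    // Swallow errors: a failed log write is not worth a broken UI.
  }
}
```

Because every event carries the same structured fields, the support question "what was Jorge doing last night?" becomes a one-line query instead of an archaeology project.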
What’s really new about all this is that such a massive quantity of data can be stored cheaply and, more importantly, searched almost as soon as it’s written to disk. Data is worthless unless you can search it. This is the phenomenon known as “big data.” In a nutshell, “big data” means that the same kind of technology Google developed to index the entire internet and make it instantly full-text searchable (Hadoop is its best-known open-source incarnation) can be used to search any source of data: your event logs, for example. As a front end developer, you probably wouldn’t want to set Hadoop up yourself, but that’s what companies like Loggly, Splunk and Mixpanel are doing for you.