Interview with a Data Scientist

An interview I did for data scientist and blogger Peadar Coyle.


Kale: timeseries data mining at Etsy

Etsy loves metrics. Everything that happens in our data centres gets recorded, graphed and stored. But with over a million metrics flowing in constantly, it’s hard for any team to keep on top of all that information. Graphing and alerting don’t scale well, so we’ve started the Kale project to help make sense of all those time series.


You sure don’t look like a scientist

The one where Andy nearly gets thrown out of the USA for not fitting the correct stereotype.


The unwisdom of the crowd: marketing, metrics & machine learning

“Vanity metrics” is a phrase I’ve heard cropping up a few times recently, in the context of growth engineering, the lean startup movement, and discussions around product lifecycles. In trying to understand what this refers to, I had a small epiphany: people in the digital marketing world are dismissing really important data as “vanity metrics” because they’re only talking about aggregates. But the same data at the level of individuals can make or break a business. If you’re trying to grow, you need to understand this.


All your Bayes are belong to us! PyMC, PyStan and emcee in Snake Charmer

PyMC is the most widely-used Python package for Bayesian modelling, learning and inference. But there are loads of other tools out there that may be better fitted to your particular task. I’ve added two of them to Snake Charmer so you can try them for yourself.


Lyric clouds, genre maps and distinctive words

Repost of an article I wrote for the blog in 2011. What do different genres of music look like if you visualize the words used in their lyrics? The results are… eye-opening.


Snake Charmer: the all-in-one data science toolbox for Python 3

Wouldn’t it be great if you could magic up an IPython Notebook server, complete with SciPy, Pandas, Matplotlib, PyMC, scikit-learn, R and Octave integration, and much, much more, just by typing one command? And wouldn’t it be even better if you could do that from pretty much any Windows, Mac or Linux machine, and know that you’d get the exact same environment every time? That’s Snake Charmer: a virtual data workbench that’s reproducible, portable, shareable and up-to-date.


Lies, Damned Lies and Dataviz

A guest post on the Pearson Labs blog, on bad visualization, how to spot it, and how to avoid making the same mistakes yourself.



