Updated on Fri 28 October 2016

Talks and workshops

Here’s my talk from Data Natives 2016 in Berlin, on semantic similarity and taxonomic distance: with presenter notes, or without presenter notes (both PDF).

In June 2016 I gave a talk at Berlin Buzzwords on learning to rank, introducing the field and walking through the Ranking SVM algorithm. The slides and video are available here.

I’ve presented Etsy’s work on mining timeseries data, including anomaly detection and similarity search, at several venues including Monitorama, SRECon EU and Berlin Buzzwords. You can see the Buzzwords video here and download the slides here.

I gave a talk at Big Data eXchange in November 2014, entitled “Lies, damned lies and dataviz”, on bad visualization: how to spot it, and how to avoid making the same mistakes yourself. This was a lot of fun to do, and hopefully quite instructive too. You can watch the video courtesy of Skills Matter or click here for the slides. My PyData London talk in February 2015 was on the same topic.

Here are the slides from a talk on Big Data and other myths at the Chief Data Officer Summit in November 2014. This was an opinion piece where I attempted to skewer three of the biggest myths in data: that “big data” is a thing, that data scientists are hard to find, and that there’s a magic platform out there that will solve all your data problems.

I ran a Snake Charmer intro/tutorial at PyData London in August 2014. Slides are here and the demo notebook can be viewed here. If you install Snake Charmer, you'll find the same notebook in the Demo folder.

For posterity, here are the slides from an earlier version of Lies, damned lies and dataviz, at the Big Data Innovation Summit 2014.

Here are my presentations from a couple of Elasticsearch London meetups: Tuning Elasticsearch for multi-terabyte analytics and Cardinality estimation in Elasticsearch. The latter is a little out of date, as a very similar feature is now built into Elasticsearch.

Slides and video from my QCon talk, Approximate methods for scalable data mining. Amongst other things, this covers the theory behind the cardinality estimation technique we used in Elasticsearch.

Video from a Big-O London meetup: Scaling up k-nearest neighbours classification.

Video from a Hadoop UK User Group meetup on Data mining with Pig.

Probably my most-viewed deck: a short talk from 2013 entitled There’s no such thing as big data. The 2014 talk “Big data and other myths” (see above) covers this in more detail, and with more style.


All content (cc) Andrew Clegg, under Creative Commons Attribution-ShareAlike 4.0 License. Built on Pelican & Python. Theme based on svbhack by Giulio Fidente.