Snowplow drives everything we do

What and why?

Bauer Media

  • Digital and print publisher
  • Family-owned German company
  • 116 sites across Australia and New Zealand
  • Tag management across all sites

Just start collecting

  • Started Snowplow data collection in 2014
  • We didn’t really have a use case

Stuff we record

  • Page views
  • Metadata around content
  • User logins
  • Email click-throughs
  • Ad impressions

Use cases started showing up

  • Cross-site integrated reporting
  • Ad hoc tricky analysis
  • Sanity checking industry audience reporting
  • Stalking individual users
  • Audience overlaps
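The audience-overlap use case can be sketched in a few lines of Python. This is a minimal illustration, assuming page-view events have already been reduced to (site, user_id) pairs; the site names, the sample data, and the function name `audience_overlap` are all hypothetical and not taken from our pipeline.

```python
from collections import defaultdict
from itertools import combinations


def audience_overlap(events):
    """Count the users shared by each pair of sites.

    `events` is an iterable of (site, user_id) pairs (an assumed, simplified shape).
    Returns {(site_a, site_b): shared_user_count}, with site names sorted in each key.
    """
    users_by_site = defaultdict(set)
    for site, user_id in events:
        users_by_site[site].add(user_id)

    overlaps = {}
    for site_a, site_b in combinations(sorted(users_by_site), 2):
        shared = users_by_site[site_a] & users_by_site[site_b]
        overlaps[(site_a, site_b)] = len(shared)
    return overlaps


# Hypothetical sample data
sample_events = [
    ("site-a", "u1"),
    ("site-a", "u2"),
    ("site-b", "u2"),
    ("site-b", "u3"),
    ("site-c", "u2"),
    ("site-a", "u1"),  # repeat views do not double-count a user
]
overlap_counts = audience_overlap(sample_events)
```

Using sets per site keeps repeat page views from inflating the counts, which is usually what an overlap report needs.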

Dolly usage by hour

User behaviour

Ad impressions

Content metadata

Trending service



Ad hoc analysis

Some things you can’t do in GA

  • Tag-based reporting
  • Accurate reporting of Facebook in-app browser traffic, by matching user agents that contain FBAN
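The FBAN check above amounts to a substring match on the raw user-agent string. The Python sketch below shows the idea under simple assumptions: the function names are hypothetical, and the sample user-agent strings are illustrative rather than real log data.

```python
def is_facebook_in_app(user_agent):
    """Return True when the user-agent string contains the FBAN token."""
    return "FBAN" in (user_agent or "")


def facebook_in_app_share(user_agents):
    """Fraction of the given user agents that contain FBAN (0.0 for no input)."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_facebook_in_app(ua) for ua in agents) / len(agents)


# Illustrative user-agent strings, not real log data
sample_agents = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X) Mobile [FBAN/FBIOS;FBAV/55.0]",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/50.0 Safari/537.36",
    None,  # events with a missing user agent count as non-Facebook
    "Mozilla/5.0 (iPad; CPU OS 9_3 like Mac OS X) Mobile [FBAN/FBIOS;FBAV/54.0]",
]
share = facebook_in_app_share(sample_agents)
```

The match is case-sensitive on purpose, since the token appears in upper case in the user-agent string.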

We’re using Snowplow 0.9.2 from 2014-04-29!

  • It just works
  • We’ve been busy building other stuff


  • Page pings are b0rken: no time spent or scroll depth
  • (Out-of-the-box) browser categorisation is terrible
  • Hourly batches mean higher latency than we’d like
  • No context shredding, but JSON queries are performant enough
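Without shredding, a context stays as a JSON string attached to each event, so every lookup parses that string at query time. The Python sketch below illustrates that kind of lookup. The JSON shape, the field names, and the helper `context_field` are hypothetical; real context payloads depend on how the tracker is configured.

```python
import json


def context_field(contexts_json, key, default=None):
    """Read one top-level field from an unshredded contexts JSON string.

    Returns `default` when the value is empty, is not valid JSON,
    is not a JSON object, or lacks the key.
    """
    if not contexts_json:
        return default
    try:
        contexts = json.loads(contexts_json)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return default
    if not isinstance(contexts, dict):
        return default
    return contexts.get(key, default)


# Hypothetical row value; real context payloads will differ.
row_contexts = '{"section": "fashion", "tags": ["celebrity", "style"]}'
section = context_field(row_contexts, "section")
tags = context_field(row_contexts, "tags", default=[])
```

Returning a default instead of raising keeps one malformed row from failing a whole report.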

Web page

(JavaScript in page creates image beacon)

(Node app in Elastic Beanstalk)

Redirects to

Writes logs to

(Elastic Map Reduce)

  • Redshift can get very expensive very quickly
  • Decent dashboarding platforms are rare
  • And plenty of crap ones are overpriced
  • Just tip everything in and worry about what you’ll do with it later

What’s next?

Future plans

  • Upgrade ETL to real-time: probably our own solution
  • Time spent and scroll depth
  • Shredding?