I recently gave talks at a couple of Meetups about the great work we're doing in the data and analytics team at Simply Business. Specifically, we shared how we use two open source tools, Spark and Snowplow.
Identifying Returning Users with Spark
In this talk, at the Spark London Meetup, I presented how we identify returning users in both near real-time and batch modes.
It's worth mentioning that Giraph remains a better batch option for huge graphs, but in our case Spark was fast enough and we highly value its versatility.
Snowplow & NRT Event Processing
The second talk was at the Snowplow Analytics London Meetup, where we presented how we have adapted the framework to run on top of Spark Streaming. We also showcased a couple of use cases we have implemented. You can read more about it on Snowplow's blog.
Do you also use Spark or Snowplow? Share your experience in the comments!