AWS re:Invent 2016

JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing (BDM303)

By Richard T. Freeman

Abstract

Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, application programming interfaces (APIs), clickstreams, and unstructured and log data. However, they are often limited by legacy data warehouses and extract-transform-load (ETL) processes that were designed for transactional data. Building scalable big data pipelines with automated ETL and machine learning (ML) processes can address these limitations. JustGiving is the world’s largest social platform for online giving. In this session, we describe how we created several scalable, loosely coupled, event-driven ETL and ML pipelines as part of our in-house data science platform, RAVEN. You will learn how to use AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.

Keywords

Serverless, RAVEN, AWS Lambda, event-driven ETL, extract-transform-load, machine learning, ML pipelines, real-time analytics, streaming analytics, big data pipelines.

Bibliographic Details

@inproceedings{freemanReInvent16,
   Author = {Freeman, Richard T.},
   Title = {BDM303 - JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing},
   BookTitle = {Amazon Web Services re:Invent, 27 Nov. - 1 Dec. 2016},
   Address = {Las Vegas, USA},
   Publisher = {YouTube, SlideShare},
   Pages = {1-43},
   Year = {2016}
}