Go Parallel Data Processing
Golang, Apache Beam & Dataflow
Presented to: Atlanta Go User Group
Going to Cover
Code & Notes https://github.com/GLStephen/GoApacheBeamDemo
Stephen Johnston Jr.
Quick Survey
Why Beam
Realities of Data
What is Beam
Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Flink, Apache Spark, and Google Cloud Dataflow.
What does Beam solve?
Items
Wall Time
Apache Beam Support
Beam SDKs
Beam Runners
Beam Concepts
Very, very high level…
Concepts
Components
Processing
https://beam.apache.org/documentation/programming-guide/
https://cloud.google.com/dataflow/docs/concepts/beam-programming-model
Diagram: the Runner (Dataflow) runs the same Pipeline (Go Beam App) on each of three Workers (Compute Engine).
Windows
Tumbling
Hopping
Session
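The three window types differ in how an element's event timestamp maps to windows. A tumbling window (Beam calls it fixed) is one non-overlapping bucket. A hopping window (Beam: sliding) places the element in every overlapping window that covers it. Session windows grow around bursts of activity and close after a gap. The plain-Go sketch below shows only that assignment logic on integer timestamps in seconds. It is an illustration, not the SDK's windowing API; the function names are hypothetical, and it assumes non-negative timestamps.

```go
package main

import (
	"fmt"
	"sort"
)

// Window is a half-open interval [Start, End) of event time in seconds.
type Window struct{ Start, End int64 }

// tumbling assigns t to exactly one fixed-size, non-overlapping window.
func tumbling(t, size int64) Window {
	// Assumes t >= 0 (Go's % keeps the sign of t).
	start := t - t%size
	return Window{start, start + size}
}

// hopping assigns t to every window of length size whose start is a
// multiple of period and which contains t. Windows overlap when period < size.
func hopping(t, size, period int64) []Window {
	var ws []Window
	for start := t - t%period; start > t-size; start -= period {
		ws = append(ws, Window{start, start + size})
	}
	return ws
}

// sessions gives each timestamp the window [t, t+gap) and merges windows
// that overlap, so a quiet period of at least gap ends a session.
func sessions(ts []int64, gap int64) []Window {
	sorted := append([]int64(nil), ts...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	var ws []Window
	for _, t := range sorted {
		if n := len(ws); n > 0 && t < ws[n-1].End {
			// Overlaps the current session: extend it.
			ws[n-1].End = t + gap
		} else {
			ws = append(ws, Window{t, t + gap})
		}
	}
	return ws
}

func main() {
	fmt.Println(tumbling(65, 60))                   // {60 120}
	fmt.Println(hopping(65, 60, 30))                // [{60 120} {30 90}]
	fmt.Println(sessions([]int64{1, 3, 20, 22}, 5)) // [{1 8} {20 27}]
}
```

Note that a hopping window duplicates each element into size/period windows, so downstream per-window work grows accordingly.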
Processing
Simple DoFn - Parallel
Counts - Global Window - Aggregation
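Counting over the global window means every element, from every worker, has to end up in a single number. Beam does this with combiners that build partial results in parallel and then merge them. The sketch below imitates that pattern in plain Go. `countFn` borrows the method names of a Beam combine function (`CreateAccumulator`, `AddInput`, `MergeAccumulators`, `ExtractOutput`) but is not wired to the SDK, and `globalCount` with its round-robin sharding is a hypothetical stand-in for what the runner does.

```go
package main

import (
	"fmt"
	"sync"
)

// countFn has the shape of a combine function: accumulators are built
// independently, then merged into one result.
type countFn struct{}

func (countFn) CreateAccumulator() int { return 0 }

func (countFn) AddInput(acc int, _ string) int { return acc + 1 }

func (countFn) MergeAccumulators(a, b int) int { return a + b }

func (countFn) ExtractOutput(acc int) int { return acc }

// globalCount splits the input into shards (standing in for workers), builds
// one accumulator per shard concurrently, then merges them into a single
// count for the global window. shards must be at least 1.
func globalCount(elems []string, shards int) int {
	fn := countFn{}
	accs := make([]int, shards)
	var wg sync.WaitGroup
	for s := 0; s < shards; s++ {
		wg.Add(1)
		go func(s int) {
			defer wg.Done()
			acc := fn.CreateAccumulator()
			// Shard s takes every shards-th element, starting at index s.
			for i := s; i < len(elems); i += shards {
				acc = fn.AddInput(acc, elems[i])
			}
			accs[s] = acc
		}(s)
	}
	wg.Wait()
	total := fn.CreateAccumulator()
	for _, a := range accs {
		total = fn.MergeAccumulators(total, a)
	}
	return fn.ExtractOutput(total)
}

func main() {
	fmt.Println(globalCount([]string{"a", "b", "c", "d", "e"}, 2)) // 5
}
```

Merging must give the same answer no matter how elements were split across shards, which is why addition works here.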
Simple DAG
Complex DAG
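A Beam pipeline is a directed acyclic graph of transforms, and the runner plans execution from that graph. The plain-Go sketch below (not Beam or Dataflow code) stores a hypothetical complex DAG, with two sources merged by a Flatten and one collection feeding two consumers, as a map from each transform to its inputs. It then derives one valid execution order with Kahn's topological sort. Real runners do considerably more; Dataflow, for example, fuses adjacent steps and runs independent branches in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

// topoOrder returns an execution order for a graph given as node -> inputs,
// using Kahn's algorithm, with ties broken alphabetically for determinism.
// The bool is false if the graph contains a cycle.
func topoOrder(inputs map[string][]string) ([]string, bool) {
	indeg := map[string]int{}
	consumers := map[string][]string{}
	for node, ins := range inputs {
		if _, ok := indeg[node]; !ok {
			indeg[node] = 0
		}
		for _, in := range ins {
			if _, ok := indeg[in]; !ok {
				indeg[in] = 0
			}
			indeg[node]++
			consumers[in] = append(consumers[in], node)
		}
	}
	var ready []string
	for n, d := range indeg {
		if d == 0 {
			ready = append(ready, n)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		for _, c := range consumers[n] {
			indeg[c]--
			if indeg[c] == 0 {
				ready = append(ready, c)
			}
		}
	}
	return order, len(order) == len(indeg)
}

func main() {
	// Hypothetical transform names.
	dag := map[string][]string{
		"ReadA":    nil,
		"ReadB":    nil,
		"ParseA":   {"ReadA"},
		"ParseB":   {"ReadB"},
		"Flatten":  {"ParseA", "ParseB"},
		"Count":    {"Flatten"},
		"Write":    {"Count"},
		"WriteRaw": {"Flatten"},
	}
	order, ok := topoOrder(dag)
	fmt.Println(order, ok)
}
```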
Element Wise
Per-Key or Window
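The split between the two kinds of processing is what each element needs to see. Element-wise work looks at one element at a time, so it parallelizes freely. Per-key work needs every value for a key in one place, which is why a GroupByKey involves a shuffle across workers. The plain-Go sketch below contrasts the two; `KV`, `elementWise`, and `sumPerKey` are hypothetical names, not SDK types. In Beam, grouping happens per key within each window; this sketch ignores windows (equivalent to a single global window).

```go
package main

import (
	"fmt"
	"sort"
)

// KV is a key/value pair, standing in for an element of a keyed collection.
type KV struct {
	Key   string
	Value int
}

// elementWise transforms each element on its own; no element needs any other.
func elementWise(in []KV, fn func(KV) KV) []KV {
	out := make([]KV, 0, len(in))
	for _, kv := range in {
		out = append(out, fn(kv))
	}
	return out
}

// sumPerKey first groups values by key (the step that forces a shuffle in a
// distributed runner), then reduces each group. Keys are sorted for stable output.
func sumPerKey(in []KV) []KV {
	groups := map[string][]int{}
	for _, kv := range in {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]KV, 0, len(keys))
	for _, k := range keys {
		total := 0
		for _, v := range groups[k] {
			total += v
		}
		out = append(out, KV{k, total})
	}
	return out
}

func main() {
	events := []KV{{"us", 1}, {"eu", 2}, {"us", 3}}
	doubled := elementWise(events, func(kv KV) KV { return KV{kv.Key, kv.Value * 2} })
	fmt.Println(doubled)            // [{us 2} {eu 4} {us 6}]
	fmt.Println(sumPerKey(doubled)) // [{eu 4} {us 8}]
}
```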
Autoscaling Managed for You
Code & Demo
Questions?