Conformal Prediction in Spark
Who am I?
Background
Computer Science Bioinformatics
PhD student – Uppsala University
Department of Information Technology
Department of Pharmaceutical Biosciences
Rome
Uppsala
Today’s plan
Takeaways: build large-scale CP, large-scale interactive analysis and visualization
Why Apache Spark?
Apache Spark is the most active open source large-scale data processing engine
1000+ contributors from over 250 organizations
Originally created to overcome MapReduce's lack of in-memory dataset caching
Spark: Cluster Computing with Working Sets, Zaharia et al. (2010)
It allows for interactive analysis
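Why caching matters for interactive analysis can be shown without a cluster. The sketch below is a toy, pure-Python model and not the Spark API: `ToyDataset` and its `map`, `cache`, and `collect` methods are hypothetical names chosen only to echo RDD terminology. It counts how often an expensive step runs when the same intermediate dataset feeds two queries.

```python
class ToyDataset:
    """Toy stand-in for an RDD: a lazily evaluated dataset (NOT the Spark API)."""

    def __init__(self, compute):
        self._compute = compute        # zero-argument function that produces the data
        self._cache_enabled = False
        self._cached = None            # filled by the first action after cache()

    @classmethod
    def from_list(cls, items):
        return cls(lambda: list(items))

    def map(self, fn):
        # Lazy: nothing is computed until collect() is called on the result.
        return ToyDataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        # Ask for the materialized result to be kept after the first action.
        self._cache_enabled = True
        return self

    def collect(self):
        if self._cache_enabled and self._cached is not None:
            return self._cached
        result = self._compute()
        if self._cache_enabled:
            self._cached = result
        return result


calls = {"n": 0}


def expensive_parse(x):
    """Stands in for a costly step, e.g. parsing a record read from disk."""
    calls["n"] += 1
    return x * 2


def run_twice(use_cache):
    """Run two queries over the same parsed dataset and count the parse calls."""
    calls["n"] = 0
    parsed = ToyDataset.from_list([1, 2, 3]).map(expensive_parse)
    if use_cache:
        parsed.cache()
    total = sum(parsed.map(lambda x: x + 1).collect())
    maximum = max(parsed.map(lambda x: x * 10).collect())
    return total, maximum, calls["n"]
```

Without `cache()` the second query recomputes the parsed data from the source, which is roughly the situation of chained MapReduce jobs; with `cache()` the parse step runs once and later queries reuse the result.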
A unified computing engine
Spark Core
RDD API
Spark SQL
Spark Streaming
MLlib
GraphX
Data sources
Environments
Apache Spark architecture (1)
Standalone cluster mode
[Diagram: Driver Program (SparkContext) connected over the network to a Spark Master and two Spark Workers, each running a Spark Executor]
Apache Spark architecture (2)
Execution model
[Diagram: Driver Program (SparkContext) connected over the network to a Spark Master and two Spark Workers, each running a Spark Executor]
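The execution model can be sketched in miniature. In Spark, the driver splits a job into tasks, one per data partition, and the executors run those tasks and return partial results. The sketch below imitates this with a local thread pool. It is an illustration under simplifying assumptions, not Spark code: `split_into_partitions`, `count_even_task`, and `run_job` are hypothetical names, and threads stand in for executors on separate workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: split the dataset into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def count_even_task(partition):
    """Executor side: one task processes one partition and returns a partial result."""
    return sum(1 for x in partition if x % 2 == 0)


def run_job(data, num_executors=2, num_partitions=4):
    """Driver side: schedule one task per partition, then combine the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    # The thread pool is a local stand-in for executors running on workers.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_counts = list(executors.map(count_even_task, partitions))
    return sum(partial_counts)
```

Calling `run_job(list(range(10)))` counts the even numbers across four partitions using two simulated executors. Only the small partial counts travel back to the driver, which is the pattern that lets this kind of job scale.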
Questions?