When: November 3rd
Price: 120 EUR
Where: Kyiv
+ About the workshop:
The course is intended for people who have no previous Spark experience. The ultimate goal is to provide an overview of the most important Spark features so that attendees gain enough knowledge to start building their first Spark applications.
No prior experience in Spark is required. All the hands-on exercises will be in Scala, but they will be simple enough for anybody with good knowledge of any modern programming language.
All participants should have VirtualBox installed on their laptops so they can do the hands-on exercises and fully benefit from the workshop.
+ About the trainer:
Marcin is a data developer and architect with experience in data infrastructure administration. His main strength is that his knowledge is proven on real-life big data problems that he solves on a daily basis (he has worked for companies like Spotify and Apple and is currently consulting on Big Data projects). The course emphasises practical aspects of Spark and the common problems and misconceptions that he encounters when helping clients.
The course is an introduction to Spark led by a “hands-on” practitioner who gained his experience solving real-life problems for many of his clients.
+ Programme overview:
1. Introduction to Spark
● What is Spark?
● Spark vs Hadoop
● Spark with HDFS: quick overview
● Spark on YARN: quick overview
2. Basic building blocks in Spark
● Introduction to Resilient Distributed Datasets
● Spark shell
● Overview of RDD operations
● Hands-on exercises: Log processing using simple transformations
3. Basic building blocks in Spark
● Key-Value Pair RDDs
● Aggregating Data with pair RDDs
● Hands-on exercises: Word count
4. Writing and deploying Spark applications
● Building Spark applications
● Submitting a Spark application to a cluster
● Hands-on exercises: Joining RDDs
5. Spark on a cluster
● Spark Web UI
● RDD partitions: on HDFS, on local filesystem, after shuffle
● Execution model overview: Stages, Tasks, Executors
● RDD persistence
● Data Locality
● Fault tolerance
● Spark Config: important options
● Logging, YARN log aggregation
6. SQL-like Spark features
● Spark SQL
● DataFrames
● Datasets
● Hands-on exercises: Spark SQL aggregations
7. Spark use cases overview
● Data analysis
● Machine learning
● Iterative algorithms
Bonus exercises: Spark SQL aggregations, PageRank, Data generation with Spark, Broadcast join, Skewed join problem, AggregateByKey challenge, Tree-reduce.