Apache Spark – crash course by Marcin Szymaniuk (preregistration form)
Trainer: Marcin Szymaniuk
Data developer, data infrastructure administrator and consultant at TantusData. He has extensive hands-on experience with technical and business problems related to Big Data. Companies Marcin has worked for or consulted for include Spotify, TrueCaller and, most recently, Apple.

When: November 3rd
Price: 120 EUR
Where: Kyiv

+ About the workshop:
The course is designed for people with no previous Spark experience. The ultimate goal is to provide an overview of the most important Spark features so that attendees gain enough knowledge to start building their first Spark applications.
No prior experience with Spark is required. All hands-on exercises will be in Scala, but they will be simple enough for anybody with a good knowledge of any modern programming language.
All participants should have VirtualBox installed on their laptops so they can do the hands-on exercises and fully benefit from the workshop.

+ About the trainer:
Marcin is a data developer and architect with experience in data infrastructure administration. His main strength is that his knowledge is proven on real-life Big Data problems that he solves on a daily basis (he has worked for companies like Spotify and Apple, and currently consults on Big Data projects). The course emphasises practical aspects of Spark and the common problems and misconceptions he encounters when helping clients.
The course is an introduction to Spark led by a “hands-on” practitioner who gained his experience solving real-life problems for many of his clients.

+ Programme overview:
1. Introduction to Spark
● What is Spark?
● Spark vs Hadoop
● Spark with HDFS: quick overview
● Spark on YARN: quick overview

2. Basic building blocks in Spark
● Introduction to Resilient Distributed Datasets
● Spark shell
● Overview of RDD operations
● Hands-on exercises: Log processing using simple transformations (sketched below)
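To give a flavour of this first hands-on block, below is a minimal sketch of log processing with simple RDD transformations in the Spark shell. The file path and log format are illustrative assumptions, not the actual workshop data.

// In spark-shell the SparkContext is already available as `sc`.
// The path and the log format below are assumptions for illustration only.
val logs = sc.textFile("hdfs:///data/app.log")

// Keep only the error lines and pull out the first two whitespace-separated fields.
val errors = logs
  .filter(line => line.contains("ERROR"))
  .map(line => line.split("\\s+", 3).take(2).mkString(" "))

// Actions trigger the computation: count the errors and print a small sample.
println(s"Error lines: ${errors.count()}")
errors.take(5).foreach(println)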

3. Basic building blocks in Spark (continued)
● Key-Value Pair RDDs
● Aggregating Data with pair RDDs
● Hands-on exercises: Word count (sketched below)
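The word-count exercise boils down to building a pair RDD and aggregating it by key. A minimal sketch, again with an illustrative input path:

// In spark-shell, `sc` is already available; the input path is illustrative.
val lines = sc.textFile("hdfs:///data/books.txt")

// Split lines into words, build (word, 1) pairs, then sum the counts per key.
val counts = lines
  .flatMap(line => line.split("\\s+"))
  .filter(word => word.nonEmpty)
  .map(word => (word.toLowerCase, 1))
  .reduceByKey(_ + _)

// Print the ten most frequent words.
counts.sortBy(_._2, ascending = false).take(10).foreach(println)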

4. Writing and deploying Spark applications
● Building Spark applications
● Submitting a Spark application to a cluster
● Hands-on exercises: Joining RDDs (sketched below)
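For the application-building part, a self-contained job that joins two pair RDDs might look roughly like the sketch below. The object name, the sample data and the spark-submit line are assumptions for illustration, not the workshop code.

import org.apache.spark.{SparkConf, SparkContext}

// A minimal stand-alone application that joins two pair RDDs.
object JoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-example"))

    // (userId, name) and (userId, country) pairs; the data is made up.
    val names     = sc.parallelize(Seq((1, "Ann"), (2, "Bob"), (3, "Eve")))
    val countries = sc.parallelize(Seq((1, "SE"), (2, "UA")))

    // Inner join on the key: users without a country record are dropped.
    val joined = names.join(countries)   // RDD[(Int, (String, String))]
    joined.collect().foreach(println)

    sc.stop()
  }
}

// After packaging the project (for example with sbt), the jar can be submitted
// to a YARN cluster along these lines:
//   spark-submit --master yarn --class JoinExample join-example.jar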

5. Spark on a cluster
● Spark Web UI
● RDD partitions: on HDFS, on the local filesystem, after a shuffle
● Execution model overview: Stages, Tasks, Executors
● RDD persistence (sketched below)
● Data Locality
● Fault tolerance
● Spark Config: important options
● Logging and YARN log aggregation
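As a small illustration of the cluster-related topics, the shell snippet below inspects partitioning and persists a shuffled RDD between two actions. The input path and the tab-separated layout are assumptions.

import org.apache.spark.storage.StorageLevel

// In spark-shell; the input path and the tab-separated layout are illustrative.
val events = sc.textFile("hdfs:///data/events")

// For an HDFS file, partitions roughly follow the HDFS blocks.
println(s"Input partitions: ${events.getNumPartitions}")

// A shuffle (reduceByKey here) repartitions the data by key.
val perUser = events
  .map(line => (line.split("\t")(0), 1))
  .reduceByKey(_ + _)

// Persist the shuffled result so the two actions below reuse it instead of recomputing it.
perUser.persist(StorageLevel.MEMORY_ONLY)
println(s"Distinct users: ${perUser.count()}")
perUser.take(5).foreach(println)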

6. SQL-like Spark features
● Spark SQL
● DataFrames
● Datasets
● Hands-on exercises: Spark SQL aggregations (sketched below)
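Below is a minimal sketch of the same kind of aggregation done with the DataFrame API and with plain SQL, assuming Spark 2.x where the shell already provides a SparkSession as `spark`; the sample data is made up.

import org.apache.spark.sql.functions._
// In spark-shell 2.x a SparkSession is already available as `spark`.
import spark.implicits._

// A tiny, made-up dataset of purchases.
val purchases = Seq(
  ("Ann", "books", 12.0),
  ("Ann", "music",  5.0),
  ("Bob", "books", 30.0)
).toDF("user", "category", "amount")

// Aggregation with the DataFrame API...
purchases.groupBy("user").agg(sum("amount").as("total")).show()

// ...and the same aggregation expressed in SQL.
purchases.createOrReplaceTempView("purchases")
spark.sql("SELECT user, SUM(amount) AS total FROM purchases GROUP BY user").show()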

7. Spark use cases overview
● Data analysis
● Machine learning
● Iterative algorithms

Bonus exercises: Spark SQL aggregations, PageRank, data generation with Spark, broadcast join (sketched below), the skewed join problem, an aggregateByKey challenge, tree-reduce.
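As a taste of the bonus material, a broadcast join can be sketched with DataFrames as below: broadcasting the small table avoids shuffling the large one. The data and names are illustrative, and Spark 2.x with a `spark` session in the shell is assumed.

import org.apache.spark.sql.functions.broadcast
import spark.implicits._

// Made-up data: a "large" fact table and a small dimension table.
val clicks    = Seq((1, "/home"), (2, "/docs"), (1, "/buy")).toDF("userId", "page")
val countries = Seq((1, "SE"), (2, "UA")).toDF("userId", "country")

// The broadcast hint ships the small table to every executor,
// so the large table does not have to be shuffled for the join.
clicks.join(broadcast(countries), "userId").show()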

https://drive.google.com/open?id=0B-UatHdc8LDeTFNCQ05OaWFzek11eDg2YTctSUpScEdseXJB
Contacts: contact@javaday.org.ua

Name
Surname
Email