1 of 25

Apache Hadoop YARN

Vavilapalli et al.

Sami Rollins

2 of 25

Introduction

https://www.confluent.io/wp-content/uploads/data-flow-ugly-1-768x427.png

3 of 25

Introduction

  • Distributed storage
    • Relational databases
    • Key-Value stores
    • Caches
    • Distributed file systems
  • Distributed computation
    • MapReduce
    • Spark
    • BOINC

4 of 25

Common Problems

  • Scalability
    • Storage:
    • Computation:

  • Fault tolerance
    • Storage:
    • Computation:

5 of 25

Common Problems

  • Scalability
    • Storage: must balance consistency (C) and availability (A) in order to scale
    • Computation: efficient use of resources; communication between components

  • Fault tolerance
    • Storage: loss of data
    • Computation: lost work

6 of 25

YARN

General framework for performing distributed computation on a cluster

7 of 25

MapReduce

https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

8 of 25

MapReduce

  • URL Access Frequency (see the sketch after this list)
    • Map: outputs <URL, 1> for each URL in page request logs
    • Reduce: adds values for same URL and outputs <URL, total_count>

  • Inverted Index
    • Map: outputs <word, doc_id>
    • Reduce: outputs <word, list(doc_id)>
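
A minimal sketch of the URL Access Frequency example as Hadoop MapReduce classes; the log-line format, class names, and field positions are assumptions for illustration, not the paper's code.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit <URL, 1> for every entry in the page request logs.
// Assumes each input line looks like "<timestamp> <URL> ..." (hypothetical format).
public class UrlCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text url = new Text();

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\\s+");
        if (fields.length > 1) {
            url.set(fields[1]);
            context.write(url, ONE);
        }
    }
}

// Reduce: sum the counts for each URL and emit <URL, total_count>.
class UrlCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text url, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable c : counts) {
            total += c.get();
        }
        context.write(url, new IntWritable(total));
    }
}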

9 of 25

YARN

  • MapReduce is commonly used for running computation jobs in a cluster, but it is not always the right programming model
    • Map-only jobs
    • Jobs that require multiple iterations
        • ML algorithms

10 of 25

YARN

  • Framework for requesting compute resources

  • Supports MapReduce or other platforms that use resources in different ways

11 of 25

YARN

https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/YARN.html

12 of 25

Architecture

  • Resource Manager

  • Node Manager

  • Application Master

13 of 25

Architecture

  • Resource Manager
    • One per cluster
    • Receives client requests to schedule jobs
    • Scheduler is a pluggable component
    • ApplicationsManager launches the per-application ApplicationMaster

14 of 25

Architecture

  • Resource Manager - HA

https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

15 of 25

Architecture

  • Node Manager
    • One per node
    • Launches containers
        • Copies dependencies
    • Monitors node health
    • RM detects failures through heartbeats

16 of 25

Architecture

  • Application Master
    • Written by user
        • Requests containers
        • Launches the application
        • Checks for replies

https://github.com/hortonworks/simple-yarn-app

17 of 25

Writing a YARN Application - Client

// Create yarnClient
YarnConfiguration conf = new YarnConfiguration();
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();

18 of 25

Writing a YARN Application - Client

// Set up the container launch context for the application master
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
amContainer.setCommands(
    Collections.singletonList(
        "$JAVA_HOME/bin/java" +
        " -Xmx256M" +
        " cs682.yarn.ApplicationMasterAsync" +
        " " + command +
        " " + String.valueOf(n) +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"
    )
);
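
A hedged sketch (following the simple-yarn-app pattern linked earlier) of the remaining client steps: set the ApplicationMaster's resource requirements, fill in the submission context, and submit to the ResourceManager. The application name is a placeholder.

// Obtain a new application from the ResourceManager
YarnClientApplication app = yarnClient.createApplication();

// Resource requirements for the ApplicationMaster container
Resource amResource = Records.newRecord(Resource.class);
amResource.setMemory(256);
amResource.setVirtualCores(1);

// Fill in the submission context and submit
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setApplicationName("cs682-yarn-app");   // placeholder name
appContext.setAMContainerSpec(amContainer);
appContext.setResource(amResource);
appContext.setQueue("default");

ApplicationId appId = appContext.getApplicationId();
yarnClient.submitApplication(appContext);
System.out.println("Submitted application " + appId);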

19 of 25

Writing a YARN Application - AM

// Resource requirements for worker containers
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(128);
capability.setVirtualCores(1);

// Make container requests to ResourceManager
for (int i = 0; i < n; ++i) {
    ContainerRequest containerAsk = new ContainerRequest(capability, null, null, priority);
    System.out.println("Making res-req " + i);
    rmClient.addContainerRequest(containerAsk);
}
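
A hedged sketch of the setup that typically precedes the requests above: the AM creates an AMRMClientAsync and registers itself with the ResourceManager. The heartbeat interval and callback handler are assumptions; in an async AM the handler is often the AM object itself.

// Register this ApplicationMaster with the ResourceManager
Configuration conf = new YarnConfiguration();
AMRMClientAsync<ContainerRequest> rmClient =
    AMRMClientAsync.createAMRMClientAsync(100, callbackHandler);   // callbackHandler: hypothetical
rmClient.init(conf);
rmClient.start();
rmClient.registerApplicationMaster("", 0, "");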

20 of 25

Writing a YARN Application - AM

// Launch the container by creating a ContainerLaunchContext
ContainerLaunchContext ctx =
    Records.newRecord(ContainerLaunchContext.class);
ctx.setCommands(
    Collections.singletonList(
        command +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"
    )
);
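
A minimal sketch of how the launch context above is typically handed to the NodeManagers in an async AM: allocated containers arrive through a callback and are started via an NMClientAsync (assumed here to be a field initialized elsewhere).

// Callback from AMRMClientAsync when the RM grants containers
public void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
        // Hand the ContainerLaunchContext built above to the container's NodeManager
        nmClient.startContainerAsync(container, ctx);
    }
}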

21 of 25

Observations

  • AM can get status from RM, or implement an async AM and get callbacks on critical events (see the sketch after this list)
  • AM can also query NM
  • REST APIs exist
  • Timeline Server reports events/status

  • Overall, very barebones
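
A hedged sketch of the other AMRMClientAsync callbacks an async AM implements, plus the final unregister call; the completed-container counter is hypothetical bookkeeping.

// Invoked as worker containers finish
public void onContainersCompleted(List<ContainerStatus> statuses) {
    for (ContainerStatus status : statuses) {
        System.out.println("Container " + status.getContainerId()
            + " exited with status " + status.getExitStatus());
        containersToWaitFor.decrementAndGet();   // hypothetical AtomicInteger field
    }
}

public float getProgress() { return 0; }
public void onShutdownRequest() { }
public void onNodesUpdated(List<NodeReport> nodes) { }
public void onError(Throwable e) { rmClient.stop(); }

// Once all containers have completed, the AM tells the RM it is done
rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");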

22 of 25

Other Distributed Systems/Applications

  • Kafka
  • BitTorrent
  • BOINC

23 of 25

Kafka

https://www.confluent.io/blog/stream-data-platform-1/

24 of 25

BitTorrent

https://en.wikipedia.org/wiki/BitTorrent

25 of 25

BOINC

  • “Volunteer cloud”
  • Originally SETI@home