1 of 25

Apache Hadoop YARN

Vavilapalli et al.

Sami Rollins

2 of 25

Introduction

https://www.confluent.io/wp-content/uploads/data-flow-ugly-1-768x427.png

3 of 25

Introduction

  • Distributed storage
    • Relational databases
    • Key-Value stores
    • Caches
    • Distributed file systems
  • Distributed computation
    • MapReduce
    • Spark
    • BOINC

4 of 25

Common Problems

  • Scalability
    • Storage:
    • Computation:

  • Fault tolerance
    • Storage:
    • Computation:

5 of 25

Common Problems

  • Scalability
    • Storage: must balance consistency (C) and availability (A) in order to scale
    • Computation: efficient use of resources; communication between components

  • Fault tolerance
    • Storage: loss of data
    • Computation: lost work

6 of 25

YARN

General framework for performing distributed computation on a cluster

7 of 25

MapReduce

https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

8 of 25

MapReduce

  • URL Access Frequency (see the sketch after this list)
    • Map: outputs <URL, 1> for each URL in page request logs
    • Reduce: adds values for same URL and outputs <URL, total_count>

  • Inverted Index
    • Map: outputs <word, doc_id>
    • Reduce: outputs <word, list(doc_id)>
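
A minimal sketch of the URL Access Frequency example as Hadoop MapReduce classes; the log-line format, class names, and field positions are assumptions for illustration, not the paper's code.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit <URL, 1> for every entry in the page request logs.
// Assumes each input line looks like "<timestamp> <URL> ..." (hypothetical format).
public class UrlCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text url = new Text();

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\\s+");
        if (fields.length > 1) {
            url.set(fields[1]);
            context.write(url, ONE);
        }
    }
}

// Reduce: sum the counts for each URL and emit <URL, total_count>.
class UrlCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text url, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable c : counts) {
            total += c.get();
        }
        context.write(url, new IntWritable(total));
    }
}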

9 of 25

YARN

  • MapReduce is commonly used for running computation jobs in a cluster, but it is not always the right programming model
    • Map-only jobs
    • Jobs that require multiple iterations
        • ML algorithms

10 of 25

YARN

  • Framework for requesting compute resources

  • Supports MapReduce or other platforms that use resources in different ways

11 of 25

YARN

https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/YARN.html

12 of 25

Architecture

  • Resource Manager

  • Node Manager

  • Application Master

13 of 25

Architecture

  • Resource Manager
    • One per cluster
    • Receives client requests to schedule jobs
    • Scheduler is a pluggable component
    • ApplicationsManager launches the per-application ApplicationMaster

14 of 25

Architecture

  • Resource Manager - HA

https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

15 of 25

Architecture

  • Node Manager
    • One per node
    • Launches containers
        • Copies dependencies
    • Monitors node health
    • RM detects failures through heartbeats

16 of 25

Architecture

  • Application Master
    • Written by user
        • Requests containers
        • Launches the application
        • Checks for replies

https://github.com/hortonworks/simple-yarn-app

17 of 25

Writing a YARN Application - Client

// Create yarnClient
YarnConfiguration conf = new YarnConfiguration();
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();

18 of 25

Writing a YARN Application - Client

// Set up the container launch context for the application master
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
amContainer.setCommands(
    Collections.singletonList(
        "$JAVA_HOME/bin/java" +
        " -Xmx256M" +
        " cs682.yarn.ApplicationMasterAsync" +
        " " + command +
        " " + String.valueOf(n) +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"
    )
);
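
A hedged sketch (following the simple-yarn-app pattern linked earlier) of the remaining client steps: set the ApplicationMaster's resource requirements, fill in the submission context, and submit to the ResourceManager. The application name is a placeholder.

// Obtain a new application from the ResourceManager
YarnClientApplication app = yarnClient.createApplication();

// Resource requirements for the ApplicationMaster container
Resource amResource = Records.newRecord(Resource.class);
amResource.setMemory(256);
amResource.setVirtualCores(1);

// Fill in the submission context and submit
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setApplicationName("cs682-yarn-app");   // placeholder name
appContext.setAMContainerSpec(amContainer);
appContext.setResource(amResource);
appContext.setQueue("default");

ApplicationId appId = appContext.getApplicationId();
yarnClient.submitApplication(appContext);
System.out.println("Submitted application " + appId);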

19 of 25

Writing a YARN Application - AM

// Resource requirements for worker containers
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(128);
capability.setVirtualCores(1);

// Make container requests to ResourceManager
for (int i = 0; i < n; ++i) {
    ContainerRequest containerAsk = new ContainerRequest(capability, null, null, priority);
    System.out.println("Making res-req " + i);
    rmClient.addContainerRequest(containerAsk);
}
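
A hedged sketch of the setup that typically precedes the requests above: the AM creates an AMRMClientAsync and registers itself with the ResourceManager. The heartbeat interval and callback handler are assumptions; in an async AM the handler is often the AM object itself.

// Register this ApplicationMaster with the ResourceManager
Configuration conf = new YarnConfiguration();
AMRMClientAsync<ContainerRequest> rmClient =
    AMRMClientAsync.createAMRMClientAsync(100, callbackHandler);   // callbackHandler: hypothetical
rmClient.init(conf);
rmClient.start();
rmClient.registerApplicationMaster("", 0, "");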

20 of 25

Writing a YARN Application - AM

// Launch the container by creating a ContainerLaunchContext
ContainerLaunchContext ctx =
    Records.newRecord(ContainerLaunchContext.class);
ctx.setCommands(
    Collections.singletonList(
        command +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"
    )
);
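
A minimal sketch of how the launch context above is typically handed to the NodeManagers in an async AM: allocated containers arrive through a callback and are started via an NMClientAsync (assumed here to be a field initialized elsewhere).

// Callback from AMRMClientAsync when the RM grants containers
public void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
        // Hand the ContainerLaunchContext built above to the container's NodeManager
        nmClient.startContainerAsync(container, ctx);
    }
}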

21 of 25

Observations

  • AM can get status from RM, or implement an async AM and get callbacks on critical events (see the sketch after this list)
  • AM can also query NM
  • REST APIs exist
  • Timeline Server reports events/status

  • Overall, very barebones
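
A hedged sketch of the other AMRMClientAsync callbacks an async AM implements, plus the final unregister call; the completed-container counter is hypothetical bookkeeping.

// Invoked as worker containers finish
public void onContainersCompleted(List<ContainerStatus> statuses) {
    for (ContainerStatus status : statuses) {
        System.out.println("Container " + status.getContainerId()
            + " exited with status " + status.getExitStatus());
        containersToWaitFor.decrementAndGet();   // hypothetical AtomicInteger field
    }
}

public float getProgress() { return 0; }
public void onShutdownRequest() { }
public void onNodesUpdated(List<NodeReport> nodes) { }
public void onError(Throwable e) { rmClient.stop(); }

// Once all containers have completed, the AM tells the RM it is done
rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");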

22 of 25

Other Distributed Systems/Applications

  • Kafka
  • BitTorrent
  • BOINC

23 of 25

Kafka

https://www.confluent.io/blog/stream-data-platform-1/

24 of 25

BitTorrent

https://en.wikipedia.org/wiki/BitTorrent

25 of 25

BOINC

  • “Volunteer cloud”
  • Originally SETI@home