1 of 9

Heracles

Wojciech Bednarzak & Terry Bolt

2 of 9

Description

  • MapReduce framework focused on ease of use, modularity and scalability.
  • Fork of Cerberus MapReduce

  • Open Sourced: github.com/cpssd/heracles
    • 2000 SLOC of Rust
    • 1900 SLOC of Go
    • 4500 SLOC total

3 of 9

Technologies Used

  • Core languages
    • Golang & Rust
  • Third party dependencies
    • RabbitMQ
  • Frameworks and Tools
    • gRPC with Protobufs
    • AMQP (RabbitMQ)
    • Bazel Build System (PR#422)
  • Tested using CircleCI Continuous Integration

4 of 9

Architecture

[Architecture diagram: User, Manager, Broker, State Store, two Workers, Data Store]

5 of 9

Architecture Benefits

  • Modular and Scalable
    • Easy to add new modules (e.g. different brokers or state stores)
  • Simple and Efficient
    • Based on message passing and non-blocking operations
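Because the manager and workers depend only on module interfaces, swapping a broker or state store should not touch the rest of the system. Below is a minimal Go sketch of that idea, assuming hypothetical `Broker` and `StateStore` interfaces with in-memory implementations; a RabbitMQ-backed broker would be another implementation of the same interface. The `select` with `default` mirrors the non-blocking operations mentioned above. None of these names come from the Heracles codebase.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Task is a hypothetical unit of work; real Heracles types may differ.
type Task struct {
	ID   string
	Kind string // "map" or "reduce"
}

// Broker is a hypothetical interface for the message broker module.
type Broker interface {
	Publish(t Task) error
	Consume() (Task, error)
}

// StateStore is a hypothetical interface for shared task state.
type StateStore interface {
	SetStatus(taskID, status string) error
	Status(taskID string) (string, error)
}

// memBroker is an in-memory Broker backed by a buffered channel.
type memBroker struct{ ch chan Task }

func newMemBroker(size int) *memBroker { return &memBroker{ch: make(chan Task, size)} }

// Publish never blocks: it fails if the queue is full.
func (b *memBroker) Publish(t Task) error {
	select {
	case b.ch <- t:
		return nil
	default:
		return errors.New("broker queue full")
	}
}

// Consume never blocks: it fails if no task is queued.
func (b *memBroker) Consume() (Task, error) {
	select {
	case t := <-b.ch:
		return t, nil
	default:
		return Task{}, errors.New("no tasks queued")
	}
}

// memState is an in-memory StateStore guarded by a mutex.
type memState struct {
	mu sync.Mutex
	m  map[string]string
}

func newMemState() *memState { return &memState{m: make(map[string]string)} }

func (s *memState) SetStatus(id, status string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[id] = status
	return nil
}

func (s *memState) Status(id string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.m[id]
	if !ok {
		return "", fmt.Errorf("unknown task %q", id)
	}
	return st, nil
}

// runOne depends only on the interfaces, so any Broker or StateStore
// implementation can be plugged in without changing this function.
func runOne(b Broker, s StateStore) (string, error) {
	t, err := b.Consume()
	if err != nil {
		return "", err
	}
	if err := s.SetStatus(t.ID, "done"); err != nil {
		return "", err
	}
	return t.ID, nil
}

func main() {
	var b Broker = newMemBroker(4)
	var s StateStore = newMemState()
	_ = b.Publish(Task{ID: "map-0", Kind: "map"})
	id, err := runOne(b, s)
	if err != nil {
		panic(err)
	}
	st, _ := s.Status(id)
	fmt.Println(id, st) // map-0 done
}
```

Adding, say, a Redis-backed state store in this sketch would only mean writing another type with `SetStatus` and `Status` methods.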

6 of 9

Data Flow

  • User submits a request (a job)
  • Manager splits the job into map and reduce tasks
  • Manager sends the tasks to the workers through the broker
  • Workers execute the map/reduce tasks and save the resulting data
  • Workers report task completion to the manager through the state store
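The data flow above can be sketched in Go with a word count as the example job. This is a minimal sketch, assuming a Go channel stands in for the broker and a mutex-guarded struct stands in for both the data store and the state store; `task`, `store`, `worker` and `runJob` are hypothetical names, and this is not the Heracles implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"sync"
)

// task is a hypothetical message sent through the "broker" channel.
type task struct {
	kind  string // "map" or "reduce"
	index int    // input split (map) or partition (reduce)
}

// store is an in-memory stand-in for the data store and the state store.
type store struct {
	mu           sync.Mutex
	intermediate map[int][]map[string]int // map index -> per-partition word counts
	final        []map[string]int         // reduce index -> merged word counts
	completed    chan string              // workers report finished tasks here
}

// partition assigns a key to one of n reduce partitions.
func partition(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// worker executes tasks from the broker until the broker is closed.
func worker(broker <-chan task, inputs []string, nReduce int, s *store, wg *sync.WaitGroup) {
	defer wg.Done()
	for t := range broker {
		if t.kind == "map" {
			parts := make([]map[string]int, nReduce)
			for i := range parts {
				parts[i] = map[string]int{}
			}
			for _, w := range strings.Fields(inputs[t.index]) {
				parts[partition(w, nReduce)][w]++
			}
			s.mu.Lock()
			s.intermediate[t.index] = parts
			s.mu.Unlock()
		} else {
			out := map[string]int{}
			s.mu.Lock()
			for _, parts := range s.intermediate {
				for k, v := range parts[t.index] {
					out[k] += v
				}
			}
			s.final[t.index] = out
			s.mu.Unlock()
		}
		s.completed <- fmt.Sprintf("%s-%d", t.kind, t.index) // notify manager
	}
}

// runJob plays the manager: split the job, dispatch tasks, await completion.
func runJob(inputs []string, nReduce, nWorkers int) map[string]int {
	s := &store{
		intermediate: map[int][]map[string]int{},
		final:        make([]map[string]int, nReduce),
		completed:    make(chan string, len(inputs)+nReduce),
	}
	broker := make(chan task, len(inputs)+nReduce)
	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go worker(broker, inputs, nReduce, s, &wg)
	}
	for i := range inputs {
		broker <- task{"map", i}
	}
	for range inputs { // reduce tasks need every map output first
		<-s.completed
	}
	for r := 0; r < nReduce; r++ {
		broker <- task{"reduce", r}
	}
	for r := 0; r < nReduce; r++ {
		<-s.completed
	}
	close(broker)
	wg.Wait()
	result := map[string]int{}
	for _, part := range s.final {
		for k, v := range part {
			result[k] += v
		}
	}
	return result
}

func main() {
	fmt.Println(runJob([]string{"a b a", "b c"}, 2, 3)) // map[a:2 b:2 c:1]
}
```

Here the manager waits on completion notifications rather than polling workers directly, which keeps the workers unaware of the manager.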

7 of 9

Input Partitioning

  • All intermediate and final files are known before any task runs
    • Allows everything to be prepared ahead of time
    • Makes it easier to add a multi-step approach (map->map->map->reduce)
  • Number of reduce tasks depends on the number of final files
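Since file names depend only on the number of inputs and the number of final files, the whole layout can be computed before any task runs. A minimal sketch, assuming a hypothetical naming scheme `<job>/intermediate/map-<m>-part-<r>` and `<job>/output/part-<r>`: with M inputs and R final files there are M*R intermediate files and R reduce tasks, and reduce task r reads column r of the grid. `plan`, `planJob` and `reduceInputs` are illustrative names only.

```go
package main

import "fmt"

// plan is a hypothetical precomputed layout of every file a job touches.
type plan struct {
	MapInputs    []string
	Intermediate [][]string // Intermediate[m][r]: output of map m for reduce r
	Final        []string   // one final file per reduce task
}

// planJob names every intermediate and final file before any task runs.
// The naming scheme is illustrative, not the one Heracles uses.
func planJob(jobID string, inputs []string, nFinal int) plan {
	p := plan{MapInputs: inputs, Final: make([]string, nFinal)}
	p.Intermediate = make([][]string, len(inputs))
	for m := range inputs {
		p.Intermediate[m] = make([]string, nFinal)
		for r := 0; r < nFinal; r++ {
			p.Intermediate[m][r] = fmt.Sprintf("%s/intermediate/map-%d-part-%d", jobID, m, r)
		}
	}
	for r := 0; r < nFinal; r++ {
		p.Final[r] = fmt.Sprintf("%s/output/part-%d", jobID, r)
	}
	return p
}

// reduceInputs lists the files reduce task r reads: column r of the grid.
func (p plan) reduceInputs(r int) []string {
	files := make([]string, len(p.Intermediate))
	for m := range p.Intermediate {
		files[m] = p.Intermediate[m][r]
	}
	return files
}

func main() {
	p := planJob("job1", []string{"in-0", "in-1", "in-2"}, 2)
	fmt.Println("reduce tasks:", len(p.Final)) // reduce tasks: 2
	fmt.Println(p.reduceInputs(1))
}
```

A multi-step pipeline would, under the same assumption, chain several such grids, with one step's outputs serving as the next step's inputs.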

8 of 9

Potential Advantages

  • Multiple managers: state is shared, not kept by a single manager
  • Broker queue holds tasks, so components can go down and tasks will still be started
  • Automatic restarts of failed jobs
  • Scaling to a large number of workers
    • Limited by the broker, state store and data store
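Automatic restarts can be sketched as requeue-on-failure: a task that fails goes back on the queue and is retried until a retry limit is reached. This is a minimal sketch, assuming a hypothetical in-process FIFO queue and a hypothetical `maxAttempts` limit; with RabbitMQ the comparable mechanism would be a negative acknowledgement with requeue, which is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// errCrash is a hypothetical error standing in for a worker failure.
var errCrash = errors.New("worker crashed")

// queuedTask is a hypothetical task plus a count of delivery attempts.
type queuedTask struct {
	ID       string
	Attempts int
}

// drain delivers tasks from a FIFO queue to exec. A failed task is put back
// at the tail of the queue until it has been tried maxAttempts times.
func drain(queue []queuedTask, maxAttempts int, exec func(id string) error) (done, failed []string) {
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		t.Attempts++
		if err := exec(t.ID); err != nil {
			if t.Attempts < maxAttempts {
				queue = append(queue, t) // restart later
			} else {
				failed = append(failed, t.ID)
			}
			continue
		}
		done = append(done, t.ID)
	}
	return done, failed
}

func main() {
	calls := map[string]int{}
	exec := func(id string) error {
		calls[id]++
		// "flaky" fails only on its first attempt; "broken" always fails.
		if (id == "flaky" && calls[id] == 1) || id == "broken" {
			return errCrash
		}
		return nil
	}
	q := []queuedTask{{ID: "ok"}, {ID: "flaky"}, {ID: "broken"}}
	done, failed := drain(q, 3, exec)
	fmt.Println(done, failed) // [ok flaky] [broken]
}
```

Capping the attempts keeps a permanently failing task from cycling through the queue forever.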

9 of 9

Disadvantages

  • Less control over the life cycle of a job
    • Control is delegated to RabbitMQ
  • Requires maintaining third-party dependencies
    • Currently RabbitMQ