Big Data in the Cloud?

Yes, you can do it in OpenStack


I am Obed N Muñoz

I am here because I love to give presentations.


Who am I?

Software Engineer


Fast driver


  • Introduction: Cloud and OpenStack
  • Data-processing
  • Sahara Project


Cloud Computing and OpenStack


Cloud and XaaS Era

Everything as a Service

Cloud computing refers to a variety of services and applications that users access on demand over the Internet, rather than running them on premises.


OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard, CLI, RESTful API ...



Data-Processing in the Cloud


What’s around Data-Processing?

  • Big Data
  • Data Science
  • Cloud
  • Machine Learning
  • Pattern Recognition
  • Neural Networks
  • Etc ...

Data-Processing Technologies

Sahara Project

Data-Processing in OpenStack


OpenStack Sahara

The Sahara project provides a simple means to provision data-intensive application clusters (Spark or Hadoop) on top of OpenStack.


Getting Started

  • Clusters
  • Templates
  • Provisioning Plugins
  • Image Registry
  • Data Processing Frameworks
  • Elastic Data Processing (EDP)
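How templates and clusters relate can be sketched in a few lines of Python. This is only an illustrative model, not the Sahara API: the class names, field names, flavor names, plugin version, and image ID below are all assumptions made for the example. The idea it shows is that a node group template describes one kind of node, a cluster template combines node groups with counts, and a cluster is launched from a cluster template plus a registered image.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroupTemplate:
    """One kind of node: its flavor and the processes it runs (hypothetical model)."""
    name: str
    flavor_id: str
    node_processes: list


@dataclass
class ClusterTemplate:
    """Node group templates paired with how many nodes of each to start."""
    name: str
    plugin_name: str
    plugin_version: str
    node_groups: list = field(default_factory=list)  # list of (template, count)

    def total_nodes(self) -> int:
        return sum(count for _, count in self.node_groups)


def cluster_request(template: ClusterTemplate, cluster_name: str, image_id: str) -> dict:
    """Build an illustrative launch request; the keys are assumptions, not a real schema."""
    return {
        "name": cluster_name,
        "plugin_name": template.plugin_name,
        "plugin_version": template.plugin_version,
        "image_id": image_id,  # would come from the image registry
        "node_groups": [
            {
                "name": ng.name,
                "flavor_id": ng.flavor_id,
                "node_processes": ng.node_processes,
                "count": count,
            }
            for ng, count in template.node_groups
        ],
    }


# Hypothetical values: one master node and three workers on the Vanilla plugin.
master = NodeGroupTemplate("master", "m1.medium", ["namenode", "resourcemanager"])
worker = NodeGroupTemplate("worker", "m1.small", ["datanode", "nodemanager"])
template = ClusterTemplate("hadoop-small", "vanilla", "2.7.1", [(master, 1), (worker, 3)])
request = cluster_request(template, "demo-cluster", "hypothetical-image-id")
print(template.total_nodes(), "nodes requested for", request["name"])
```

Keeping the node description separate from the node count is what makes a template reusable: the same worker template can back a 3-node test cluster and a 30-node production one.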

More Features ...

  • OpenStack Block Storage support
  • Cluster Scaling
  • Data locality
  • Distributed Mode
  • Hadoop HDFS High Availability
  • Orchestration support

Clusters (Hadoop)

Data-Processing Frameworks

  • Hadoop
  • Spark
  • Storm

Provisioning Plugins

  • Vanilla - Vanilla Apache Hadoop
  • Ambari - Hortonworks Data Platform
  • Spark - Apache Spark with Cloudera HDFS
  • MapR Distribution - MapR plugin with MapR File System
  • Cloudera - Cloudera Hadoop

Elastic Data Processing (EDP)

Allows the execution of jobs on clusters created by Sahara. It supports:

  • Hive, Pig, MapReduce.Streaming, Java, Shell job types on Hadoop clusters
  • Spark jobs
  • Job binaries stored in the Shared File Systems service (manila) or Sahara's own database
  • Access to input and output data sources in:
    • HDFS
    • Swift
    • Manila
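The pieces of an EDP job listed above (a job type, an input data source, and an output data source) can be sketched as plain Python. This is a minimal sketch under stated assumptions, not the Sahara client: the function names, dictionary keys, and example URLs are hypothetical, and the only facts taken from the slide are the job types and the three storage backends.

```python
from urllib.parse import urlparse

# Backends and job types as listed on the slide.
SUPPORTED_SCHEMES = {"hdfs", "swift", "manila"}
JOB_TYPES = {"Hive", "Pig", "MapReduce.Streaming", "Java", "Shell", "Spark"}


def data_source(name: str, url: str) -> dict:
    """Describe an input or output location, inferring its type from the URL scheme."""
    scheme = urlparse(url).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported data source URL: {url!r}")
    return {"name": name, "type": scheme, "url": url}


def job_execution(job_type: str, source: dict, destination: dict) -> dict:
    """Bundle a job type with its input and output (illustrative keys only)."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    return {"type": job_type, "input": source, "output": destination}


# Hypothetical example: read from Swift, write results to HDFS.
src = data_source("raw-logs", "swift://demo-container/input")
dst = data_source("results", "hdfs://namenode:8020/user/demo/output")
job = job_execution("Pig", src, dst)
print(job["type"], "job:", job["input"]["type"], "->", job["output"]["type"])
```

Validating the scheme up front means a mistyped URL fails before any cluster time is spent on the job.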


Q & A



Any questions?

You can find me at:

  • @obedmr