1 of 28

Big Data in the Cloud?

Yes, you can do it in OpenStack

2 of 28

Hello!

I am Obed N Muñoz

I am here because I love to give presentations.

-vvvv

3 of 28

Who am I?

Software Engineer

Musician

Fast driver

4 of 28

Agenda

  • Introduction: Cloud and OpenStack
  • Data-Processing
  • Sahara Project

5 of 28

Introduction

Cloud Computing and OpenStack

1

6 of 28

Cloud and XaaS Era

Everything as a Service

7 of 28

The term "cloud computing" covers a variety of services and applications that users access on demand over the Internet, as opposed to running them on-premises.

8 of 28

OpenStack

9 of 28

OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard, CLI, RESTful API ...

10 of 28

11 of 28

Architecture

12 of 28

13 of 28

Data-Processing

Data-Processing in the Cloud

2

14 of 28

What’s around Data-Processing?

  • Big Data
  • Data Science
  • Cloud
  • Machine Learning
  • Pattern Recognition
  • Neural Networks
  • Etc ...

15 of 28

Data-Processing Technologies

16 of 28

Sahara Project

Data-Processing in OpenStack

3

17 of 28

OpenStack Sahara

The Sahara project provides a simple means to provision a data-intensive application cluster (Spark or Hadoop) on top of OpenStack.

https://wiki.openstack.org/wiki/Sahara
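
To make "provisioning a cluster" more concrete, here is a minimal sketch of the JSON body a client might send to Sahara to launch a cluster from an existing cluster template. The field names follow the Sahara REST API (v1.1) as best recalled and should be checked against the API reference for your release. The helper name build_cluster_request and every ID and name in the usage example are hypothetical placeholders. The sketch only builds the body and does not contact any service.

```python
import json


def build_cluster_request(name, plugin_name, plugin_version,
                          cluster_template_id, image_id,
                          keypair=None, network_id=None):
    """Build the body of a cluster-creation request (hypothetical helper).

    Field names are assumed from the Sahara v1.1 REST API; verify them
    against the API reference before relying on this.
    """
    body = {
        "name": name,
        "plugin_name": plugin_name,          # e.g. "vanilla" or "spark"
        "hadoop_version": plugin_version,    # version of the chosen plugin
        "cluster_template_id": cluster_template_id,
        "default_image_id": image_id,        # image from the image registry
    }
    # Optional fields are only included when a value is supplied.
    if keypair is not None:
        body["user_keypair_id"] = keypair
    if network_id is not None:
        body["neutron_management_network"] = network_id
    return body


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    request = build_cluster_request(
        name="demo-cluster",
        plugin_name="vanilla",
        plugin_version="2.7.1",
        cluster_template_id="TEMPLATE-ID-PLACEHOLDER",
        image_id="IMAGE-ID-PLACEHOLDER",
        keypair="demo-keypair",
    )
    print(json.dumps(request, indent=2))
```

In practice the body would be sent with an authenticated HTTP POST to the data processing endpoint, or the same operation would be done from the dashboard or CLI.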

18 of 28

Architecture

19 of 28

Getting Started

  • Clusters
  • Templates
  • Provisioning Plugins
  • Image Registry
  • Data Processing Frameworks
  • Elastic Data Processing (EDP)

http://docs.openstack.org/developer/sahara/userdoc/edp.html
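
The way clusters and templates fit together can be sketched with a small data model: node group templates describe one kind of node, a cluster template combines several node groups, and a cluster is launched from that template. This is an illustrative model only, not Sahara's own classes. All class names, the flavor names, and the process names in the example are assumptions chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NodeGroupTemplate:
    """Describes one kind of node: its size and the processes it runs."""
    name: str
    flavor: str
    node_processes: List[str]


@dataclass
class NodeGroup:
    """A node group template plus the number of instances to start."""
    template: NodeGroupTemplate
    count: int


@dataclass
class ClusterTemplate:
    """Combines node groups for a given provisioning plugin."""
    name: str
    plugin_name: str
    node_groups: List[NodeGroup] = field(default_factory=list)

    def total_instances(self) -> int:
        # Sum of instance counts over all node groups.
        return sum(group.count for group in self.node_groups)

    def processes(self) -> Set[str]:
        # Every distinct process that will run somewhere in the cluster.
        return {p for g in self.node_groups for p in g.template.node_processes}


# Hypothetical example: one master and three workers.
master = NodeGroupTemplate("master", "m1.large", ["namenode", "resourcemanager"])
worker = NodeGroupTemplate("worker", "m1.medium", ["datanode", "nodemanager"])
template = ClusterTemplate(
    name="hadoop-small",
    plugin_name="vanilla",
    node_groups=[NodeGroup(master, 1), NodeGroup(worker, 3)],
)

if __name__ == "__main__":
    print(template.total_instances())
    print(sorted(template.processes()))
```

Keeping node groups as reusable templates is what lets the same "worker" definition appear in clusters of different sizes.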

20 of 28

More Features ...

  • OpenStack Block Storage support
  • Cluster Scaling
  • Data locality
  • Distributed Mode
  • Hadoop HDFS High Availability
  • Orchestration support
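
Cluster scaling from the list above can be illustrated with a sketch of a scale request body: existing node groups are resized and new node groups can be added. The keys resize_node_groups and add_node_groups are assumed from the Sahara v1.1 REST API and should be verified against the API reference. The helper build_scale_request and the group names are hypothetical.

```python
def build_scale_request(resize=None, add=None):
    """Build the body of a cluster scale request (hypothetical helper).

    resize: dict mapping an existing node group name to its new count.
    add:    list of (name, count, node_group_template_id) tuples.
    Key names are assumptions based on the Sahara v1.1 REST API.
    """
    body = {}
    if resize:
        for name, count in resize.items():
            if count < 0:
                raise ValueError(f"count for {name!r} must be non-negative")
        body["resize_node_groups"] = [
            {"name": name, "count": count} for name, count in resize.items()
        ]
    if add:
        body["add_node_groups"] = [
            {"name": name, "count": count, "node_group_template_id": tmpl_id}
            for name, count, tmpl_id in add
        ]
    if not body:
        raise ValueError("nothing to scale: pass resize and/or add")
    return body


if __name__ == "__main__":
    # Grow the hypothetical "worker" group from its current size to 5 nodes.
    print(build_scale_request(resize={"worker": 5}))
```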

21 of 28

Clusters (Hadoop)

22 of 28

Data-Processing Frameworks

  • Hadoop
  • Spark
  • Storm

23 of 28

Provisioning Plugins

  • Vanilla - Vanilla Apache Hadoop
  • Ambari - Hortonworks Data Platform
  • Spark - Apache Spark with Cloudera HDFS
  • MapR Distribution - MapR plugin with MapR File System
  • Cloudera - Cloudera Hadoop

24 of 28

Elastic Data Processing (EDP)

EDP allows jobs to be executed on clusters created by Sahara. It supports:

  • Hive, Pig, MapReduce, MapReduce.Streaming, Java, and Shell job types on Hadoop clusters
  • Spark jobs
  • Storage of job binaries in the Object Storage service (Swift), the Shared File Systems service (Manila), or Sahara's own database
  • Access to input and output data sources in:
    • HDFS
    • Swift
    • Manila

http://docs.openstack.org/developer/sahara/userdoc/edp.html
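
As a rough illustration of the data source part, an EDP job points at an input and an output location, and the URL scheme tells which store holds the data. The sketch below classifies a data source URL into one of the three stores listed on this slide. The function names, the job description layout, and the example URLs are hypothetical. Real Sahara data sources are registered through the API or dashboard and may accept other URL forms.

```python
from urllib.parse import urlparse

# The three stores named on the slide; anything else is rejected here.
SUPPORTED_SCHEMES = {"hdfs", "swift", "manila"}


def data_source_type(url: str) -> str:
    """Return the store type ("hdfs", "swift" or "manila") for a URL."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported data source URL: {url!r}")
    return scheme


def describe_job_io(input_url: str, output_url: str) -> dict:
    """Summarise where a job reads from and writes to (illustrative only)."""
    return {
        "input": {"type": data_source_type(input_url), "url": input_url},
        "output": {"type": data_source_type(output_url), "url": output_url},
    }


if __name__ == "__main__":
    # Placeholder locations, not real clusters or containers.
    print(describe_job_io(
        "swift://demo-container/input.txt",
        "hdfs://namenode:8020/user/demo/output",
    ))
```

Input and output do not have to live in the same store: a job can read from Swift and write its results to HDFS, as in the usage example.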

25 of 28

Resources

http://docs.openstack.org/developer/sahara/userdoc/edp.html

26 of 28

http://hackathon.openstackgdl.org/

27 of 28

Q & A

CONCLUSION

28 of 28

Thanks!

Any questions?

You can find me at:

  • @obedmr
  • obed.n.munoz@gmail.com