Plug into Big Data
with Juju
Terms and Definitions
These 3 concepts are enough to represent any application. It’s Juju, it’s magic!
Juju
What is Juju?
What is Juju? (contd)
Challenges of building Big Data solutions
Big Data ecosystem
Challenges of building Big Data solutions
Hadoop distributions
Big Data Solution Components
Pluggable model to enable the Big Data ecosystem
Pluggable Stack
Pluggable Installation
resources:
  hive-ppc64le:
    url: http://<url>/apache-hive-0.13.0-bin.tar.gz
    hash: 4c835644eb72a08df059b86c45fb159b95df08e831334cb57e24654ef078e7ee
    hash_type: sha256
  hive-x86_64:
    url: http://<url>/apache-hive-1.0.0-bin.tar.gz
    hash: b8e121f435defeb94d810eb6867d2d1c27973e4a3b4099f2716dbffafb274184
    hash_type: sha256
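A minimal sketch, using only standard-library Python plus PyYAML, of how a charm might pick the entry matching the unit's architecture and verify the download against the declared hash; the fetch_and_verify name and resources.yaml path are illustrative, not the real charms' API:

import hashlib
import platform
import urllib.request

import yaml  # PyYAML, assumed available on the unit

def fetch_and_verify(resource_key, dest):
    """Download the per-architecture resource and check its declared hash."""
    with open('resources.yaml') as f:
        resources = yaml.safe_load(f)['resources']
    # e.g. 'hive-x86_64' or 'hive-ppc64le', matching platform.machine()
    entry = resources['%s-%s' % (resource_key, platform.machine())]
    urllib.request.urlretrieve(entry['url'], dest)
    with open(dest, 'rb') as blob:
        digest = hashlib.new(entry['hash_type'], blob.read()).hexdigest()
    if digest != entry['hash']:
        raise ValueError('checksum mismatch for %s' % entry['url'])
    return dest

# fetch_and_verify('hive', '/tmp/hive.tar.gz')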
Pluggable Configuration
vendor: 'apache'
hadoop_version: '2.4.1'
packages:
  - 'libmysql-java'
  - 'mysql-client'
groups:
  - 'hadoop'
users:
  hive:
    groups: ['hadoop']
dirs:
  hive:
    path: '/usr/lib/hive'
    owner: 'hive'
    group: 'hadoop'
ports:
  hive:
    port: 10000
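A rough sketch of how a charm could apply configuration like the above with the charmhelpers library; the apply_dist_config name and the dist.yaml filename are assumptions for illustration, not the charms' actual entry points:

import yaml
from charmhelpers import fetch
from charmhelpers.core import hookenv, host

def apply_dist_config(path='dist.yaml'):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    fetch.apt_install(cfg.get('packages', []))      # e.g. libmysql-java, mysql-client
    for group in cfg.get('groups', []):
        host.add_group(group)
    for user, opts in cfg.get('users', {}).items():
        host.adduser(user)
        for group in opts.get('groups', []):
            host.add_user_to_group(user, group)
    for name, d in cfg.get('dirs', {}).items():
        host.mkdir(d['path'], owner=d['owner'], group=d['group'])
    for name, p in cfg.get('ports', {}).items():
        hookenv.open_port(p['port'])                 # e.g. 10000 for the Hive server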
Plugin Charm
# layer.yaml – pull in the hadoop-plugin interface layer
includes: ['interface:hadoop-plugin']

# reactive handler – runs once YARN and HDFS report ready
from charms.reactive import when

@when('hadoop.yarn.ready', 'hadoop.hdfs.ready')
def setup_pig(hadoop, *args):
    pig.install()    # pig: helper from the charm's own library (sketched below)
    pig.configure()
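The pig object above comes from the charm's own library; a purely illustrative sketch of what such a helper might look like (paths and steps are assumptions, not the real charm's code):

from subprocess import check_call

class Pig(object):
    """Illustrative only; the real charm ships its own helper library."""

    def install(self):
        # unpack a previously fetched and verified Apache Pig tarball
        check_call(['tar', '-xzf', '/tmp/pig.tar.gz', '-C', '/usr/lib'])

    def configure(self):
        # point Pig at the Hadoop config handed over via the hadoop-plugin interface
        with open('/etc/environment', 'a') as env:
            env.write('PIG_HOME=/usr/lib/pig\n')
            env.write('HADOOP_CONF_DIR=/etc/hadoop/conf\n')

pig = Pig()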
Hadoop Core
Big Data Core
Apache Hadoop Core
Hadoop Core Batch Processing
juju quickstart apache-core-batch-processing
Provides a distributed Hadoop cluster ready for batch processing, with plugin capabilities for adding further functionality
Data Ingest
Add Apache Flume
Data Ingest with Apache Flume
juju quickstart u/bigdata-dev/apache-ingestion-flume
Provides:
Add Apache Kafka
Data Ingest with Apache Kafka
juju quickstart u/bigdata-dev/apache-flume-ingestion-kafka
Provides:
Data Analysis
Add Apache Pig
Data Analysis with Apache Pig language
juju quickstart apache-analytics-pig
Provides:
Add Apache Hive
Data Analytics with MySQL
juju quickstart apache-analytics-sql
Provides:
Data Visualization
Add IPython Notebook
Hadoop Core + Spark with Notebook Viz
juju quickstart apache-hadoop-spark-notebook
Provides:
Add Apache Zeppelin
Hadoop Core + Spark with Notebook Viz
juju quickstart apache-hadoop-spark-zeppelin
Provides:
+ Spark
Spark + Hadoop
Spark Service
Spark with Layers
https://github.com/johnsca/layer-apache-spark
Refactoring the Spark charm using layers to make it easier to extend
from charmhelpers.core import hookenv
from charms.reactive import when, when_not, set_state
# Spark: helper class shipped in the charm's library layer

@when('bootstrapped')
@when_not('spark.installed')
def install_spark():
    spark = Spark()
    if spark.verify_resources():
        hookenv.status_set('maintenance', 'Installing Apache Spark')
        spark.install()
        set_state('spark.installed')

@when('spark.installed', 'hadoop.yarn.ready', 'hadoop.hdfs.ready')
def start_spark(*args):
    hookenv.status_set('maintenance', 'Setting up Apache Spark')
    spark = Spark()
    spark.configure()
    spark.start()
    spark.open_ports()
    set_state('spark.started')
    hookenv.status_set('active', 'Ready')
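Because the layer communicates through states, a charm built on top of it (Zeppelin or the IPython notebook, for example) only has to react to the states it sets; a hedged sketch, with the handler name assumed for illustration:

from charmhelpers.core import hookenv
from charms.reactive import when

# Hypothetical handler in a layer that includes the Spark layer
@when('spark.started')
def start_notebook():
    hookenv.status_set('maintenance', 'Starting notebook against Spark')
    # notebook-specific setup would go here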
Build Your Solution
Build and Share Your Solution
Real-time Syslog Analytics
juju quickstart realtime-syslog-analytics
Provides:
Ecosystem Solutions
References and Contact Info
Thanks!