Watcher, the Infrastructure Optimization Service for OpenStack
Plans for Pike and Beyond
Boston Summit—May 8-12, 2017

OpenStack Watcher

  • A flexible and scalable resource optimization service for multi-tenant OpenStack-based clouds
  • Provides a pluggable framework for optimization strategies (algorithms/metrics)
    • Energy-aware optimizations
    • Workload consolidations and rebalancing optimizations

  • Watcher audits the cloud against a set of optimization algorithms and builds a recommended action plan to reach the selected objectives
  • Integration point with external analytic systems through a pluggable scoring engine
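The audit-to-action-plan idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Strategy` interface, the `Action` record, and the toy `MoveOffHotHosts` strategy are hypothetical names and are not Watcher's actual plugin API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One recommended step, e.g. migrate a VM to another host."""
    kind: str
    vm: str
    destination: str


class Strategy(ABC):
    """Hypothetical plugin interface: cluster state in, actions out."""

    @abstractmethod
    def execute(self, cluster: dict[str, dict[str, float]]) -> list[Action]:
        ...


class MoveOffHotHosts(Strategy):
    """Toy strategy: move VMs off hosts whose total load exceeds a limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def execute(self, cluster):
        # cluster maps host -> {vm: load}; it is read, never modified
        loads = {host: sum(vms.values()) for host, vms in cluster.items()}
        actions = []
        for host, vms in cluster.items():
            # Consider the heaviest VMs first
            for vm, load in sorted(vms.items(), key=lambda kv: -kv[1]):
                if loads[host] <= self.limit:
                    break
                target = min(loads, key=loads.get)
                if target == host or loads[target] + load > self.limit:
                    continue
                actions.append(Action("migrate", vm, target))
                loads[host] -= load
                loads[target] += load
        return actions


def audit(cluster, strategy: Strategy) -> list[Action]:
    """Run one audit: the result is a recommendation, nothing is applied."""
    return strategy.execute(cluster)
```

Because `audit` only returns a list of actions, an administrator (or a separate applier component) can review the plan before anything changes in the cloud.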

Watcher is part of the OpenStack Big Tent

[1] https://www.vmware.com/products/vsphere/features/drs-dpm

Many Contributors

We welcome more contributors for the Ocata cycle.

Key Features of Watcher

Watcher provides:

    • Cloud optimization using VM live migration when an imbalance is detected
    • Optimization at any granularity, from a set of hosts to an entire cloud, with multiple goal settings
    • Room to evolve through its flexible plugin structure
    • “Off-the-shelf” optimization strategies based on CPU, memory and energy

Watcher can run in:

    • “SINGLE MODE” for auditing before acting
    • “CONTINUOUS MODE” for always-on optimization
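The difference between the two modes can be sketched as follows. This is a minimal illustration under stated assumptions: `run_audit`, its parameters, and the `max_runs` bound are hypothetical and only exist so the sketch terminates; they are not Watcher options.

```python
import time
from typing import Callable


def run_audit(audit_once: Callable[[], list], mode: str = "SINGLE",
              interval: float = 0.0, max_runs: int = 3,
              sleep: Callable[[float], None] = time.sleep) -> list:
    """Run an audit once, or repeatedly at a fixed interval.

    Returns the list of action plans produced, one per audit run.
    max_runs bounds the continuous loop so the sketch terminates; a real
    service would keep running until it is stopped.
    """
    if mode == "SINGLE":
        # One audit, one plan: review it before acting
        return [audit_once()]
    if mode != "CONTINUOUS":
        raise ValueError(f"unknown mode: {mode}")
    plans = []
    for run in range(max_runs):
        plans.append(audit_once())
        if run < max_runs - 1:
            sleep(interval)  # wait before re-auditing the cloud
    return plans
```

The `sleep` parameter is injected so the loop can be exercised without real delays.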

Watcher in the OpenStack Ecosystem

  • Watcher leverages services provided by other OpenStack projects
      • VM live migration and resize
      • Metric collection
      • Power cycle bare metal nodes
  • Monitors the infrastructure and performs optimizations on-demand
  • Enables new ways for OpenStack administrators to reduce the cloud’s TCO

[Figure: Watcher (*) and the OpenStack projects it relies on]

  • Nova (*): Compute
  • Keystone (*): Authentication
  • Oslo (*)
  • Ironic (*): Bare metal
  • Ceilometer, Monasca (*): Metrics

*Other names and brands may be claimed as the property of others.

Watcher Workflow and Maturity

[Figure: the Watcher optimization loop and its maturity levels]

  • Stages: Monitor, Profile, Analyse, Optimize, Plan, Apply
  • Inputs: objective, cost model, constraints
  • Monitored data: virtual machine IOPS, energy consumption, resource usage
  • Maturity levels shown: 3 of 8, 1 of 8, 6 of 8
  • Stage descriptions:
    • Aggregate flows of events from the infrastructure and take action
    • Profile and predict virtual machine resource usage
    • Find trade-offs between objectives and constraints
    • Schedule actions such that all security, dependency and performance requirements are met
    • Apply the optimal state, where the infrastructure is utilized as efficiently as specified in goals

Watcher Architecture

[Figure: Watcher architecture, with the flow numbered in six steps (1 to 6)]

  • Admin Control: Horizon (Watcher Dashboard), Watcher Command Line
  • Watcher services: Watcher API, Watcher Decision Engine, Watcher Action Planner, Watcher Action Applier
  • Shared components: Watcher Bus, Watcher DB, Cluster Data Model, Metrics Collection
  • Cloud Infrastructure: Nova, Neutron, Cinder
  • Metrics sources: Ceilometer (Gnocchi as API), Monasca

Watcher’s History

  • May ‘15: Proof of concept presented in Vancouver
  • September ‘15: Formed initial project team & mission statement
  • April ‘16 (Newton): Big Tent inclusion
  • October ‘16 (Newton): Newton release ready for small production deployment

2015 (May to September):

  • Rebalance on server outlet temperature and additional telemetry
  • DevStack integration
  • Sparked community interest

Newton (April to October ‘16):

  • Scoring module
  • Graph model
  • Scale testing

Watcher’s Roadmap

  • April ‘17 (Ocata): OpenStack compliant and ready for large production deployment
  • September ‘17 (Pike): Pike release extends list of supported resources

  • Scoring module
  • Graph model
  • Scale testing
  • Tagging VMs
  • HA mode
  • Grammar for workload characterization

Ocata Release Accomplishments

  • Notifications for objects
  • Automatic audit triggering
  • Provide alembic migrations
  • Generic way to define the scope of an audit (set of resources)
  • Service supervisor to monitor Watcher daemons

Strategies in Watcher

  • Outlet temperature based migration strategy
    • Description: Moves workload when a server’s outlet temperature is higher than a specified threshold
    • Telemetry used: Outlet temperature
    • Provider: Intel
  • Basic consolidation strategy
    • Description: Implements a basic load consolidation. This is currently a heuristic algorithm that focuses on measured CPU utilization, tries to minimize the number of hosts with too much or too little load, and aims for a target high(ish) utilization level on all hosts
    • Telemetry used: CPU, RAM, Disk
    • Provider: b<>com & Zurich University of Applied Sciences
  • Uniform airflow migration strategy
    • Description: Moves workload when a server’s airflow is greater than a specified threshold; it also decides how to move the VMs according to the current inlet temperature and system power
    • Telemetry used: Airflow, inlet temperature
    • Provider: Intel
  • Workload stabilization strategy
    • Description: Monitors whether some hosts carry a higher load than other hosts in the cluster and rebalances the work across hosts to minimize the standard deviation of the loads in the cluster
    • Telemetry used: CPU, RAM
    • Provider: Servionica
  • Workload balance strategy
    • Description: Migrates workloads to balance the total VM workload across hypervisors when the total VM workload of a hypervisor reaches a threshold
    • Telemetry used: CPU
    • Provider: Intel
  • VM Workload Consolidation Strategy
    • Description: Leverages a modified first-fit algorithm to increase server CPU and memory utilization, which ultimately frees some hosts that can be powered down to save energy
    • Telemetry used: RAM, disk.root.size
    • Provider: Zurich University of Applied Sciences
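The goal of the workload stabilization strategy, minimizing the standard deviation of host loads, can be sketched with a simple greedy search. This is an illustrative sketch only, not the Servionica implementation: the `stabilize` function, its `threshold` and `max_moves` parameters, and the single-number load per VM are all assumptions.

```python
from statistics import pstdev


def stabilize(hosts, threshold, max_moves=10):
    """Greedily migrate VMs while doing so lowers the load std deviation.

    hosts maps host -> {vm: load}. Returns (vm, source, destination) moves.
    The input mapping is not modified.
    """
    state = {h: dict(vms) for h, vms in hosts.items()}
    moves = []

    def spread(s):
        # Population standard deviation of the per-host total load
        return pstdev([sum(vms.values()) for vms in s.values()])

    while len(moves) < max_moves and spread(state) > threshold:
        best = None  # (resulting spread, vm, src, dst)
        for src, vms in state.items():
            for vm, load in vms.items():
                for dst in state:
                    if dst == src:
                        continue
                    # Evaluate the candidate move on host totals only
                    totals = {h: sum(v.values()) for h, v in state.items()}
                    totals[src] -= load
                    totals[dst] += load
                    candidate = pstdev(list(totals.values()))
                    if best is None or candidate < best[0]:
                        best = (candidate, vm, src, dst)
        if best is None or best[0] >= spread(state):
            break  # no single move improves the balance
        _, vm, src, dst = best
        state[dst][vm] = state[src].pop(vm)
        moves.append((vm, src, dst))
    return moves
```

Each iteration picks the single migration that reduces the spread the most, so the sketch trades optimality for simplicity; a production strategy would also weigh migration cost and host capacity.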

https://blog.zhaw.ch/icclab/employing-openstack-watcher-in-geyser-to-make-openstack-more-energy-efficient/
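The VM workload consolidation strategy is described as a modified first-fit algorithm. The sketch below shows the plain first-fit decreasing heuristic instead, as an assumption-laden illustration of the packing idea; the modifications used in Watcher are not described in these slides, and `first_fit_decreasing` with its uniform host capacity is hypothetical.

```python
def first_fit_decreasing(vms, hosts, capacity):
    """Pack VMs onto as few hosts as possible.

    vms maps vm -> (cpu, ram) demand; every host is assumed to have the
    same (cpu, ram) capacity. Returns host -> list of VMs. Hosts left
    with an empty list are candidates for power-down.
    """
    free = {h: list(capacity) for h in hosts}
    placement = {h: [] for h in hosts}
    # Place the largest VMs first
    ordered = sorted(vms.items(), key=lambda kv: kv[1], reverse=True)
    for vm, (cpu, ram) in ordered:
        for h in hosts:  # the first host with enough room wins
            if free[h][0] >= cpu and free[h][1] >= ram:
                free[h][0] -= cpu
                free[h][1] -= ram
                placement[h].append(vm)
                break
        else:
            raise ValueError(f"no host can fit {vm}")
    return placement
```

Comparing the returned placement with the current one yields the list of migrations, and the hosts that end up empty are the ones that could be powered down to save energy.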

Source http://blog.zhaw.ch/icclab/employing-openstack-watcher-in-geyser-to-make-openstack-more-energy-efficient

Watcher Scoring Module and Trusted Analytics Platform (TAP) Integration

  • The Watcher scoring module is a generic machine learning service that standardizes interactions with scoring engines through a common API
    • An additional Watcher component
    • A pluggable system similar to the Watcher decision engine
      • New scoring engines are added to the system the same way strategies are
  • TAP [1] plugin for Watcher:
    • The scoring module plugin will be owned by Intel and hosted in the TAP GitHub repositories
    • The plugin serves as a proxy to TAP scoring engines
    • TAP scoring engines can run in the cloud or locally; in both cases HTTP is used for communication
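A scoring engine that acts as an HTTP proxy could look roughly like the sketch below. The `ScoringEngine` interface, the `ENGINES` registry, and the JSON-string payload are assumptions made for illustration; Watcher's real base class and the TAP plugin may differ.

```python
from __future__ import annotations

import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Callable


class ScoringEngine(ABC):
    """Hypothetical common interface that every scoring engine implements."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def calculate_score(self, features: str) -> str: ...


def urllib_post(url: str, body: str) -> str:
    """Default transport: POST a JSON body and return the response text."""
    request = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


class HttpProxyScoringEngine(ScoringEngine):
    """Forwards features to a remote engine and returns its answer."""

    def __init__(self, name: str, url: str,
                 post: Callable[[str, str], str] = urllib_post) -> None:
        self._name = name
        self._url = url
        self._post = post  # injectable, so tests need no network

    def get_name(self) -> str:
        return self._name

    def calculate_score(self, features: str) -> str:
        json.loads(features)  # fail early on malformed input
        return self._post(self._url, features)


# Registry of engines by name, in the spirit of how strategies are looked up
ENGINES: dict[str, ScoringEngine] = {}


def register(engine: ScoringEngine) -> None:
    ENGINES[engine.get_name()] = engine
```

A strategy would then look up an engine by name and call `calculate_score` without knowing whether the model runs locally or behind a remote endpoint.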

https://blueprints.launchpad.net/watcher/+spec/scoring-module

[1] http://trustedanalytics.org/

Watcher and Related Ecosystem

[Figure: Watcher and its related ecosystem]

  • Resource Pool: Compute, Storage, Networking
  • Infrastructure Attributes: Power, Perf, Sec, Temp, Util, Location
  • Metric Collections: Monasca, Ceilometer, time series DB (e.g., InfluxDB)
  • OpenStack: Nova, Cinder, Neutron
  • Admin control: Horizon, CLI
  • Resource Manager
  • Watcher: Decision engine, Planner, Scoring module, TAP plug-in
  • Scoring engines: Analytic tool, Trusted Analytic Platform (TAP)
  • Offline Data Analytics (App. A, App. B, App. C): offline data analytics, model creation and training
  • Real time data access for continuous audit

http://trustedanalytics.org/

Plans for the Pike Release

  • Integrate Watcher Data Model with Cinder
  • Add an audit tag to VM metadata so external systems know that a VM is being optimized
  • Provide Gnocchi support as a data source for strategies
  • Add workload characterization to improve cloud optimization
  • Use notifications in Watcher (event-driven fashion)
  • Provide more “value-added” optimization strategies

Strategies Planned for the Pike Release

  • Noisy Neighbor Strategy
    • Description: L3 cache is a critical and limited system-level resource shared by all apps or VMs on one node. If one VM occupies most of the L3 cache, the other VMs on the node are likely to starve for cache and perform poorly. This blueprint adds a new strategy to detect and then migrate such cache-greedy VMs based on new cache/memory metrics.
    • Telemetry used: perf.instructions, perf.cpu.cycles, cpu_l3_cache
    • Provider: Intel
  • Strategy to trigger "power on" and "power off" actions
    • Description: In a data center with a large number of VMs and physical hosts, total power consumption is very high. When the workload is light, Watcher can reduce power consumption by requesting that idle hosts without VMs be powered off. When the workload increases, Watcher triggers a "power on" request to meet the service requirements.
    • Telemetry used: Not specified yet
    • Provider: ZTE
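One way the noisy neighbor detection could combine the three listed metrics is sketched below. This is a guess at the general idea, not the planned Intel implementation: the `VmSample` record, the per-VM baseline IPC, and the `cache_share` and `ipc_drop` thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    name: str
    instructions: float  # perf.instructions over the sampling window
    cycles: float        # perf.cpu.cycles over the same window
    l3_bytes: float      # cpu_l3_cache occupancy
    baseline_ipc: float  # assumed IPC when the VM runs undisturbed

    @property
    def ipc(self) -> float:
        """Instructions per cycle, 0.0 when no cycles were recorded."""
        return self.instructions / self.cycles if self.cycles else 0.0


def find_noisy_neighbor(samples, l3_size, cache_share=0.5, ipc_drop=0.8):
    """Return the name of the VM to migrate, or None.

    A VM is reported when it holds more than cache_share of the L3 cache
    and at least one other VM runs below ipc_drop of its baseline IPC.
    """
    if not samples:
        return None
    greedy = max(samples, key=lambda s: s.l3_bytes)
    if greedy.l3_bytes <= cache_share * l3_size:
        return None
    victims = [s for s in samples
               if s is not greedy and s.ipc < ipc_drop * s.baseline_ipc]
    return greedy.name if victims else None
```

Requiring both a cache-greedy VM and a slowed-down victim avoids migrating a VM that uses a lot of cache without hurting anyone.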

Any Questions?

  • Want to learn more?
    • Wiki : https://wiki.openstack.org/wiki/Watcher
    • IRC : #openstack-watcher
  • If you are interested, we would love for you to get involved – come and see us!

Legal Notices and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

No computer system can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.  For more complete information about performance and benchmark results, visit http://www.intel.com/performance.    

Intel, the Intel logo and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.
