Reliability Dashboards

January, 2019

Copyright © 2018 OverOps. All rights reserved.

Anomaly Detection & Root Cause for QA, DevOps and SREs.

OverOps Reliability Dashboards

  • An open-source project that enables integration of OverOps data via REST APIs.
  • Machine learning detects anomalies for apps, deployments and infrastructure tiers across environments.
  • A set of Grafana dashboards provides visualization and advanced drill-down capabilities.
  • A Jenkins plugin provides a reliability report and active quality gates in CI/CD.
  • All provided widgets can be integrated into any existing operational dashboard for data correlation.

The Reliability Scorecard dashboard provides an overview of all anomalies detected within a target deployment, application or infrastructure tier within the selected environment(s), assigning each its own dynamic score.

Anomalies include:

* Newly introduced errors.

* Regressed / increasing errors.

* Performance slowdowns.

Each anomaly is assigned a Severity, and can be drilled into to see its Root cause.

Learn more about scoring here.

The Deployments pane shows the state + score of each active deployment within the selected environment(s).

Hovering over New errors will show their code locations. Severe errors are highlighted. Click an error to drill into its Root Cause.

The New errors drill-down shows all errors that are new to a deployment, application or infrastructure tier. Click an error to see its Root cause.

Hovering over Increasing Errors will show their origin and rate of increase (%). Severe regressions are highlighted. Clicking an error drills into its Root cause.

Hovering over new errors will show their locations. Severe (P1) errors are highlighted. Clicking an error will jump to its root cause analysis.

The Increasing Errors drill-down shows the increase in volume and percentage of the regression, comparing it to a dynamic baseline. Clicking an error will show its root cause.
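
The comparison against a baseline can be illustrated with a minimal sketch. The source does not describe how OverOps computes its dynamic baseline, so everything here is an assumption: the function names, the use of a simple error rate (errors per invocation), and the 50% / 100% thresholds are hypothetical and chosen only for illustration.

```python
def error_rate(errors, invocations):
    """Errors per invocation; 0.0 when the code was never invoked."""
    return errors / invocations if invocations else 0.0


def percent_increase(baseline_rate, current_rate):
    """Percent change of the current rate relative to the baseline rate.

    Returns None when there is no baseline, i.e. the error is new
    rather than regressed.
    """
    if baseline_rate == 0:
        return None
    return (current_rate - baseline_rate) / baseline_rate * 100.0


def classify(baseline_rate, current_rate,
             regression_pct=50.0, severe_pct=100.0):
    """Label an error; both thresholds are hypothetical defaults."""
    pct = percent_increase(baseline_rate, current_rate)
    if pct is None:
        return "new"
    if pct >= severe_pct:
        return "severe regression"
    if pct >= regression_pct:
        return "regression"
    return "ok"


# Baseline window: 1 error in 4 calls. Active window: 2 errors in 4 calls.
label = classify(error_rate(1, 4), error_rate(2, 4))
```

In this sketch the baseline is a fixed earlier window; a dynamic baseline would presumably be recomputed as the window moves.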

The Root Cause drill-down shows the state of the code at the moment of regression across the entire call stack, 10 levels into the heap.

Time of Regression

Hovering over Slowdowns shows their rate change. Severe slowdowns are highlighted. Clicking a slowdown drills into its Root cause.

The Slowdowns drill-down shows the increase in response time for each slowing transaction compared to a dynamic baseline. Click to see the Root cause of the slowdown.
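
As a rough illustration of comparing response times to a baseline, the sketch below flags a transaction as slowing when its current average exceeds the baseline mean by more than k standard deviations. This is a generic statistical technique, not the OverOps algorithm, which the source does not describe; the function name, the parameter k and the sample timings are all hypothetical.

```python
from statistics import mean, pstdev


def slowdown_pct(baseline_ms, current_ms, k=2.0):
    """Percent increase in average response time, or None if not slower.

    A transaction counts as slowing only when its current average exceeds
    the baseline mean plus k population standard deviations. Both inputs
    must be non-empty lists of response times in milliseconds, and the
    baseline mean must be non-zero.
    """
    base_avg = mean(baseline_ms)
    threshold = base_avg + k * pstdev(baseline_ms)
    cur_avg = mean(current_ms)
    if cur_avg <= threshold:
        return None
    return (cur_avg - base_avg) / base_avg * 100.0


# Hypothetical timings: a steady 100 ms baseline, 150 ms in the active window.
increase = slowdown_pct([100, 100, 100, 100], [150, 150])
```

Using the spread of the baseline, rather than a fixed percentage, keeps naturally noisy transactions from being flagged on every small fluctuation.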

Auto Timers show the state of the code at the moment of slowdown across the entire call stack, 10 levels into the heap.

Location of slowdown highlighted.

Also captured are the full environment state of the container or instance and the last 250 lines of DEBUG-level log statements.

The Tiers pane shows the score of each Infrastructure Tier used within the environment(s). Tiers are automatically identified via dynamic code analysis.

The Applications pane shows the score of each microservice or monolith. Click each one to drill down to Reliability Analysis.

Reliability Analysis shows Anomalies in a target app, deployment or infrastructure tier. Anomalies are prioritized based on volume, regression and origin in code. Clicking a regression or slowdown shows its Root cause.

The Performance dashboard shows the state + score of every transaction within the system. Clicking a transaction will show the Root cause of its slowdown.

Each transaction shows the errors occurring within it, including the volume and rate of each error. Click to drill down.

The Errors drill-down shows errors within target transaction(s). If a Jira issue has been opened, it can be accessed directly. Click an error for its Root Cause.

The Root Cause drill-down shows the code and variable state at the moment of error across the entire call stack, 10 levels into the heap, as well as environment state and DEBUG-level log statements.

The Event Diff dashboard enables QA and DevOps/SRE teams to compare between different releases.

As well as different time periods.

Promotion Gates in Jenkins apply Machine Learning to extend code coverage by automatically marking a build as unstable.

OverOps provides four code Quality Gates for CI/CD environments, based on:

  1. Error Volume
  2. Unique Error Count
  3. Critical New Issues
  4. Critical Regressions
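
The four gates above can be pictured as simple threshold checks on a per-build summary. The sketch below is an assumption for illustration only: the class names, field names and default limits are hypothetical and do not reflect the Jenkins plugin's actual configuration or API.

```python
from dataclasses import dataclass


@dataclass
class BuildReport:
    """Hypothetical per-build summary; field names are illustrative."""
    error_volume: int
    unique_error_count: int
    critical_new_issues: int
    critical_regressions: int


@dataclass
class GateThresholds:
    """Hypothetical limits; breaching any one marks the build unstable."""
    max_error_volume: int = 1000
    max_unique_errors: int = 50
    max_critical_new_issues: int = 0
    max_critical_regressions: int = 0


def failed_gates(report, limits):
    """Return the names of the gates whose limit the build exceeded."""
    checks = [
        ("Error Volume", report.error_volume, limits.max_error_volume),
        ("Unique Error Count", report.unique_error_count,
         limits.max_unique_errors),
        ("Critical New Issues", report.critical_new_issues,
         limits.max_critical_new_issues),
        ("Critical Regressions", report.critical_regressions,
         limits.max_critical_regressions),
    ]
    return [name for name, value, limit in checks if value > limit]


def build_status(report, limits):
    """A build is unstable if at least one gate failed."""
    return "UNSTABLE" if failed_gates(report, limits) else "STABLE"


# Hypothetical build: too many errors overall and one critical regression.
report = BuildReport(error_volume=1200, unique_error_count=10,
                     critical_new_issues=0, critical_regressions=1)
status = build_status(report, GateThresholds())
```

Returning the list of failed gates, rather than only a pass/fail flag, lets a report name the specific gate that blocked promotion.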

The Jenkins plugin integrates into your CI/CD environment to report on new issues and regressions during the test and integration phase.

Thank you
