1 of 7

Statistical Analysis of Pump Failure

Analysis by Allison Fultz


2 of 7

Project Overview

Problem:

Understand which variables may drive asset failure in Southern Water Corporation’s water plant pumps.

Goals:

  1. Identify the variables that may indicate when a pump is failing.
  2. Develop a prototype linear equation that ‘describes’ which variables are closely related to pump failure.

Process Using Excel for Analysis:

  1. Created time-series line plots of the Raw, Rolling Mean, and Rolling Standard Deviation datasets for all variables, and identified trends.
  2. Created two box plots for each dataset, one using data from when the pump had failed and one using data from when it had not, and identified trends.
  3. Created correlation heat maps for each dataset, identified each variable's correlation coefficient with Pump Failure, and visualized the coefficients in bar charts.
  4. Fitted a multivariate linear regression and interpreted the R-squared and p-values.
  5. Evaluated the multivariate regression equation for each row of the Rolling Standard Deviation data using the fitted regression coefficients, producing the Statistical Alarm Signal that engineers will monitor for abnormalities.
  6. Created a bar chart of each variable’s regression coefficient to identify which variables can be used to predict Pump Failure.
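
Step 5 of the process can be illustrated in code. The analysis itself was done in Excel; the Python sketch below only shows the idea of evaluating a regression equation row by row. The intercept, coefficients, column names, and readings are all hypothetical placeholders, not the fitted values from this analysis.

```python
# Hypothetical intercept and coefficients; the real values come from the fitted regression.
INTERCEPT = 0.05
COEFFICIENTS = {"horse_power": 0.02, "pump_efficiency": 0.01}


def alarm_signal(row, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Return intercept + sum(coefficient * value) for one row of rolling-stdev data."""
    return intercept + sum(coefficients[name] * row[name] for name in coefficients)


# Made-up rolling standard deviation rows: a quiet period and a volatile period.
rows = [
    {"horse_power": 1.0, "pump_efficiency": 0.5},
    {"horse_power": 20.0, "pump_efficiency": 4.0},
]
signals = [alarm_signal(row) for row in rows]
```

A row with larger rolling deviations produces a higher signal value, which is the behaviour an engineer would watch for.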

Product:

Statistical Alarm Signal to Predict Pump Failure


3 of 7

Descriptive and inferential statistical methods have proven effective in creating a proactive ‘alarm’ that accurately identifies Pump Failures. Horse Power (HP) and Pump Efficiency (PE) emerge as the key variables of interest, with deviations of 15 HP and more than 3% PE as our core signal thresholds.


4 of 7

Descriptive analysis revealed distinct signature abnormalities: both the Rolling Standard Deviation and Rolling Mean datasets show clear changes over the failure period of interest.
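
For readers who want to reproduce the rolling datasets outside Excel, the following is a minimal Python sketch of a trailing rolling mean and rolling standard deviation. The window length and the readings are assumptions made for illustration; the window used in the original analysis is not stated here.

```python
from statistics import mean, stdev


def rolling(values, window, func):
    """Apply func to each trailing window; positions without a full window get None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out


horse_power = [3.0, 3.1, 2.9, 3.0, 9.5, 1.2, 8.8]  # made-up readings
WINDOW = 3  # assumed window length

rolling_mean = rolling(horse_power, WINDOW, mean)
rolling_std = rolling(horse_power, WINDOW, stdev)
```

In this toy series the rolling standard deviation jumps once the readings become erratic, which is the kind of signature change described above.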


5 of 7

Segmenting the data by the binary Pump Failure flag (0 or 1) and plotting it as box plots shows a clear difference between normal behaviour and failure, with Pump Torque, Pump Speed, and Pump Efficiency showing the three largest variances.
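
The segmentation behind the box plots can be sketched in Python as follows. The torque readings and failure flags are invented for illustration, and the five-number summary is one common way to express what a box plot draws, not necessarily the exact calculation Excel performs.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the statistics a box plot displays: min, Q1, median, Q3, max."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Made-up (pump_failure, pump_torque) records.
records = [
    (0, 10), (0, 11), (0, 12), (0, 13), (0, 14),
    (1, 5), (1, 20), (1, 35), (1, 50), (1, 65),
]

# Split the readings by the binary failure flag, then summarize each group.
groups = {0: [], 1: []}
for failed, torque in records:
    groups[failed].append(torque)

summary = {flag: five_number_summary(values) for flag, values in groups.items()}
iqr = {flag: s["q3"] - s["q1"] for flag, s in summary.items()}
```

Comparing `iqr[1]` with `iqr[0]` gives a quick numeric view of how much wider the failure-period spread is than the normal-period spread.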


6 of 7

Correlation analyses across the datasets show that Pump Efficiency, Volumetric Flow Meter 1, and Volumetric Flow Meter 2 are negatively correlated with Pump Failure in the Rolling Mean data, whilst Horse Power, Pump Speed, and Pump Torque show a strong positive correlation in the Rolling Stdev data.
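
As an illustration of how each variable's correlation with Pump Failure can be computed, here is a short Python sketch of the Pearson correlation coefficient. The column names and values are made up and chosen only so that the signs match the pattern described above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


pump_failure = [0, 0, 0, 1, 1, 0]  # made-up failure flags

# Made-up columns: one from a rolling-mean dataset, one from a rolling-stdev dataset.
columns = {
    "pump_efficiency_mean": [80, 81, 79, 60, 58, 78],
    "horse_power_std": [0.5, 0.4, 0.6, 6.0, 7.0, 0.8],
}

correlations = {name: pearson(values, pump_failure) for name, values in columns.items()}
```

Sorting `correlations` by value gives the ordering that a bar chart of coefficients would display.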


7 of 7

Lastly, the model fit shows an R-squared of 0.78, indicating that a linear model fits the data well. Horse Power and Pump Efficiency have the largest coefficients, which suggests that these variables have the most immediate relationship with Pump Failure behaviour.
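
A minimal sketch of the fit-and-score step, assuming NumPy is available, is shown below. The feature values and failure flags are invented, so the resulting coefficients and R-squared will not match the 0.78 reported above; the sketch only shows how ordinary least squares and R-squared are computed.

```python
import numpy as np

# Made-up rolling-stdev features; columns are [horse_power, pump_efficiency].
X = np.array([
    [0.5, 0.2],
    [0.6, 0.3],
    [0.4, 0.1],
    [6.0, 3.5],
    [7.0, 4.0],
    [0.8, 0.2],
])
y = np.array([0, 0, 0, 1, 1, 0], dtype=float)  # made-up Pump Failure flags

# Add an intercept column, then solve the ordinary least squares problem.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# R-squared = 1 - (residual sum of squares / total sum of squares).
fitted = design @ coef
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```

Here `coef[0]` is the intercept and `coef[1:]` are the per-variable coefficients. Coefficient magnitudes are only directly comparable when the variables are on similar scales.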
