MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths
Student: Yue Lu
Committee Chair: Eamonn Keogh
Committee Members: Tao Jiang, Tamar Shinar, Evangelos Papalexakis
I am a fifth-year Ph.D. student in the Department of Computer Science and Engineering at University of California, Riverside. My advisor is Dr. Eamonn Keogh.
Prior to beginning my doctoral studies at UC Riverside, I received my B.S. in Software Technology and Applications with First Class Honors from the Macau University of Science and Technology.
Publications
About me
Outline
Why is it worth further improving time series discords?
Limitations of single-length TSAD algorithms I
[1] A. Ermshaus, P. Schäfer, and U. Leser, “Window Size Selection in Unsupervised Time Series Analytics: A Review and Benchmark,” in AALTD, 2023, pp. 83–101.
Limitations of single-length TSAD algorithms II
[Figure: Bourke Street Mall (North) Pedestrian Traffic (# of Pedestrians, 0 to 4,000), May 18th 2016 to Jan 25th 2017, with discord curves for subsequence lengths of 8 and 24 hours over a Subsequence Length axis of 0 to 70 hours. The "Battery change" anomaly is marked.]
Limitations of single-length TSAD algorithms II
[Figure: Bourke Street Mall (North) Pedestrian Traffic (# of Pedestrians, 0 to 4,000), May 18th 2016 to Jan 25th 2017. Discord curves over subsequence lengths of 0 to 70 hours reveal three anomalies at different scales: the battery change, Xmas Day, and the car attack. No single window length (e.g., 8 or 24 hours) surfaces all three.]
Limitations of single-length TSAD algorithms III
[Figure: two time series of length 1,000. (A) The discord score for m = 49 is 8.57, but for m = 50 it is only 0.277. (B) The discord score for m = 49 is 4.05, but for m = 48 it is 0.00. Changing the window length by a single datapoint can make the anomaly invisible.]
Limitations of single-length TSAD algorithms III
[Figure: training data containing an extra "bump". Plotting performance against the value of m (10 to 100) shows that "all" TSAD algorithms fail if m < 54.]
Limitations of single-length TSAD algorithms III
[Figure: Bourke Street Mall (North) Pedestrian Traffic (# of Pedestrians, 0 to 4,000), May 18th 2016 to Jan 25th 2017. The battery change is only a 2-hour blip, yet it is best detected with a window of about 10 hours: this small blip is only anomalous when seen in the context of the normal data that surrounds it.]
Can we bypass the difficulties by testing all values?
Outline
MERLIN And DAMP Refined Iteratively on Demand
[2] Y. Lu, et al, “DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams,” Data Mining and Knowledge Discovery, pp. 1–43, 2023.
Brief Review of DAMP I
[Figure: a time series (0 to 2,500) split into Training Data and the Current Subsequence at time "Now"; the Left-aMP above it is only partly computed.]
Brief Review of DAMP II
DAMP only looks back as far as it needs to. Starting just behind the current subsequence, it searches a short backward window. If DAMP finds a subsequence whose distance from the current subsequence is less than the Best-So-Far (here 2.1), it stops: it has found a nearest neighbor close enough to rule out the current subsequence as the discord. Otherwise it doubles the length of the window and looks back that far, repeating until it reaches the start of the time series. This process is “iterative doubling”.
[Figure: the time series (0 to 2,500) with its partly computed Left-aMP, the Current Subsequence at "Now", and the doubling backward search window.]
DAMP can “peek” ahead to prune unqualified future subsequences. It looks ahead a bit: any future subsequence whose distance from the current subsequence is less than the Best-So-Far already has a close neighbor in its past, so it can never be a discord; it is marked as pruned and will not be processed in the future. This process is “forward pruning”.
[Figure: a time series (0 to 5,000) and the Distance Profile (0 to 15) of the current subsequence over the lookahead region. The distance between two future subsequences and the current subsequence is less than the Best-So-Far of 2.1, so both are pruned.]
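Forward pruning can be sketched as follows (my own simplified version, not the authors' code; `forward_prune` and the naive distance loop are mine, and the lookahead length is a free parameter):

```python
import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def forward_prune(ts, i, m, best_so_far, lookahead):
    """Return future start positions that can never be discords.

    Any subsequence starting after i whose distance to the current
    subsequence is below best_so_far already has a close neighbor in
    its past, so DAMP can skip it entirely.
    """
    query = znorm(ts[i:i + m])
    pruned = []
    end = min(len(ts) - m + 1, i + lookahead)
    for j in range(i + m, end):        # future, non-overlapping starts
        d = float(np.linalg.norm(znorm(ts[j:j + m]) - query))
        if d < best_so_far:
            pruned.append(j)
    return pruned
```

On periodic data almost every future position is pruned this way, which is why DAMP's effort concentrates on the genuinely unusual regions.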
MADRID is absolutely fast to compute all-discords
MADRID maintains a multi-length discord table M: one row per subsequence length m (here m = 8 to 24). The two leftmost columns are meta-data, the discord score and its location; the rest of each row will hold that length's Left-aMP. Every row is initialized with a score of 0 and a location of NaN:

| Discord | Loc | m | Left-aMP |
|---|---|---|---|
| 0 | NaN | 24 | … |
| 0 | NaN | 23 | … |
| 0 | NaN | 22 | … |
| … | … | … | … |
| 0 | NaN | 9 | … |
| 0 | NaN | 8 | … |
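The table can be sketched as a simple data structure (field names here are mine, not the paper's; the row range m = 8 to 24 matches the running example):

```python
import numpy as np

def init_discord_table(n, m_min=8, m_max=24):
    """One row per subsequence length m: a best-so-far discord score,
    its location, and space for that length's (not yet computed) Left-aMP."""
    return {
        m: {"discord": 0.0, "loc": None,
            "left_amp": np.full(n - m + 1, np.nan)}   # NaN = not computed yet
        for m in range(m_min, m_max + 1)
    }

M = init_discord_table(n=10_000)
```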
MADRID is absolutely fast to compute all-discords
Each row of M stores the Left-aMP for one subsequence length: the bottom row for m = 8, the row above it for m = 9, then m = 10, and so on up to m = 24.
MADRID is absolutely fast to compute all-discords
Running DAMP on the bottom row fills in its meta-data: the discord for m = 8 has score 21.3 at location 7,569.

| Discord | Loc | m |
|---|---|---|
| 21.3 | 7,569 | 8 |
MADRID is absolutely fast to compute all-discords
[Figure: the time series (0 to 7,500) with its Training Data prefix; the discord for m = 8 is highlighted at Loc = 7,569, discord score = 21.3.]
MADRID is absolutely fast to compute all-discords
Next, at the same location, the window size is expanded by one unit (same Loc, m+1 = 9). Re-scoring the subsequence at Loc = 7,569 with the longer window gives an exact discord score of 19.8.
[Figure: the time series (0 to 7,500) with the m+1 window at Loc = 7,569.]
MADRID is absolutely fast to compute all-discords
The value 19.8 is written into the m = 9 row of M (its Loc still NaN) as the initial Best-So-Far for that row. This value may be increased in the subsequent calculation.

| Discord | Loc | m |
|---|---|---|
| 19.8 | NaN | 9 |
| 21.3 | 7,569 | 8 |
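The warm start can be sketched as follows (helper names are mine, not the paper's; the exact scoring uses a naive scan where the real system uses MASS):

```python
import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def exact_score_at(ts, loc, m):
    """Exact left-discord score at loc: distance from the subsequence
    starting there to its nearest neighbor among all earlier subsequences."""
    q = znorm(ts[loc:loc + m])
    return min(float(np.linalg.norm(znorm(ts[j:j + m]) - q))
               for j in range(0, loc - m + 1))

def warm_start_next_row(ts, table, m):
    """Expand the window by one unit at the discord location found for m,
    and use the exact score there as the initial Best-So-Far for row m+1."""
    loc = table[m]["loc"]
    bsf = exact_score_at(ts, loc, m + 1)
    table[m + 1]["discord"] = max(table[m + 1]["discord"], bsf)
    return bsf
```

Because the score at one location is a lower bound on the row's true discord score, subsequent computation can only raise it, which is exactly what the table note above says.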
MADRID is absolutely fast to compute all-discords
[Figure: quality of solution (0% to 100%) vs. wall clock time (0 to 60,000 seconds). Pure brute force (NN) is extrapolated to take about 15.5 hours.]
MADRID is absolutely fast to compute all-discords
[Figure (zoom-in): quality of solution (0% to 100%) vs. wall clock time (0 to 4,000 seconds), comparing Warm-start DAMP (~16 minutes), Pure DAMP (~24 minutes), MASS brute force (~one hour), and pure brute force (NN).]
MADRID is a Hyper-Anytime Algorithm
[Figure: a taxonomy plotting Quality of Solution (0% to 100%) against Computational Resource (0% to 100%). Curves are graded, from best to worst, as Hyper-Anytime, Super-Anytime, Ultra-Anytime, Anytime, and Hypo-Anytime, according to the fraction of the resource needed to reach a high-quality solution.]
MADRID is a Hyper-Anytime Algorithm
[Figure: the Warm-start DAMP quality-of-solution curve (quality vs. wall clock time in seconds) placed beside the anytime taxonomy; it sits in the Hyper-Anytime region.]
MADRID is a Hyper-Anytime Algorithm
Early in the computation, three rows of M already hold exact discords: m = 24 (23.4 at 11,895), m = 16 (18.7 at 12,123), and m = 8 (21.3 at 7,569); all other rows are still 0/NaN.

| Discord | Loc | m |
|---|---|---|
| 23.4 | 11,895 | 24 |
| … | … | … |
| 18.7 | 12,123 | 16 |
| … | … | … |
| 21.3 | 7,569 | 8 |
MADRID is a Hyper-Anytime Algorithm
* Cells in gray represent those visited by DAMP; those in red highlight the top discords.
MADRID is a Hyper-Anytime Algorithm
As the computation continues, the meta-data for every row is filled in:

| Discord | Loc | m |
|---|---|---|
| 23.4 | 11,895 | 24 |
| 22.5 | 11,895 | 23 |
| 23.7 | 11,895 | 22 |
| 19.2 | 12,123 | 21 |
| … | … | … |
| 18.7 | 12,123 | 16 |
| … | … | … |
| 18.4 | 12,123 | 11 |
| 19.9 | 7,569 | 10 |
| 20.8 | 7,569 | 9 |
| 21.3 | 7,569 | 8 |
MADRID is a Hyper-Anytime Algorithm
For a row that has not yet been processed, the initial Best-So-Far is seeded from three already-computed cells (marked 1, 2, 3): max{cell1, cell2, cell3}.
* Red font in the meta-data marks exact values that match the final results and will not be modified in subsequent computations; black font marks approximations, which may be corrected in subsequent calculations.
MADRID is a Hyper-Anytime Algorithm
[Figure: the first 500 datapoints and the last 500 datapoints of the synthetic dataset. Note the slight change in period, which is smoothly spread over the full length of the data. The three embedded anomalies lie near 51,000–51,500, 54,000–54,500, and 58,000–58,500.]
MADRID is a Hyper-Anytime Algorithm
[Figure: quality of solution (0% to 100%) vs. percentage of computational resource, for datasets of 100,000 and 1,000,000 datapoints. In both cases MADRID (warm-start DAMP) reaches semantic convergence using only a small percentage of the computational resource, placing it in the Hyper-Anytime class.]
Outline
Comparison of MADRID with state-of-the-art TSAD methods
| Method | Window Size | Precision | Train and Test Time |
|---|---|---|---|
| OmniAnomaly | 160 | 0.05 | 5.9 hours |
| Telemanom | 160 | 1 | 2.2 hours |
| PAN-Matrix Profile | 64 to 256 | 1 | 1.8 days |
| MERLIN | 64 to 256 | 1 | 3.6 hours |
| MADRID | 64 to 256 | 1 | 3.9 minutes (34 seconds to converge) |
MADRID addresses the challenge of finding the optimal window size
[Figure: dataset UCR-202 (0 to 33,432; training data ends at 8,358). Top: DAMP's top discord, its Left-aMP, and the Ground Truth; DAMP's prediction uses m = 32. Bottom: MADRID's discord scores across window sizes 10 to 96.]
MADRID addresses the challenge of finding the optimal window size
* Red represents successful MADRID predictions, blue indicates failed MADRID predictions, and green marks the DAMP prediction, each compared against the Ground Truth.
Visualizing MADRID in 3D
MADRID can quickly detect multi-scale anomalies in real datasets
[Figure: Waterfront City Pedestrian Traffic (0 to 10,000 pedestrians), Apr 30th 2009 to Jan 31st 2018.]
* Waterfront City is a shopping and entertainment area, just two kilometers from the Melbourne CBD.
MADRID can quickly detect multi-scale anomalies in real datasets
[Figure: Waterfront City Pedestrian Traffic (# of Pedestrians, 0 to 10,000), Apr 30th 2009 to Jan 31st 2018, with discord curves over window sizes of 0 to 100 hours. MADRID surfaces anomalies at three different time scales: flash flooding (15 hours), the AFL Parade (19 hours), and Remembrance Day (39 hours).]
MADRID searches efficiently on millions of datapoints
[Figure: three years of relative humidity data (0 to 100) for northern California, Jan 1st 2018 to Dec 31st 2020.]
[Figure: zoomed-in regions (0 to 15,000 datapoints) of the humidity data, showing four anomalies at different time scales: the Kincade Fire (one week), winter storm Kai (half a day), winter storm Nadia (one day), and the Thanksgiving-week storm (two days).]
Conclusion
Paper, code and demo are available at: https://sites.google.com/view/madrid-icdm-23/documentation
Dedicated to our friend
Frank Madrid
July 18, 1986 - May 18, 2020
Thank you for your attention!
Any questions?
Matrix Profile XXX: MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths
Yue Lu, Thirumalai Vinjamoor Akhil Srinivas, Takaaki Nakamura, Makoto Imamura, and Eamonn Keogh.
Appendix
[Figure: (top right) pairs of slightly noisy sine waves of lengths from 1,000 to 10,000. (bottom right) the raw Euclidean distance between each pair plotted against time series motif length, with no normalization (grows with length), divided by the length (decays toward zero), and divided by the square root of the length (stays flat).]
We may need to compare the Euclidean distances of pairs of time series that are of different lengths.
How should we normalize to compensate for the length differences?
We could:
1) Not normalize at all, but this strongly biases us towards short patterns.
2) Normalize by dividing by the length of the time series. This seems to make sense, but it strongly biases us towards long patterns.
3) Normalize by dividing by the square root of the length of the time series. Without explanation (here) we claim that this is correct.
In the figures on the bottom right, we compare the three ideas above on slightly noisy sine waves of different lengths (shown top right).
As you can see, the divide-by-square-root-of-length approach is invariant to the length of the patterns.
Which of these three pairs is closest?
Under mild assumptions, we might claim that they are all equally similar.
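The experiment is easy to reproduce in a few lines. This is my own sketch under the same setup (iid noise added to matched sine waves); exact numbers will vary with the noise seed.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_pair(n):
    # two copies of the same sine wave, each with independent noise
    t = np.linspace(0, 8 * np.pi, n)
    return (np.sin(t) + 0.1 * rng.standard_normal(n),
            np.sin(t) + 0.1 * rng.standard_normal(n))

for n in (1_000, 4_000, 16_000):
    a, b = noisy_pair(n)
    d = float(np.linalg.norm(a - b))
    print(f"n={n:6d}  raw={d:7.2f}  d/n={d/n:.5f}  d/sqrt(n)={d/np.sqrt(n):.3f}")
# Raw distance grows with length and d/n shrinks toward zero, while
# d/sqrt(n) stays roughly constant: the length-invariant choice.
```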
Why does this matter?
There are hundreds of papers, containing hundreds of tables, comparing different algorithms on SWaT.
In most cases, these papers report 3, 4 or even 5 significant digits!
It is all meaningless!
It is not Time Series Anomaly Detection
[Figure: the SWaT-FIT401 signal, which abruptly collapses to a flat run of zeros: …,0,0,0,0,0,…]
Most domains have a special character for invalid data (here, runs of 0). But you do not need to do any anomaly detection/machine learning to detect these! You know about these before you see any data. About 70% of the anomalies in SWaT are of this type!
This is not time series anomaly detection.
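The point can be made concrete: flagging such sentinel runs needs no learning at all. A minimal illustration, assuming a zero sentinel and a run-length threshold of my own choosing:

```python
def flat_zero_runs(xs, min_len=5):
    """Return (start, end) index pairs of runs of the invalid-data
    sentinel 0 that are at least min_len long -- a rule you can write
    before ever seeing the data."""
    runs, start = [], None
    for i, v in enumerate(xs):
        if v == 0:
            if start is None:
                start = i            # a zero run begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(xs) - start >= min_len:
        runs.append((start, len(xs)))  # run extends to the end
    return runs
```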
It is not Time Series Anomaly Detection
[Figure: the same SWaT-FIT401 zero run: …,0,0,0,0,0,…]
Dhaval Patel, IBM TSAD expert*: “…varying signal becomes flat line… we can set a simple rule to detect the ‘anomaly’.”
* Toolkit for Time Series Anomaly Detection. Patel et al. SIGKDD 2022
No one working in the real world would ever consider this an anomaly. It is a trivial “pipeline” issue that can be dealt with using a “simple rule” that you can define before you see the data.
A seventy-year-old algorithm is often superior
[Figure: SWaT-FIT401, zoomed in to the region from 227,900 to 263,700.]
Let's bend over backwards and pretend it is a TSAD problem.
A classic Statistical Process Control (SPC) algorithm says: “If the next value is more than three standard deviations away from the data in a short history of points, flag it as an anomaly.”
This deviation is tens of standard deviations, so we can solve it with a simple 70-year-old algorithm.
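The SPC rule described above fits in a few lines (a sketch; the trailing-history length of 50 is an arbitrary choice of mine):

```python
import numpy as np

def spc_flags(xs, history=50, k=3.0):
    """Flag any value more than k standard deviations away from the
    mean of a short trailing history of points."""
    xs = np.asarray(xs, dtype=float)
    flags = np.zeros(len(xs), dtype=bool)
    for i in range(history, len(xs)):
        h = xs[i - history:i]
        mu, sd = h.mean(), h.std()
        if sd > 0 and abs(xs[i] - mu) > k * sd:
            flags[i] = True
    return flags
```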
The scoring functions used greatly inflate performance
[Figure: SWaT-FIT401, zoomed in to the region from 227,900 to 263,700.]
Finding this trivial anomaly should be scored as 1-out-of-1; failing to find it, as 0-out-of-1.
However, the anomaly is of length 35,800, so most papers report a score like 35,701 out of 35,800, or “Our accuracy is 0.99723”.
But these are not 35,800 independent events! This is a single event.
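The arithmetic of the inflation is worth spelling out (a toy calculation of mine, not any paper's actual scoring code):

```python
def pointwise_score(points_flagged, anomaly_len):
    # treats every datapoint of the anomaly as an independent event
    return points_flagged / anomaly_len

def eventwise_score(event_found):
    # one anomaly, one event: found or not
    return 1.0 if event_found else 0.0

# The SWaT anomaly above spans 35,800 points; flagging all but 99 of them
# yields an impressive-looking point-wise "accuracy":
print(round(pointwise_score(35_701, 35_800), 5))   # 0.99723
```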
Any high precision claims are necessarily wrong
[Figure: SWaT-FIT401, zoomed in to the region from 227,900 to 263,700; the labels are off by 111 at one boundary and by 3,859 at the other.]
The original labels are almost certainly “sloppy” and “rough”. The fact that they are rounded to the nearest 100 would suggest as much; yet the error is greater than 100.
I have argued elsewhere that it is essentially never possible or meaningful to make high-temporal-precision claims about exactly when an anomaly occurs.
Others in the community are beginning to get it…
SwaT-FIT401
(zoom-in)
…Thus we conclude that evaluations on SWaT are highly unreliable and that these datasets are not suited for multivariate time-series AD evaluation.
Maja Rudolph, Bosch AI
https://ml.informatik.uni-kl.de/publications/2023/TimeSeAD.pdf
Dennis Wagner, Tobias Michels, Florian C. F. Schulz, Arjun Nair, Maja Rudolph, Marius Kloft: TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection. Trans. Mach. Learn. Res. 2023 (2023)
Comparison of DAMP and 5 state-of-the-art TSAD methods
| Method | Accuracy | Train and Test Time |
|---|---|---|
| USAD | 0.276 | 8.05 hours |
| LSTM-VAE | 0.198 | 23.6 hours |
| AE | 0.236 | 6.11 hours |
| Telemanom | Out of memory error on longer examples | |
| SCRIMP (Full-MP) | 0.416 | 24.5 minutes |
| DAMP (Left-MP) out-of-the-box | 0.512 | 4.26 hours |
| DAMP (Left-MP) sharpened data | 0.632 | 4.26 hours |
The reception to DAMP by the community
*OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting
Comparison of DAMP and rival methods
|  | DAMP | Telemanom | NORMA | SCRIMP |
|---|---|---|---|---|
| Number of parameters | 1 | 16 | 2 | 1 |
| Model training time | 0 | > 0 | > 0 | 0 |
| Throughput | 358,000 Hz | 748 Hz | 200 Hz | 163 Hz |
| Can handle batch data? | Yes | Yes | Yes | Yes |
| Can handle streaming data? | Yes | Yes (however it takes too long to model) | No | No |
| Can handle concept drift? | Yes | No | No | AB-join SCRIMP: No; self-join SCRIMP: Yes |
| Can handle multidimensional data? | Yes | Yes in principle, but not shown in the paper | No | No |
| Is stochastic? | No | Yes | Unclear | No |