Interpretable Model Drift Detection
Pranoy Panda1 Sai Srinivas Kancheti1 Vineeth N Balasubramanian1 Gaurav Sinha2
1Indian Institute of Technology Hyderabad 2Microsoft Research
To quantify interpretability, we introduce an occlusion-based metric.
denotes the drop in accuracy due to a drift. A feature is important to a drift when it explains away the drop in accuracy.
A larger value indicates that the most important features are occluded.
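As a rough illustration of such a metric, the sketch below occludes the features a method flags as drift-relevant (here, by replacing them with their pre-drift means) and measures what fraction of the drift-induced accuracy drop is explained away. The function name `occlusion_metric`, the mean-fill choice, and the ratio form are our assumptions, not the paper's exact definition.

```python
import numpy as np

def occlusion_metric(model, X_ref, y_ref, X_new, y_new, top_features, fill=None):
    """Fraction of the drift-induced accuracy drop explained away by
    occluding the flagged features (illustrative definition)."""
    acc = lambda X, y: np.mean(model.predict(X) == y)
    drop = acc(X_ref, y_ref) - acc(X_new, y_new)   # drop in accuracy due to drift
    # Occlude flagged features in the post-drift data, e.g. by replacing
    # them with their pre-drift (reference-window) means.
    fill = X_ref.mean(axis=0) if fill is None else fill
    X_occ = X_new.copy()
    X_occ[:, top_features] = fill[top_features]
    recovered = acc(X_occ, y_new) - acc(X_new, y_new)  # accuracy explained away
    return recovered / drop if drop > 0 else 0.0
```

Under this sketch, a value near 1 means occluding the flagged features accounts for almost the entire accuracy drop.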
Qualitative case study on USENET2, where the concept drifts from space to medicine. Highlighted cells are words related to the drift, as identified by each method.
Datasets: 10 synthetic and 5 real-world datasets, covering drifts caused by covariate shift, posterior shift, and a mixture of both.
Baselines: 6 well-known drift detection methods. Marginal and Conditional (in gray) are interpretable but only detect covariate shift; the remaining methods do not offer feature-level interpretability.
Metrics (higher is better): Precision and Recall are computed for synthetic datasets. M is the average model performance computed over the data stream (accuracy for classification and R-squared for regression).
Motivation
The distribution of real-world data is non-stationary, which causes ML models to degrade over time. However, not all data-distribution shifts result in performance degradation.
Our method, TRIPODD, detects model drift and explains it in terms of the input features most responsible for it.
Comparison with existing methods
To the best of our knowledge, TRIPODD is the first effort toward feature-interpretable model drift detection. Our method uses only model risk and can be applied to both classification and regression.
Experiments
Interpretability Study
Model Drift: Given a stream of samples, we say a feature-sensitive model drift occurs at time t = T if there exist distributions p and q, and a model h trained on data from p, such that the risk of h changes across the drift.
The impact of a feature is measured by the change in subset-specific risk. We say that a feature-sensitive model drift occurs when the impact of some feature changes.
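One plausible reading of "subset-specific risk" is sketched below: the risk of the model when all features outside a subset are occluded with baseline values, with a feature's impact defined as the change in this risk when the feature is dropped from the subset. The helper names (`subset_risk`, `feature_impact`) and the baseline-occlusion mechanism are our assumptions for illustration.

```python
import numpy as np

def subset_risk(model, X, y, feature_subset, baseline, loss):
    """Risk of the model when features outside `feature_subset` are
    occluded with baseline values (illustrative definition)."""
    X_occ = np.array(baseline, dtype=float) * np.ones_like(X)
    X_occ[:, feature_subset] = X[:, feature_subset]
    return loss(y, model.predict(X_occ)).mean()

def feature_impact(model, X, y, k, all_features, baseline, loss):
    """Impact of feature k: increase in subset-specific risk when k is
    occluded, i.e. removed from the kept subset."""
    rest = [j for j in all_features if j != k]
    return subset_risk(model, X, y, rest, baseline, loss) - \
           subset_risk(model, X, y, all_features, baseline, loss)
```

A feature-sensitive drift would then be flagged when `feature_impact` computed on the reference window differs significantly from its value on the new-samples window.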
Hypothesis test: The test for feature k has the following null and alternate hypotheses:
The tests for all features can be carried out in parallel. We signal a drift iff H0 is rejected for some feature.
Let
We can see that d > 0 iff the alternate hypothesis is true. We thus define our test statistic as follows:
where the hat denotes the sample estimate for sample size n.
Algorithm: Our algorithm TRIPODD uses a sliding-window procedure. To check for drift at time t, a "reference" window of size n and a "new samples" window of size n are considered around t.
The model is trained on the first ⌊nr⌋ samples, and the remaining n − ⌊nr⌋ samples are used to compute the test statistic.
If no drift is found at t, the new-samples window is shifted by δ. If a drift is detected, the model is retrained on the new window.
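The sliding-window procedure above can be sketched as follows. This is a minimal outline, not the paper's implementation: `fit` and `test_drift` are illustrative callables standing in for model training and the per-feature hypothesis test, and the window bookkeeping is our assumption.

```python
def sliding_window_drift(stream_X, stream_y, n, r, delta, fit, test_drift):
    """Sliding-window drift detection sketch: train on the first
    floor(n*r) samples, test on the rest, advance the new-samples
    window by delta until a drift is flagged, then retrain."""
    m = int(n * r)                            # training split within a window
    t = 0
    model = fit(stream_X[:m], stream_y[:m])   # initial model
    drifts = []
    while t + 2 * n <= len(stream_X):
        ref = slice(t, t + n)                 # reference window
        new = slice(t + n, t + 2 * n)         # new-samples window
        if test_drift(model, stream_X[ref], stream_y[ref],
                      stream_X[new], stream_y[new]):
            drifts.append(t + n)              # drift flagged at window start
            model = fit(stream_X[new][:m], stream_y[new][:m])  # retrain
            t += n                            # move past the drifted window
        else:
            t += delta                        # shift new-samples window by delta
    return drifts
```

Plugging in a trivial majority-vote model and an accuracy-drop test recovers the expected behavior: the detector fires once when the stream's label distribution flips.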