Interpretable Model Drift Detection
Pranoy Panda1 Sai Srinivas Kancheti1 Vineeth N Balasubramanian1 Gaurav Sinha2
1Indian Institute of Technology Hyderabad 2Microsoft Research
To quantify interpretability, we introduce an occlusion-based metric.
denotes the drop in accuracy due to a drift. A feature is important to a drift when it explains away the drop in accuracy.
A larger value indicates that the most important features are occluded.
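As a rough illustration of such a metric, the sketch below occludes the features a method flags as drift-relevant (here, by replacing them with their pre-drift means) and measures what fraction of the drift-induced accuracy drop is explained away. The function name `occlusion_metric`, the mean-fill choice, and the ratio form are our assumptions, not the paper's exact definition.

```python
import numpy as np

def occlusion_metric(model, X_ref, y_ref, X_new, y_new, top_features, fill=None):
    """Fraction of the drift-induced accuracy drop explained away by
    occluding the flagged features (illustrative definition)."""
    acc = lambda X, y: np.mean(model.predict(X) == y)
    drop = acc(X_ref, y_ref) - acc(X_new, y_new)   # drop in accuracy due to drift
    # Occlude flagged features in the post-drift data, e.g. by replacing
    # them with their pre-drift (reference-window) means.
    fill = X_ref.mean(axis=0) if fill is None else fill
    X_occ = X_new.copy()
    X_occ[:, top_features] = fill[top_features]
    recovered = acc(X_occ, y_new) - acc(X_new, y_new)  # accuracy explained away
    return recovered / drop if drop > 0 else 0.0
```

Under this sketch, a value near 1 means occluding the flagged features accounts for almost the entire accuracy drop.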
Qualitative case study on USENET2, where the concept drifts from space to medicine. Highlighted cells are words related to the drift, as identified by each method.
Datasets: 10 synthetic and 5 real-world datasets, covering drifts caused by covariate shift, posterior shift, and a mixture of both.
Baselines: 6 well-known drift detection methods. Marginal and Conditional (in gray) are interpretable but only detect covariate shift; the remaining methods do not offer feature-level interpretability.
Metrics (higher is better): Precision and Recall are computed for synthetic datasets. M is the average model performance computed over the data stream (accuracy for classification and R-squared for regression).
Motivation
The distribution of real-world data is non-stationary, which causes ML models to degrade over time. However, not all data-distribution shifts result in performance degradation.
Our method, TRIPODD, detects model drift and explains it in terms of the input features most responsible for it.
Comparison with existing methods
To the best of our knowledge, TRIPODD is the first effort toward feature-interpretable model drift detection. Our method uses only model risk and can be applied to both classification and regression.
Experiments
Interpretability Study
Model Drift: Given a stream of samples, we say a feature-sensitive model drift occurs at time t = T if there exist distributions p and q, and a model h trained on data from p, such that the risk of h changes across the drift.
The impact of a feature is measured by the change in subset-specific risk. We say that a feature-sensitive model drift occurs when the impact of some feature changes.
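One plausible reading of "subset-specific risk" is sketched below: the risk of the model when all features outside a subset are occluded with baseline values, with a feature's impact defined as the change in this risk when the feature is dropped from the subset. The helper names (`subset_risk`, `feature_impact`) and the baseline-occlusion mechanism are our assumptions for illustration.

```python
import numpy as np

def subset_risk(model, X, y, feature_subset, baseline, loss):
    """Risk of the model when features outside `feature_subset` are
    occluded with baseline values (illustrative definition)."""
    X_occ = np.array(baseline, dtype=float) * np.ones_like(X)
    X_occ[:, feature_subset] = X[:, feature_subset]
    return loss(y, model.predict(X_occ)).mean()

def feature_impact(model, X, y, k, all_features, baseline, loss):
    """Impact of feature k: increase in subset-specific risk when k is
    occluded, i.e. removed from the kept subset."""
    rest = [j for j in all_features if j != k]
    return subset_risk(model, X, y, rest, baseline, loss) - \
           subset_risk(model, X, y, all_features, baseline, loss)
```

A feature-sensitive drift would then be flagged when `feature_impact` computed on the reference window differs significantly from its value on the new-samples window.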
Hypothesis test: The test for feature k has the following null and alternate hypotheses:
The tests for all features can be carried out in parallel. We signal a drift iff H0 is rejected for some feature.
Let
We can see that d > 0 iff the alternate hypothesis is true. We thus define our test statistic as follows:
where the hat denotes the sample estimate for sample size n.
Algorithm: Our algorithm TRIPODD uses a sliding-window procedure. To check for drift at time t, a "reference" window of size n and a "new samples" window of size n are considered around t.
The model is trained on the first ⌊nr⌋ samples, and the remaining n − ⌊nr⌋ samples are used to compute the test statistic.
If no drift is found at t, the new-samples window is shifted by δ. If a drift is detected, the model is retrained on the new window.
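The sliding-window procedure above can be sketched as follows. This is a minimal outline, not the paper's implementation: `fit` and `test_drift` are illustrative callables standing in for model training and the per-feature hypothesis test, and the window bookkeeping is our assumption.

```python
def sliding_window_drift(stream_X, stream_y, n, r, delta, fit, test_drift):
    """Sliding-window drift detection sketch: train on the first
    floor(n*r) samples, test on the rest, advance the new-samples
    window by delta until a drift is flagged, then retrain."""
    m = int(n * r)                            # training split within a window
    t = 0
    model = fit(stream_X[:m], stream_y[:m])   # initial model
    drifts = []
    while t + 2 * n <= len(stream_X):
        ref = slice(t, t + n)                 # reference window
        new = slice(t + n, t + 2 * n)         # new-samples window
        if test_drift(model, stream_X[ref], stream_y[ref],
                      stream_X[new], stream_y[new]):
            drifts.append(t + n)              # drift flagged at window start
            model = fit(stream_X[new][:m], stream_y[new][:m])  # retrain
            t += n                            # move past the drifted window
        else:
            t += delta                        # shift new-samples window by delta
    return drifts
```

Plugging in a trivial majority-vote model and an accuracy-drop test recovers the expected behavior: the detector fires once when the stream's label distribution flips.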