1 of 35

Context Sight: Model Understanding and Debugging via Interpretable Context

Jun Yuan

New York University

Enrico Bertini

Northeastern University

HILDA Paper Mentor: Minsuk Kahng

Oregon State University

2 of 35

Why did the model make such a “guess”?

What did I draw?

The AI model gives a guess:

  • Pear: 90%
  • Onion: 88%
  • Potato: 80%
  • Avocado: 70%

3 of 35

  • Closest match: Pear
  • Closest match: Onion
  • Closest match: Potato

*Example inspired by:

Cai, C.J., Jongejan, J., and Holbrook, J. The effects of example-based explanations in a machine learning interface. In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI 2019), pp. 258-262.

4 of 35

Example-based Explanations!

5 of 35

Background: Using Examples to Explain Model Predictions

  • Prototypes and criticisms [1] are examples showing what the model has learned and what it has not captured.
  • Counterfactual examples [2] demonstrate how an instance must be minimally changed to significantly change its prediction.
  • Nearest neighbors [3] show how the model makes predictions on similar instances.
  • Other example-based explanations.

[1] Kim, Been, Rajiv Khanna, and Oluwasanmi O. Koyejo. “Examples are not enough, learn to criticize! Criticism for interpretability.” Advances in Neural Information Processing Systems (2016).

[2] Verma, S., Dickerson, J., and Hines, K. Counterfactual explanations for machine learning: A review. arXiv preprint arXiv:2010.10596 (2020).

[3] Peterson, L.E. K-nearest neighbor. Scholarpedia, 4(2), p. 1883 (2009).
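The nearest-neighbor idea above can be sketched in a few lines of Python. The toy data, distance choice, and function names are our own illustration, not any cited system's implementation; a real system would use a task-appropriate distance.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(instance, dataset, k=3):
    """Return the k training instances closest to `instance`.

    Each dataset entry is (features, label); the retrieved examples
    can be shown to a user as an example-based explanation.
    """
    ranked = sorted(dataset, key=lambda row: euclidean(instance, row[0]))
    return ranked[:k]

# Toy data: (features, label)
train = [([1.0, 1.0], "pear"), ([1.1, 0.9], "pear"),
         ([5.0, 5.0], "onion"), ([5.2, 4.8], "onion")]
print(nearest_neighbors([1.05, 1.0], train, k=2))  # two "pear" neighbors
```

Showing these retrieved neighbors alongside the model's prediction is the simplest form of example-based explanation.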

6 of 35

Examples Are Useful for Model Understanding

[Cai et al., 2019]

Examples can serve as explanations of algorithmic behavior for laypersons.

[Bove et al., 2022]

Contextualization improves non-expert users’ understanding of feature importance.

7 of 35

However, there is a lack of systematic investigation to answer

1. What factors are taken into consideration when using examples for model understanding and debugging?

2. How do examples help with model understanding and debugging in practice?

8 of 35

Understanding Model Behaviors via Context

We define the Context of an instance as:

A set of instances (examples) selected or generated according to certain criteria, in order to understand how the model makes the prediction on this instance.

9 of 35

Our Contribution

  • A literature review of existing methods that involve context, from which we derive a taxonomy of Interpretable Context.
  • A visual analytics system, Context Sight, which adapts the key elements of Interpretable Context to support a specific model understanding and debugging task.

10 of 35

Literature Review

Answer “What factors are taken into consideration when using examples for model understanding and debugging?”

Step 1

11 of 35

Context Usage: A Literature Review

We conducted an initial analysis of a collection of 20 papers that use context to understand and debug an ML model.

12 of 35

Analysis Result: Interpretable Context

Interpretable Context

  • Context Generation
    • Similarity: data-driven similarity; model-driven similarity
    • Model Output: original prediction; desired prediction (e.g., counterfactuals); all predictions (e.g., nearest neighbors)
    • Source: existing data; generated data
  • Context Summarization
    • Low-level: instance
    • Mid-level: distribution
    • High-level: auto summary

13 of 35

Analysis Result: Interpretable Context

Interpretable Context → Context Generation → Similarity

  • Data-driven similarity: e.g., similar in terms of RGB values.
  • Model-driven similarity: e.g., similar in terms of the features in the last layer of a CNN.

14 of 35

Analysis Result: Interpretable Context

Interpretable Context → Context Generation → Model Output

  • Same prediction
  • Different prediction (e.g., counterfactuals)
  • All predictions (e.g., nearest neighbors)

15 of 35

Analysis Result: Interpretable Context

Interpretable Context → Context Generation → Source

  • Existing data
  • Generated data

16 of 35

Analysis Result: Interpretable Context

Interpretable Context → Context Summarization

  • Low-level: e.g., instance
  • Mid-level: e.g., distribution
  • High-level: e.g., auto-generated summary

“People who are under 30 years old and whose BMI is under 35 will be predicted healthy by the diabetes prediction model.”
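A high-level summary like the one above amounts to a subgroup purity check: filter the data by the rule and see how consistently the model predicts one class. A minimal Python sketch, with invented toy records and a hypothetical rule (not any real system's implementation):

```python
# Hypothetical records: (age, bmi, model_prediction)
records = [
    (25, 22.0, "healthy"), (28, 30.5, "healthy"),
    (24, 34.0, "healthy"), (45, 38.0, "diabetic"),
    (52, 29.0, "diabetic"),
]

def subgroup_summary(records, rule, label):
    """Report how consistently the model predicts `label`
    inside the subgroup defined by `rule`."""
    group = [r for r in records if rule(r)]
    if not group:
        return None
    share = sum(r[2] == label for r in group) / len(group)
    return len(group), share

size, share = subgroup_summary(
    records, lambda r: r[0] < 30 and r[1] < 35, "healthy")
print(f"{size} instances match the rule; {share:.0%} predicted healthy")
# prints "3 instances match the rule; 100% predicted healthy"
```

When the share is at or near 100%, the rule can be surfaced to the user as a natural-language summary of the model's behavior.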

17 of 35

For a specific task, how can interpretable context help?

We use Model Debugging as an example.

18 of 35

Prototyping

Design a prototype to support a specific task based on context.

Step 2

Literature Review

Answer “What factors are taken into consideration when using examples for model understanding and debugging?”

Step 1

19 of 35

From Taxonomy to Design Goals of Model Debugging

G1: Customize parameters to find neighbors of an instance from training data.

(to check how the model learns from similar cases)

Context Generation

G2: Inspect counterfactuals generated with desired properties.

(to check what the model assumes as the desired class)

20 of 35

From Taxonomy to Design Goals of Model Debugging

G3: Enable users to inspect a visualization of the selected instance and its context, to visually capture patterns in the context.

Context Summarization

G4: Provide auto-generated summaries to guide users in interpreting the context.

21 of 35

Context Generation

  • Nearest neighbors
    • Heterogeneous Euclidean-Overlap Metric (HEOM) for a mix of categorical and continuous data

  • Counterfactual Generation
    • DiCE
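As a rough sketch of how HEOM handles mixed data (our own illustration, not the system's code): missing values get the maximal distance of 1, categorical features use the overlap metric (0 if equal, 1 otherwise), numeric features use the range-normalized absolute difference, and the per-feature distances combine Euclidean-style.

```python
import math

def heom(a, b, ranges, categorical):
    """Heterogeneous Euclidean-Overlap Metric (HEOM).

    a, b        : feature vectors (mixed categorical / numeric)
    ranges      : per-feature (max - min) for numeric features
    categorical : set of feature indices treated as categorical
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:          # missing value -> max distance
            d = 1.0
        elif i in categorical:              # overlap: 0 if equal, else 1
            d = 0.0 if x == y else 1.0
        else:                               # range-normalized difference
            d = abs(x - y) / ranges[i] if ranges[i] else 0.0
        total += d * d
    return math.sqrt(total)

# Feature 0 numeric (range 100), feature 1 categorical
print(heom([80, "own"], [60, "rent"], ranges={0: 100}, categorical={1}))
# prints sqrt(0.2**2 + 1**2) ≈ 1.0198
```

Because every per-feature distance lies in [0, 1], numeric and categorical features contribute on a comparable scale.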

22 of 35

Context Sight: Prototype User Interface

23 of 35

Usage Scenario

Data: Home Equity Line of Credit (HELOC) Dataset

(FICO xML Challenge)

Model: Multi-layer Perceptron, accuracy: 72.67%

24 of 35

Context Sight

Select an instance predicted as Default that should be Not Default.

25 of 35

Context Generation

Set desired properties to search for nearest neighbors (G1).

Set desired properties to generate counterfactuals (G2).
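Counterfactual generation in the prototype is handled by DiCE; purely to illustrate the underlying idea, here is a hypothetical brute-force sketch that nudges one feature at a time until a binary prediction flips. All names and the toy model are invented.

```python
def counterfactual_search(instance, predict, steps, max_iters=50):
    """Greedy one-feature-at-a-time search for a counterfactual.

    Repeatedly nudges a single feature by its step size until the
    model's prediction flips. Illustrative only; real counterfactual
    generators (e.g., DiCE) optimize for validity, proximity, and
    diversity jointly.
    """
    target = 1 - predict(instance)          # flip a binary prediction
    for i, step in enumerate(steps):
        cf = list(instance)
        for _ in range(max_iters):
            cf[i] += step
            if predict(cf) == target:
                return cf
    return None

# Toy model: predicts Default (1) when % trades w/ balance > 70
predict = lambda x: int(x[0] > 70)
print(counterfactual_search([80.0], predict, steps=[-5.0]))  # prints [70.0]
```

The returned instance answers "what minimal change would have made the model predict the other class?" for this applicant.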

26 of 35

27 of 35

Context Visualization (G3): Data Table

28 of 35

Context Visualization (G3) : Parallel Coordinates

The buttons control what is shown in the parallel coordinates.

29 of 35

Context Visualization (G3): Feature

Each feature is represented by three axes.

% Trades w/ Balance

Example: A loan applicant has % Trades w/ Balance = 80; the applicant is predicted as Will Default but should be Will Not Default.

1st Axis: The feature value in the original instance, and that in the counterfactual.

The model seems to have learned too low a threshold on % Trades w/ Balance for applicants who will not default.

30 of 35

Context Visualization (G3): Feature

% Trades w/ Balance

2nd + 3rd Axes: Context examples with the predicted class, and those with the desired class (ground truth).

Scatterplot

Histogram

The context of this mispredicted instance seems to have more Default cases around 80 but fewer Not Default cases around 80.

This may be relevant to why the error happens.


31 of 35

Auto Summarization of Context (G4)

The auto summary is generated based on the entropy of the predicted classes in the subgroup.

→ Where the model makes consistent predictions

Guides users to check these feature ranges to reason about the error (G4).
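The entropy-based summarization can be sketched as follows; the equal-width binning and the 0.5-bit threshold are illustrative assumptions, not the prototype's exact settings.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of predicted class labels (bits)."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def consistent_ranges(values, preds, bins=4, threshold=0.5):
    """Flag feature-value ranges where the model predicts (nearly) one class.

    Splits a feature into equal-width bins and keeps bins whose
    prediction entropy falls below `threshold`.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    flagged = []
    for b in range(bins):
        in_bin = [p for v, p in zip(values, preds)
                  if lo + b * width <= v < lo + (b + 1) * width or
                     (b == bins - 1 and v == hi)]
        if in_bin and entropy(in_bin) < threshold:
            flagged.append((lo + b * width, lo + (b + 1) * width))
    return flagged

values = [10, 15, 20, 80, 85, 90]
preds  = ["NotDefault", "NotDefault", "NotDefault",
          "Default", "Default", "NotDefault"]
print(consistent_ranges(values, preds))  # prints [(10.0, 30.0)]
```

Low-entropy bins are where the model behaves consistently; ranges that fail the test, like the mixed-prediction bin around 80 here, are exactly where a user should look to reason about errors.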

32 of 35

Conclusion

  • Design and implementation of Context Sight, a prototype that uses interpretable context to understand and debug classification models
  • A taxonomy of interpretable context, including context generation and context summarization

33 of 35

Future Work

We plan to conduct an observational study based on Context Sight.

Observational Study

Observe how practitioners use context to understand and debug a model.

Step 3

Prototyping

Design a prototype to support a specific task based on context.

Step 2

Literature Review

Answer “What factors are taken into consideration when using examples for model understanding and debugging?”

Step 1

It remains unknown how practitioners actually use context to understand and debug a model in practice.

34 of 35

Future Work

Specifically, we use Context Sight as a probe to understand the following research questions:

RQ1: How do context examples and summaries help reach the goal of model understanding and debugging?

RQ2: What role does interaction play when using context? Do practitioners have different workflows when using context?

35 of 35

Thanks :)