Local Interpretable Model-Agnostic Explanations (LIME)
and a Discussion of Other Heatmap Methods
Features vs Interpretable Representations
Interpretable explanations need to use a representation that is understandable to humans
Ex - For text, the interpretable representation would be a binary vector indicating the presence or absence of a word (vs. word embeddings)
Ex - For images, it can be a binary vector indicating the presence/absence of a super-pixel (vs. the actual pixel values)
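As a concrete illustration, here is a minimal sketch of the two interpretable representations above, assuming a plain Python/NumPy setup (the toy vocabulary, document and super-pixel count are made-up values for the example):

import numpy as np

# Text: the interpretable representation x' marks the presence/absence of each word.
vocabulary = ["good", "bad", "movie", "plot"]          # assumed toy vocabulary
document = "good movie good plot"
x_prime_text = np.array([1 if word in document.split() else 0 for word in vocabulary])
# (the original representation x might instead be dense word embeddings)

# Image: the interpretable representation x' marks the presence/absence of each super-pixel.
num_superpixels = 50                                   # assumed segmentation size
x_prime_image = np.ones(num_superpixels, dtype=int)    # 1 = super-pixel kept
x_prime_image[[3, 17]] = 0                             # e.g. two super-pixels greyed out
# (the original representation x would be the actual pixel values)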
Some Notation
x - original representation of an instance being explained
x’ - binary vector for its interpretable representation
An explanation is a model g coming from a class G of interpretable models
The domain of g is binary (presence/absence of the interpretable components)
Ω(g) - measure of the complexity of the model g
f - model being explained
π_x(z) - proximity measure between an instance z and x (defining a local neighbourhood around x)
L(f, g, π_x) - measure of how unfaithful g is in approximating f in that local neighbourhood
General equation of the LIME formulation
Our task is to find a g which is interpretable and faithfully represents f in the local neighbourhood
explanation(x) = argmin_{g ∈ G} [ L(f, g, π_x) + Ω(g) ]
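In practice this optimisation is usually approximated by sampling. The sketch below, assuming NumPy and scikit-learn, fits a locally weighted linear model g around x'; the sampling scheme, kernel width and the use of Ridge regression are simplifying assumptions for illustration, not the exact procedure of the LIME implementation:

import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(x_prime, f, num_samples=1000, kernel_width=0.75):
    """Fit an interpretable linear model g around the binary instance x'."""
    d = len(x_prime)
    rng = np.random.default_rng(0)

    # Sample perturbations z' by randomly switching interpretable components on/off.
    Z_prime = rng.integers(0, 2, size=(num_samples, d))
    Z_prime[0] = x_prime                              # keep the instance itself

    # Query the model f being explained on the perturbed points
    # (f is assumed to map a binary vector back to the original space and predict).
    y = np.array([f(z) for z in Z_prime])

    # pi_x: exponential kernel on the distance between z' and x'.
    distances = np.linalg.norm(Z_prime - x_prime, axis=1)
    pi_x = np.exp(-(distances ** 2) / kernel_width ** 2)

    # g: weighted linear model; its coefficients are the explanation.
    g = Ridge(alpha=1.0)
    g.fit(Z_prime, y, sample_weight=pi_x)
    return g.coef_

Here Ω(g) is only handled implicitly through the Ridge penalty; the paper instead limits the number of nonzero coefficients (e.g. via K-LASSO).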
A Few Results
Global Reliability of models
Explanation of a single prediction provides some insight into the reliability of the classifier
This alone is still not sufficient to evaluate and assess trust in the model as a whole
A global understanding of the model's reliability can be obtained by explaining a set of individual instances, carefully selected by the method provided in the LIME paper (see the sketch below)
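Roughly, that selection can be sketched as a greedy pick that maximises coverage of important features across instances. The sketch below is a simplification under stated assumptions (an importance_matrix with one row per explained instance and one column per interpretable feature, e.g. the coefficients returned by explain_instance above); it is not the paper's exact submodular-pick procedure:

import numpy as np

def pick_instances(importance_matrix, budget):
    """Greedily pick `budget` instances whose explanations cover the most overall feature importance."""
    imp = np.abs(importance_matrix)                 # shape (n_instances, n_features)
    feature_importance = np.sqrt(imp.sum(axis=0))   # overall importance of each feature
    selected = []
    covered = np.zeros(imp.shape[1], dtype=bool)

    for _ in range(budget):
        # Coverage obtained if instance i were added to the current selection.
        gains = [
            feature_importance[covered | (imp[i] > 0)].sum() if i not in selected else -np.inf
            for i in range(imp.shape[0])
        ]
        best = int(np.argmax(gains))
        selected.append(best)
        covered |= imp[best] > 0
    return selected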
Problems
Ref on LIME
Some of the other heatmap methods
Occlusion
Gradient-based Heatmaps
What we are doing
We have a model that classifies the image based only on the centre pixel, regardless of the rest of the image's content
Ideal Heatmap
What would the ideal model explanation look like?
It would be a binary image with 1 at the centre and 0 everywhere else, representing that the centre pixel is the only pixel that matters for the classification
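For concreteness, a tiny sketch of that ideal heatmap (the image size is an assumption):

import numpy as np

height = width = 11                               # assumed (odd) image size
ideal_heatmap = np.zeros((height, width))
ideal_heatmap[height // 2, width // 2] = 1.0      # only the centre pixel matters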
Applying occlusion to our model
Perfect result when applying occlusion with a 1x1 patch
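A minimal sketch of 1x1 occlusion for such a model, where the toy centre-pixel classifier f and the image size are assumptions made to match the setup above: each pixel is occluded in turn with a baseline value and the drop in the predicted score is recorded.

import numpy as np

height = width = 11                                    # assumed image size

def f(image):
    """Toy classifier: the score depends only on the centre pixel."""
    return float(image[height // 2, width // 2] > 0.5)

def occlusion_heatmap(image, baseline=0.0):
    base_score = f(image)
    heatmap = np.zeros_like(image)
    for i in range(height):
        for j in range(width):
            occluded = image.copy()
            occluded[i, j] = baseline                  # 1x1 occlusion patch
            heatmap[i, j] = base_score - f(occluded)   # score drop = importance
    return heatmap

image = np.ones((height, width))
print(occlusion_heatmap(image))   # nonzero only at the centre, i.e. the ideal heatmap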
Applying other heatmap methods?
Our model
f(x) = Softmax(Wx)
where x is the flattened input image.
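As one example of a gradient-based heatmap for this model, the sketch below takes the gradient of the predicted class probability with respect to the input pixels; the image size, the choice of W (nonzero only at the centre pixel, to match the model described above) and the use of the analytic softmax gradient are assumptions for illustration:

import numpy as np

height = width = 11
num_classes = 2

# Assumed weights for the centre-pixel model: class 1 looks only at the centre pixel.
W = np.zeros((num_classes, height * width))
W[1, (height // 2) * width + width // 2] = 5.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gradient_heatmap(image, target_class):
    x = image.reshape(-1)                        # flattened input image
    p = softmax(W @ x)
    # Analytic gradient of p[c] w.r.t. x for f(x) = Softmax(Wx):
    #   dp_c/dx = p_c * (W[c] - sum_k p_k * W[k])
    grad = p[target_class] * (W[target_class] - p @ W)
    return np.abs(grad).reshape(height, width)

image = np.ones((height, width))
print(gradient_heatmap(image, target_class=1))   # nonzero only at the centre pixel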