1 of 41

From Shapley Values to Explainable AI

Kyle Vedder - GRASP Game Theory Seminar

Video of this talk can be found at https://www.youtube.com/watch?v=4RkhsIz14Yc

2 of 41

Background Definitions

  • Permutation - order does matter
  • Combination - order does not matter
  • Power Set - Set of all subsets of a given set
    • Every combination of the set’s elements, of every size (see the sketch below)
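
These three definitions map directly onto Python’s itertools; a minimal sketch (the `powerset` helper follows the recipe from the itertools documentation):

```python
from itertools import chain, combinations, permutations

items = [1, 2, 3]

# Permutation: order matters, so (1, 2) and (2, 1) are counted separately.
print(list(permutations(items, 2)))  # (1,2) (1,3) (2,1) (2,3) (3,1) (3,2)

# Combination: order does not matter, so only (1,2), (1,3), (2,3).
print(list(combinations(items, 2)))

def powerset(s):
    """Every subset of s, from the empty set up to s itself."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

print(list(powerset(items)))  # () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)
```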

3 of 41

Introduction to Shapley Values

4 of 41

Shapley Values

  • Developed in the 1950s by Lloyd Shapley
  • Helped him win the 2012 Nobel Prize in Economics, shared with Alvin Roth

Source: Wikipedia “Lloyd Shapley”

5 of 41

Farmer Example

  • Fixed group of farmers work together to grow wheat
  • Collaboration causes better (or worse) total yield of wheat than working individually
  • How do you assign “credit” to each farmer?

6 of 41

Farmer Example - “Credit”

  • Sum over each farmer’s credit is total wheat
    • “Efficiency”
  • A farmer who contributes nothing gets no credit
    • “Dummy”
  • Equal contributions get equal credit
    • “Symmetry”

7 of 41

Farmer Example - “Credit” cont.

  • If two harvests involving the same farmers merge, then each farmer’s joint-harvest credit is the sum of their individual harvest credits
    • “Linearity”

8 of 41

Shapley Values are these “credit” values

9 of 41

Shapley Values are these “credit” values

Any “credit” system that upholds these properties must be the Shapley Values!

10 of 41

Farmer Example - Permutations

  • Let F be the set of farmers {1,2,...,p}
  • Assume we have a model v: (S, σ) → ℝ
    • Input: S ⊆ F, with σ an ordering of S
    • Output: Real value corresponding to wheat output
      • Farmers added in the order of σ
      • Different σ might change output

11 of 41

Farmer Example - Permutations

  • To compute credit of farmer i ∈ {1,2,...,p}:
    • For every ordering of the farmers, take the wheat output of i’s predecessors with i minus without i (the marginal value of i), then average over orderings

12 of 41

Farmer Example - Permutations

φᵢ = (1/p!) · Σ_{σ ∈ Π(F)} [ v(σ_{≤i}) − v(σ_{<i}) ]

  • Π(F): all p! orderings of the farmers; σ_{<i} is the ordered prefix of σ before farmer i, and σ_{≤i} additionally includes i
  • Averaging over the # of permutations (the 1/p! factor) gives farmer i’s credit φᵢ

13 of 41

Farmer Example - Permutations

φᵢ = (1/p!) · Σ_{σ ∈ Π(F)} [ v(σ_{≤i}) − v(σ_{<i}) ]

  • Normalizing by the # of permutations gives farmer i’s credit φᵢ
  • Computing this sum exactly is #P-hard: “as hard as the counting problems associated with NP-hard problems,” e.g. #SAT, exact Bayes net inference, the matrix permanent
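
A brute-force sketch of this permutation form. The function name and the toy order-sensitive model `v` below are illustrative, not from the talk; `v` takes a tuple of farmers in attendance order and returns a yield. The factorial blow-up is visible directly: p farmers means p! orderings.

```python
from itertools import permutations

def shapley_by_permutations(v, farmers, i):
    """Credit of farmer i: average, over all p! orderings, of the marginal
    value of i with respect to the farmers preceding it."""
    total, count = 0.0, 0
    for sigma in permutations(farmers):
        k = sigma.index(i)
        total += v(sigma[:k + 1]) - v(sigma[:k])  # marginal value of i
        count += 1
    return total / count  # count == p!

# Toy model: each farmer adds their id to the yield, plus a synergy bonus
# whenever farmer 2 works immediately after farmer 1 (so order matters).
def v(sigma):
    bonus = sum(2 for a, b in zip(sigma, sigma[1:]) if (a, b) == (1, 2))
    return sum(sigma) + bonus

print([shapley_by_permutations(v, (1, 2, 3), i) for i in (1, 2, 3)])
```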

14 of 41

Glove Game Example

  • Goal is to form maximal pairs of gloves
    • P1 has L, P2 has L, P3 has R
  • v(S) = 1 if S is {1,3}, {2,3}, {1,2,3}, 0 otherwise
    • ϕ1 = ⅙
    • ϕ2 = ⅙
    • ϕ3 = ⅔

Thus, since the marginal contribution of P1, v(S ∪ {1}) − v(S), is 1 only when P1’s predecessors are exactly S = {3} (one of the 3! = 6 orderings), φ₁ = ⅙.

15 of 41

Glove Game Example

  • Goal is to form maximal pairs of gloves
    • P1 has L, P2 has L, P3 has R
  • v(S) = 1 if S is {1,3}, {2,3}, {1,2,3}, 0 otherwise
    • ϕ1 = ⅙
    • ϕ2 = ⅙
    • ϕ3 = ⅔

Thus the credits also illustrate the axioms:

    • Efficiency: ⅙ + ⅙ + ⅔ = 1 = v({1,2,3})
    • Dummy: no player here contributes nothing, so no credit is forced to 0
    • Symmetry: P1 and P2 are interchangeable and receive equal credit
    • Linearity: credits for a sum of two glove games would be the per-game sums
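
The glove game is small enough to verify by brute force; a self-contained sketch (function names illustrative). Since v here depends only on which players are present, it ignores order:

```python
from itertools import permutations

def v(coalition):
    """1 if the coalition can form a pair: a left glove (P1 or P2)
    plus the right glove (P3); 0 otherwise."""
    s = set(coalition)
    return 1.0 if s & {1, 2} and 3 in s else 0.0

def shapley(v, players, i):
    # Average marginal contribution of i over all orderings of the players.
    perms = list(permutations(players))
    return sum(v(sig[:sig.index(i) + 1]) - v(sig[:sig.index(i)])
               for sig in perms) / len(perms)

print([shapley(v, (1, 2, 3), i) for i in (1, 2, 3)])
# [0.1666..., 0.1666..., 0.6666...] == [1/6, 1/6, 2/3]
```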

16 of 41

Farmer Example - Combinations

  • Let F be the set of farmers {1,2,...,p}
  • Assume we have model f: S → ℝ
    • Input: S ⊆ F
    • Output: Real value corresponding to wheat output
      • Averaged over result for every ordering of farmers
      • Hides some combinatorics
      • If order doesn’t matter, can save computation

17 of 41

Farmer Example - Combinations

  • To compute credit of farmer i ∈ {1,2,...,p}:
    • Same average of i’s marginal values over orderings as before
    • Implement via f evaluated on every combination of farmers; let f handle the ordering

18 of 41

Farmer Example - Combinations

φᵢ = Σ_{S ⊆ F∖{i}} [ |S|! (p − |S| − 1)! / p! ] · ( f(S ∪ {i}) − f(S) )

  • f(S ∪ {i}) − f(S): f evaluated with S and i, minus f evaluated with S alone
  • The sum runs over the power set of the farmers without farmer i
  • |S|! (p − |S| − 1)! / p! is the combinatorial normalization
  • φᵢ is farmer i’s credit

19 of 41

Farmer Example - Combinations

φᵢ = Σ_{S ⊆ F∖{i}} [ |S|! (p − |S| − 1)! / p! ] · ( f(S ∪ {i}) − f(S) )

  • |S|!: # of ways to permute the farmers preceding i
  • (p − |S| − 1)!: # of ways to permute the farmers succeeding i
  • S is an unordered subset (a combination); f handles the ordering of S internally
  • 1/p!: normalize by the # of permutations
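
A sketch of this combination form: one `f` evaluation per subset rather than per ordering (names illustrative; `f` is assumed to be a set function):

```python
from itertools import combinations
from math import factorial

def shapley_by_subsets(f, farmers, i):
    """Credit of farmer i via the power set of the other farmers, with the
    |S|! (p - |S| - 1)! / p! combinatorial weight on each marginal value."""
    others = [x for x in farmers if x != i]
    p = len(farmers)
    phi = 0.0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            w = factorial(r) * factorial(p - r - 1) / factorial(p)
            phi += w * (f(set(S) | {i}) - f(set(S)))
    return phi

# Glove game again, as a set function: 2^(p-1) subsets instead of p! orderings.
f = lambda s: 1.0 if s & {1, 2} and 3 in s else 0.0
print([shapley_by_subsets(f, [1, 2, 3], i) for i in [1, 2, 3]])  # [1/6, 1/6, 2/3]
```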

20 of 41

Benefits of Shapley Values

  • Efficiency
  • Dummy
  • Symmetry
  • Linearity

21 of 41

Problems with Computing SVs

  • Computation is #P-hard
    • Very expensive in practice
  • Must be able to include/exclude farmers and get a meaningful real value
    • How do we do this for non-artificial games?

22 of 41

From Shapley Values to SHAP Values

23 of 41

SHAP Values

  • SHapley Additive exPlanations
    • Introduced by Lundberg et al., 2017
  • Tackles the problems of SV computation
    • Data table to bypass full definition of v
    • Data structures to speed computation
  • Extends SVs to apply to general ML

24 of 41

SHAP From Tables

T =
  F1 | F2 | ... | Fp | Yield (y)
   1 |  4 | ... |  7 |        27
   1 |  4 | ... |  1 |         9
   0 |  0 | ... |  3 |         8

(N rows, p feature columns)

25 of 41

SHAP From Tables

  • f: a partial assignment of the p features → ℝ
    • An assignment fixes a feature to a value
    • Need to decide how to handle the unassigned features

26 of 41

SHAP From Tables - Example

  • f({F1 = 1}) = ?

F1 | F2 | ... | Fp | Yield (y)
 1 |  4 | ... |  7 |        27
 1 |  4 | ... |  1 |         9
 0 |  0 | ... |  3 |         8

27 of 41

SHAP From Tables - Nominal

  • f({F1 = 1}) = avg(T | F1 = 1, F2 = 0, …, Fp = 0)

F1 | F2 | ... | Fp | Yield (y)
 1 |  4 | ... |  7 |        27
 1 |  4 | ... |  1 |         9
 0 |  0 | ... |  3 |         8
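
A sketch of this nominal lookup over a toy pandas version of T (table values and the helper name are illustrative); every unassigned feature is pinned to a baseline of 0:

```python
import pandas as pd

T = pd.DataFrame({"F1": [1, 1, 0], "F2": [4, 4, 0], "Fp": [7, 1, 3],
                  "y":  [27, 9, 8]})
features = ["F1", "F2", "Fp"]

def f_nominal(assignment, baseline=0):
    """avg(T | assigned features match, every unassigned feature == baseline)."""
    mask = pd.Series(True, index=T.index)
    for col in features:
        mask &= T[col] == assignment.get(col, baseline)
    rows = T[mask]
    # With sparse data there may be no matching entries at all.
    return rows["y"].mean() if len(rows) else float("nan")

print(f_nominal({"F1": 1}))  # no row has (F1=1, F2=0, Fp=0) -> nan
```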

28 of 41

SHAP From Tables - Nominal

  • Might have no matching table entries
  • Not guaranteed to uphold any Shapley properties

29 of 41

SHAP From Tables - Marginal

  • f({F1 = 1}) = avg(T | F1 = 1)

F1 | F2 | ... | Fp | Yield (y)
 1 |  4 | ... |  7 |        27
 1 |  4 | ... |  1 |         9
 0 |  0 | ... |  3 |         8
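
The marginal version conditions only on the assigned features and averages the yield over every matching row; a sketch on the same toy table:

```python
import pandas as pd

T = pd.DataFrame({"F1": [1, 1, 0], "F2": [4, 4, 0], "Fp": [7, 1, 3],
                  "y":  [27, 9, 8]})

def f_marginal(assignment):
    """avg(T | assigned features match); unassigned features stay free."""
    mask = pd.Series(True, index=T.index)
    for col, val in assignment.items():
        mask &= T[col] == val
    rows = T[mask]
    return rows["y"].mean() if len(rows) else float("nan")

print(f_marginal({"F1": 1}))  # rows 0 and 1 match -> (27 + 9) / 2 = 18.0
```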

30 of 41

SHAP From Tables - Marginal

  • Proposed by Lundberg et al., 2017
  • Impacted by sparsity
    • Conditional dist. might differ from full dist.
    • Distribution can collapse with real-world (noisy) data
  • Upholds Efficiency and Symmetry, not Dummy and Linearity
    • Sundararajan et al., 2019

31 of 41

SHAP From Tables - Interventional

  • f({F1 = 1}) = avg(T | do(F1 = 1))

F1 | F2 | ... | Fp | Yield (y)
 1 |  4 | ... |  7 |        27
 1 |  4 | ... |  1 |         9
 1 |  0 | ... |  3 |         8

(do(F1 = 1) overwrites column F1 in every row — note the last row)
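
Because do(F1 = 1) overwrites the column in every row, the stored yields no longer apply, so the interventional version needs a model to re-predict the outputs; a sketch with a stand-in linear model (all names illustrative):

```python
import pandas as pd

T = pd.DataFrame({"F1": [1, 1, 0], "F2": [4, 4, 0], "Fp": [7, 1, 3]})

def model(rows):
    # Stand-in for a trained predictor mapping feature rows to yield.
    return 2 * rows["F1"] + rows["F2"] + rows["Fp"]

def f_interventional(assignment):
    """avg over T of model(row after do(assignment)): overwrite the assigned
    columns in every row, which breaks their correlations with the rest."""
    rows = T.copy()
    for col, val in assignment.items():
        rows[col] = val  # do(F_col = val)
    return model(rows).mean()

print(f_interventional({"F1": 1}))  # every row forced to F1=1, then averaged
```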

32 of 41

SHAP From Tables - Interventional

  • Proposed by Janzing et al., 2019
  • do-notation due to Judea Pearl
    • Breaks feature correlations
    • Implies that the model is causal
  • Upholds Efficiency, Dummy, Linearity, not Symmetry

33 of 41

do notation

  • Assumes model is causal
  • Y | X1 = v  ≠  Y | do(X1 = v)

[Figure: the observed conditional distribution Y | X1 = v vs. the interventional distribution Y | do(X1 = v); from Janzing et al., 2019]


35 of 41

Applying SHAP

36 of 41

Applying SHAP

T =
  F1 | F2 | ... | Fp | y
   1 |  4 | ... |  7 | 27
   1 |  4 | ... |  1 |  9
   0 |  0 | ... |  3 |  8

(N rows, p feature columns)

37 of 41

Applying SHAP

Φ is the N × p matrix of SHAP values — one row of per-feature credits per table row — shown alongside each row’s explained output y:

Φ =
  F1   | F2   | ... | Fp   | y
  φ1,1 | φ1,2 | ... | φ1,p | 12.33
  φ2,1 | φ2,2 | ... | φ2,p | -5.66
  φ3,1 | φ3,2 | ... | φ3,p | -6.66
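
In practice Φ comes straight out of the shap library; a minimal sketch on synthetic data (the model and data are placeholders, not the talk’s):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Placeholder table: N rows, p features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=100)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(X)  # the N x p matrix of per-row, per-feature credits

# Efficiency: each row's credits sum to its prediction minus the base value.
assert np.allclose(phi.sum(axis=1) + explainer.expected_value,
                   model.predict(X), atol=1e-6)
```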

38 of 41

Applying SHAP

[Figure: bar chart of the sum of |φ| for each feature across all rows — a global feature-importance ranking]
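
Aggregating the per-row credits into that global ranking is one line; a sketch, with `phi` standing in for the Φ matrix from the previous sketch:

```python
import numpy as np

phi = np.random.default_rng(0).normal(size=(100, 4))  # stand-in for Phi (N x p)

importance = np.abs(phi).sum(axis=0)  # sum of |phi| per feature
print(np.argsort(importance)[::-1])   # features ranked most to least important

# shap's built-in equivalent (bar chart of mean |phi| per feature,
# with X the feature table from the previous sketch):
# shap.summary_plot(phi, X, plot_type="bar")
```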

39 of 41

Making SHAP Tractable

40 of 41

Tractability

  • So far: more flexible definitions of f
    • Forgo some guarantees in exchange for that flexibility
  • Still need to fix the runtime

41 of 41

Data Structures for faster SHAP

  • TreeSHAP
    • Lundberg et al., 2020
  • Supports the marginal formulation
    • The implementation also supports interventional
  • Open-source implementation (see the sketch below)
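
A sketch of invoking both TreeSHAP modes via the open-source shap package (the feature_perturbation values are the library’s parameter names; the model and data are placeholders):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Path-dependent TreeSHAP: uses the trees' own cover statistics, no background data.
path_dep = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

# Interventional TreeSHAP: breaks feature correlations against a background sample.
interventional = shap.TreeExplainer(model, data=X[:100],
                                    feature_perturbation="interventional")

print(path_dep.shap_values(X[:5]).shape)        # (5, 5)
print(interventional.shap_values(X[:5]).shape)  # (5, 5)
```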