1 of 29

FOLD-SE (Scalable Explainable AI)

Huaduo Wang and Gopal Gupta

Computer Science Department

The University of Texas at Dallas, Richardson, USA


Presented by: Hrushikesh Kothikar

2 of 29

FOLD-SE: Origination

The FOLD-SE algorithm is an improvement over the FOLD-R++ algorithm.

It provides scalable explainability (SE) while inheriting all the merits of FOLD-R++.

FOLD-SE outperforms the FOLD-R++ and RIPPER algorithms in efficiency, performance, and explainability, especially on large datasets.

3 of 29

FOLD-R++

  • An improvement over the FOLD-R algorithm, an automated inductive learning algorithm that learns default rules from mixed (categorical and numerical) data.

  • FOLD-R is a top-down algorithm that works by learning default rules with exceptions.

  • It first learns a default predicate that covers the positive examples while avoiding the negative examples; it then swaps the positive and negative examples and calls itself recursively to learn the exceptions to the default.
  • FOLD-R++ improves upon FOLD-R without compromising or losing information from the input training data during the encoding or feature selection phase, and it generates an explainable model.


4 of 29

FOLD-R++ : Specific Improvements over FOLD-R

  • Utilizes the prefix-sum technique to speed up the computation of information gain (a sketch follows this list).

  • Improves on FOLD-R by allowing negated literals in the default portion of the learned rules.

  • A new hyper-parameter, called the exception ratio, helps improve efficiency and classification performance.
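As a rough illustration of the prefix-sum idea (a sketch under an assumed data layout, not the authors' code): after sorting a numeric feature column once, cumulative counts of positive labels yield the tp/fp/fn/tn counts of every "≤ value" literal in a single pass, instead of rescanning the data for each candidate threshold.

from itertools import accumulate

# Hypothetical feature column: (value, is_positive_example)
col = [(7, True), (2, True), (2, False), (3, True), (4, False), (6, False), (1, True)]
col.sort(key=lambda r: r[0])                 # sort once by feature value
pos_prefix = list(accumulate(int(p) for _, p in col))
P = pos_prefix[-1]                           # total positive examples
N = len(col) - P                             # total negative examples

for i, (v, _) in enumerate(col):
    if i + 1 < len(col) and col[i + 1][0] == v:
        continue                             # score each distinct value once
    tp = pos_prefix[i]                       # positives with value <= v
    fp = (i + 1) - tp                        # negatives with value <= v
    fn, tn = P - tp, N - fp
    print(f"literal (<= {v}): tp={tp} fp={fp} fn={fn} tn={tn}")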


5 of 29

FOLD-R++ : Inductive Logic Programming (Background)

Inductive logic programming (ILP) is a subfield of machine learning that learns models in the form of logic programming rules. More formally: given background knowledge B, a set of positive examples E+, and a set of negative examples E−, the goal is to find a hypothesis H (a logic program) such that B ∪ H entails every example in E+ and no example in E−.


6 of 29

FOLD-R++ : Default Rules (Background)
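A default rule expresses a general pattern that holds unless an exception applies. The classic textbook illustration (a standard background example, written as a normal logic program of the kind FOLD-R++ learns):

flies(X) :- bird(X), not ab(X).
ab(X) :- penguin(X).

Read: X flies if X is a bird and X is not abnormal; penguins are abnormal. FOLD-R++ learns exactly this default-plus-exception structure from data.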


7 of 29

FOLD-R++ : Algorithm

  • The output of the FOLD-R++ algorithm is a set of default rules encoded as a normal logic program.
  • An example implied by any rule is classified as positive.
  • After a new rule is learned, the positive examples it covers are ruled out of the remaining training set.
  • To learn a new rule, the best literal (feature) is selected, based on information gain over the remaining training examples, and added to the default part of the rule’s body.
  • Next, only the examples covered by the learned default literals are used for further learning (specializing) of the current rule (a simplified sketch follows the link below).


Detailed Algorithm (Page 8, Algorithm 2): https://arxiv.org/pdf/2110.07843.pdf
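The loop above can be condensed into a toy, runnable sketch (a deliberate simplification with a single numeric feature and only "≤" literals; hypothetical helper names, not the authors' implementation):

from math import log

def ig(tp, fn, tn, fp):
    # Information-gain heuristic; see the IG slide below for the formula.
    h = lambda a, b: a * log(a / (a + b)) if a > 0 else 0.0
    return (h(tp, fp) + h(fp, tp) + h(fn, tn) + h(tn, fn)) / (tp + fn + tn + fp)

def best_threshold(pos, neg):
    # Score every '<= t' literal on the remaining examples; keep the best.
    best_t, best_s = None, float("-inf")
    for t in sorted(set(pos + neg)):
        tp = sum(x <= t for x in pos)
        fp = sum(x <= t for x in neg)
        s = ig(tp, len(pos) - tp, len(neg) - fp, fp)
        if s > best_s:
            best_t, best_s = t, s
    return best_t

def fold(pos, neg, depth=0):
    # Returns rules of the form (threshold, exceptions); an example is positive
    # if x <= threshold holds and no exception rule covers it.
    rules = []
    while pos and depth < 3:              # depth guard keeps the sketch finite
        t = best_threshold(pos, neg)
        covered_p = [x for x in pos if x <= t]
        covered_n = [x for x in neg if x <= t]
        if not covered_p:
            break
        # Exceptions: swap the residual positives and negatives, recurse.
        exceptions = fold(covered_n, covered_p, depth + 1) if covered_n else []
        rules.append((t, exceptions))
        pos = [x for x in pos if x > t]   # rule out already-covered positives
    return rules

print(fold([1, 2, 3, 5], [4, 6, 7]))      # [(3, []), (5, [(4, [])])]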

8 of 29

FOLD-R++ : Algorithm (Contd.)

  • After learning the default literals, FOLD-R++ learns exceptions by swapping the residual positive and negative examples and calling itself recursively.

  • The remaining positive and negative examples can be swapped again to learn exceptions to exceptions. Interpretability improves because fewer rules and literals are generated.

  • For the Adult Census Income dataset, for example, without the exception-ratio hyper-parameter (equivalent to setting the ratio to 0), FOLD-R++ takes around 10 minutes to finish training and generates hundreds of rules. With the ratio set to 0.5, only 13 rules are generated in around 10 seconds.

  • FOLD-R++ allows negated literals in the default part of the generated rules.


9 of 29

FOLD-R++ : Literal Selection

  • Only equality (and inequality) literals are generated for categorical features.
  • Additional numerical comparison (≤ and >) literal candidates are generated for numerical features; features with mixed-type values are treated as numerical.
  • Only one round of classification is needed for a single feature, even with mixed types of values.
  • Two types of literals are thus generated: equality comparison literals and numerical comparison literals.


10 of 29

FOLD-R++ : Literal Selection (IG Calculation)
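FOLD-R++ scores a candidate literal by the information gain of the split it induces. A formula consistent with the worked example on the next slide (IG(4, 4, 4, 1) = −0.619, which this formula reproduces using natural logarithms) is:

IG(tp, fn, tn, fp) = [ tp·ln(tp/(tp+fp)) + fp·ln(fp/(tp+fp)) + fn·ln(fn/(fn+tn)) + tn·ln(tn/(fn+tn)) ] / (tp + fn + tn + fp)

with any 0·ln(0/…) term taken as 0. A direct transcription in Python:

from math import log

def ig(tp, fn, tn, fp):
    # a * ln(a / (a + b)), with the a == 0 case taken as 0
    h = lambda a, b: a * log(a / (a + b)) if a > 0 else 0.0
    return (h(tp, fp) + h(fp, tp) + h(fn, tn) + h(tn, fn)) / (tp + fn + tn + fp)

print(round(ig(4, 4, 4, 1), 3))  # -0.619, matching the worked example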


11 of 29

FOLD-R++ : Literal Selection (Contd.)

  • Numerical comparisons (≤ and >) between a numerical value and a categorical value are always false.
  • Given E+ = {1, 2, 3, 3, 5, 6, 6, b}, E− = {2, 4, 6, 7, a}, and literal(i, ≤, 3), the true positive examples Etp, false negative examples Efn, true negative examples Etn, and false positive examples Efp implied by the literal are {1, 2, 3, 3}, {5, 6, 6, b}, {4, 6, 7, a}, and {2}, respectively.
  • Then, the information gain of literal(i, ≤, 3) is calculated as IG(4, 4, 4, 1) = −0.619 using the function on the previous slide.
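These counts can be checked mechanically; a small sketch (reusing ig() from the previous slide, with numeric-vs-categorical ≤ comparisons evaluating to false):

def le(x, t):
    # '<=' is true only for numeric values; false for categorical ones like 'b'
    return isinstance(x, (int, float)) and x <= t

E_pos = [1, 2, 3, 3, 5, 6, 6, 'b']
E_neg = [2, 4, 6, 7, 'a']
tp = sum(le(x, 3) for x in E_pos)    # covers {1, 2, 3, 3}  -> 4
fn = len(E_pos) - tp                 # misses {5, 6, 6, b}  -> 4
fp = sum(le(x, 3) for x in E_neg)    # covers {2}           -> 1
tn = len(E_neg) - fp                 # misses {4, 6, 7, a}  -> 4
print(round(ig(tp, fn, tn, fp), 3))  # -0.619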


12 of 29

FOLD-R++ : Literal Selection (Contd.)


13 of 29

FOLD-R++ : Literal Selection (Contd.)


14 of 29

FOLD-R++ : Explainability

  • To efficiently justify its predictions, FOLD-R++ outputs normal logic programs that are compatible with the s(CASP) goal-directed answer set programming system.

  • The s(CASP) system executes answer set programs in a goal-directed manner. (The stratified normal logic programs output by FOLD-R++ are a special case of answer set programs.)


15 of 29

Accuracy    Precision    Recall    F1 Score
0.86        0.88         0.95      0.91

16 of 29


Justification Tree produced by s(CASP) for a query about ID 30 having income less than 50k

17 of 29

The RIPPER system (a rule-induction algorithm that generates formulas in conjunctive normal form (CNF) as an explanation of the model):

Accuracy    Precision    Recall    F1 Score
0.61        0.98         0.50      0.66

18 of 29

FOLD-SE

Scalable Explainable AI


19 of 29

FOLD-SE

  • Provides scalable explainability (SE) while inheriting all the merits of FOLD-R++.

  • In FOLD-SE, the size of the learned rule set remains a small constant regardless of the size of the dataset.

  • The FOLD-SE algorithm outperforms the FOLD-R++ and RIPPER algorithms in efficiency, performance, and explainability, especially for large datasets.


20 of 29

FOLD-SE: What’s new?

  1. A newly created heuristic for literal selection, which greatly reduces the number of rules and predicates in the generated rule set.

  2. A rule-pruning mechanism to handle the long-tail effect. [Long-tail effect: rules generated later in the learning process cover fewer examples than those generated earlier.]

  3. Two new comparison operators that improve the literal selection process for sparse datasets containing many missing values.


Detailed Algorithm (Page 6, Algorithm 2): https://arxiv.org/pdf/2208.07912.pdf

21 of 29

FOLD-SE: Heuristic for Literal Selection

  • Magic Gini Impurity (MGI): a newly created heuristic, inspired by the Gini Impurity (GI) heuristic, that guides the literal selection process (a sketch of standard GI follows this list).

  • It yields fewer rules and literals, with no loss in performance, compared to Information Gain (IG).
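For reference, a sketch of the standard weighted Gini impurity of the split induced by a literal, i.e., the heuristic that inspired MGI (the exact MGI formula differs; see the paper linked on the "What's new?" slide):

def gini(a, b):
    # Gini impurity of a group containing a positives and b negatives
    n = a + b
    return 1.0 - (a / n) ** 2 - (b / n) ** 2 if n else 0.0

def weighted_gini(tp, fn, tn, fp):
    # Impurity of the covered and uncovered groups, weighted by group size;
    # lower is better when choosing a literal.
    n = tp + fn + tn + fp
    return ((tp + fp) * gini(tp, fp) + (fn + tn) * gini(fn, tn)) / n

print(round(weighted_gini(4, 4, 4, 1), 3))  # 0.431 for the earlier example split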


22 of 29

FOLD-SE: Heuristic for Literal Selection (Contd.)


23 of 29

FOLD-SE: Heuristic for Literal Selection (Performance)


24 of 29

FOLD-SE: Heuristic for Literal Selection (Performance)


25 of 29

FOLD-SE: Comparison operators

  • An extension of FOLD-R++’s comparison operators for comparing categorical and numerical values.
  • Two numerical (resp. categorical) values are equal if they are identical; otherwise they are unequal.
  • Equality between a numerical value and a categorical value is always false.
  • Inequality between a numerical value and a categorical value is always true.
  • Numerical comparisons (≤ and >) between a numerical value and a categorical value are always false.
  • FOLD-SE additionally admits the negated operators ≰ and ≯ (the complements of ≤ and >) as candidate literals during literal selection; in the final learned rules they are converted back to their opposites, ≤ and >, under negation (see the sketch below).
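A small sketch of these comparison semantics (hypothetical helper, my reading of the slide): the negated operators are satisfied by categorical values, including placeholders for missing data, which is what makes them useful on sparse datasets.

def compare(x, op, t):
    # Numerical comparisons with a categorical value are false (FOLD-R++ rule),
    # so their negations are true for categorical/missing values.
    is_num = isinstance(x, (int, float))
    if op == '<=':
        return is_num and x <= t
    if op == '>':
        return is_num and x > t
    if op == 'not<=':
        return not compare(x, '<=', t)
    if op == 'not>':
        return not compare(x, '>', t)

print(compare('missing', '<=', 3))      # False
print(compare('missing', 'not<=', 3))   # True: covers rows with missing values
print(compare(5, 'not<=', 3))           # True: for numbers, not<= behaves like >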


26 of 29

FOLD-SE: Comparison operators (Contd.)


27 of 29

FOLD-SE: Comparison operators (Contd.)


28 of 29

FOLD-SE: Rule Pruning Mechanism

  • FOLD-SE introduces a hyper-parameter, tail, that limits the minimum number (or percentage) of training examples a rule must cover (a sketch follows this list).
  • It helps reduce the number of generated rules and literals by curbing the overfitting of outliers.
  • Rules are pruned during the training process itself, which speeds up training.
  • With the tail parameter, FOLD-SE can easily be tuned to trade off accuracy against explainability.
  • Algorithmically, FOLD-SE is the same as FOLD-R++ except for the tail parameter, the rule-pruning process, and the use of MGI instead of IG as the heuristic.
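A minimal sketch of the tail check (hypothetical names; the parameter can be an absolute count or a fraction of the training set):

def passes_tail(num_covered, num_training, tail):
    # Interpret tail as a percentage when it is a fraction in (0, 1),
    # otherwise as an absolute example count.
    threshold = tail * num_training if 0 < tail < 1 else tail
    return num_covered >= threshold

print(passes_tail(5, 1000, 0.01))   # False: 5 < 10, the rule would be pruned
print(passes_tail(25, 1000, 0.01))  # True: 25 >= 10, the rule is kept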


29 of 29

FOLD-SE: Performance and Results
