X International Conference “Information Technology and Implementation” (IT&I-2023), Kyiv, Ukraine
New Formalism for Statistical Similarity Based XAI
Dmytro Klyushin
Taras Shevchenko National University of Kyiv
Dedicated to the tenth anniversary of the Faculty of Information Technology
Explainable Artificial Intelligence
Information Technology and Implementation, November 20, 2023, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
Shepardʼs universal law of generalization
The probability of a response to one stimulus being generalized to another is a function of a “distance” between the two stimuli in a psychological space.
Postulates on Machine Learning
2. Feature vectors of objects in the same class are closer to one another in a feature space than to feature vectors of objects in a different class (Averianov and Braverman, 1968).
Example When an Object Is a Sample But Not a Point
Feulgen-stained DNA of nuclei from buccal epithelium of a patient
Statistical Postulates of Compactness
Statistical Formalism of XAI
Let X be a set of objects, Y be a set of class labels, and f: X → Y be a function whose values are known only on the elements of a training sample drawn from X. The goal of the learning algorithm is to extend f to the whole set X, that is, to construct a decision function g: X → Y that maps every object to its label and minimizes a risk function. Each object in X is itself a sample of random values rather than a single point. The closeness of objects is estimated by their homogeneity rather than by distance.
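The formalism above can be written compactly as follows. This is only a sketch: the training-sample size m, the within-object sample size n, the distribution functions F_x, and the choice of the empirical 0-1 loss as the risk function are illustrative assumptions, since the text does not fix a particular risk.

```latex
% Training sample: objects x_1, ..., x_m with known labels y_i = f(x_i).
% Assumed risk: empirical 0-1 loss (the text only says "a risk function").
\[
  g^{*} = \arg\min_{g \colon X \to Y} R(g),
  \qquad
  R(g) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\bigl[\, g(x_i) \neq y_i \,\bigr]
\]
% Each object is a sample of random values, not a point in a feature space.
\[
  x = \bigl(x^{(1)}, \dots, x^{(n)}\bigr), \qquad x^{(k)} \sim F_x
\]
% Closeness as homogeneity: x and x' are treated as close
% when the hypothesis H_0 is not rejected by a two-sample test.
\[
  H_0 \colon F_x = F_{x'}
\]
```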
Measure of Homogeneity (p-statistics)
Numerical Experiments
To estimate the true positive and true negative rates of the proposed tests, we performed numerical experiments using samples from normal distributions with various degrees of overlap. We considered 100 samples of 40 random numbers with different means and the same variance (location shift), as well as 100 samples of 40 random numbers with the same mean and different variances (scale shift). We calculated the average p-statistics with its lower and upper confidence limits, the average Kolmogorov–Smirnov statistic with its p-value, and the average Wilcoxon statistic with its p-value.
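The two baseline tests in this experiment can be sketched in plain Python as follows. This is an illustrative setup, not the code used for the reported results: the reference distribution N(0, 1), the reading of the second parameter as a standard deviation, the 5% level, the asymptotic critical value 1.358 for the Kolmogorov–Smirnov test, the normal approximation for the Wilcoxon signed-rank test, and all function names are assumptions. The p-statistics test is not included in the sketch.

```python
import math
import random


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        t = min(xs[i], ys[j])
        # Advance both empirical CDFs past the current point t.
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def ks_rejects(x, y, c_alpha=1.358):
    """Asymptotic two-sided test; c_alpha = 1.358 corresponds to the 5% level."""
    n, m = len(x), len(y)
    return ks_statistic(x, y) > c_alpha * math.sqrt((n + m) / (n * m))


def wilcoxon_rejects(x, y, z_crit=1.96):
    """Wilcoxon signed-rank test on index-paired samples (normal approximation).

    Ties among absolute differences are not mid-ranked; for continuous
    data they occur with probability zero.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return False
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    w_plus = sum(rank for rank, k in enumerate(order, start=1) if diffs[k] > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return abs(w_plus - mean) / sd > z_crit


def rejection_rate(test, mu_y, sigma_y, trials=100, size=40, seed=0):
    """Share of trials in which `test` rejects N(0, 1) vs N(mu_y, sigma_y)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(size)]
        y = [rng.gauss(mu_y, sigma_y) for _ in range(size)]
        hits += test(x, y)
    return hits / trials


if __name__ == "__main__":
    cases = [("location shift", 1.0, 1.0), ("scale shift", 0.0, 3.0)]
    for label, mu, sigma in cases:
        ks = rejection_rate(ks_rejects, mu, sigma)
        wx = rejection_rate(wilcoxon_rejects, mu, sigma)
        print(f"{label}: KS rejects {ks:.2f}, Wilcoxon rejects {wx:.2f}")
```

Because the signed-rank statistic only reacts to asymmetry of the paired differences around zero, it is expected to reject a pure scale shift at roughly the nominal level, which is one way to see why a location test alone cannot serve as a homogeneity measure.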
Results of Numerical Experiments
The standard Kolmogorov–Smirnov and Wilcoxon signed-rank tests work well when testing the location shift hypothesis. However, when the samples overlap heavily, the Kolmogorov–Smirnov test fails in nearly half of the cases or more, and the Wilcoxon signed-rank test fails in all of them. The p-statistics test fails only in a third of the cases of heavily overlapping samples following the distributions N(0,3), N(0,4) and N(0,5).
Conclusions
We propose an objective approach and a universal criterion of explainability for statistical similarity based XAI. An AI method is explainable if the results of its application satisfy two statistical postulates: 1) objects can be represented by sample values of their parameters; 2) features of objects belonging to the same class have the same distributions, and features of objects belonging to different classes have different distributions. As a mathematical tool for this formalism, we propose the p-statistics. It is universal owing to its high sensitivity and specificity when the sample size exceeds 40. It also robustly tests both the location shift and the scale shift hypotheses.