Classification (Advanced Methods)
Classification: Advanced Methods
Bayesian Belief Networks
[Figure: a simple belief network over variables X, Y, Z, and P; X and Y are the parents of Z, and Y is the parent of P]
Bayesian Belief Network: An Example
[Figure: a Bayesian belief network over Family History (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea; FH and S are the parents of LC]

CPT for the variable LungCancer (LC):

|     | (FH, S) | (FH, ~S) | (~FH, S) | (~FH, ~S) |
| LC  | 0.8     | 0.5      | 0.7      | 0.1       |
| ~LC | 0.2     | 0.5      | 0.3      | 0.9       |
CPT: the conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of values of its parents.
Derivation of the probability of a particular combination of values of X from the CPT:

P(x1, …, xn) = ∏i P(xi | Parents(Xi))
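A minimal sketch in Python of how this product is evaluated for part of the lung-cancer network above; the CPT for LC is the one from the slide, while the priors for FH and S are made-up values for illustration only.

```python
# Sketch: evaluating P(x1, ..., xn) = prod_i P(xi | Parents(Xi))
# for the FH, S, LC fragment of the network. CPT values are from the
# slide; the priors P(FH) and P(S) below are made up for illustration.

P_FH = 0.10          # assumed prior P(FamilyHistory = true)
P_S = 0.30           # assumed prior P(Smoker = true)

# P(LC = true | FH, S), indexed by (fh, s)
CPT_LC = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.7,
    (False, False): 0.1,
}

def joint(fh: bool, s: bool, lc: bool) -> float:
    """P(FH=fh, S=s, LC=lc) = P(fh) * P(s) * P(lc | fh, s)."""
    p = (P_FH if fh else 1 - P_FH) * (P_S if s else 1 - P_S)
    p_lc = CPT_LC[(fh, s)]
    return p * (p_lc if lc else 1 - p_lc)

# e.g., a smoker with family history who develops lung cancer
print(joint(fh=True, s=True, lc=True))   # 0.1 * 0.3 * 0.8 = 0.024
```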
Training Bayesian Networks: Several Scenarios
Classification: Advanced Methods
Classification by Backpropagation
Neural Network as a Classifier
A Multi-Layer Feed-Forward Neural Network
[Figure: a multi-layer feed-forward neural network; an input vector X enters the input layer, weights wij connect it to a hidden layer, and the output layer produces the output vector]
How A Multi-Layer Neural Network Works
Defining a Network Topology
Backpropagation
Neuron: A Hidden/Output Layer Unit
[Figure: a hidden/output layer unit; the inputs x0, x1, …, xn are combined with the weight vector w = (w0, w1, …, wn) and the bias μk into a weighted sum ∑, which an activation function f maps to the output y]

The unit computes y = f(∑i wi xi + μk)
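A minimal sketch of such a unit in Python, assuming a sigmoid activation for f, together with the output-unit weight update that backpropagation performs; the initial weights, learning rate, and training tuple are made up.

```python
import math

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def unit_output(w, x, bias):
    """y = f(sum_i w_i * x_i + mu_k), with f = sigmoid."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)

# One backpropagation-style update for an output unit:
# Err = y (1 - y)(t - y), then w_i += l * Err * x_i and bias += l * Err.
w, bias, l = [0.2, -0.4, 0.1], 0.0, 0.5   # made-up weights and learning rate
x, t = [1.0, 0.0, 1.0], 1.0               # one training tuple and its target

y = unit_output(w, x, bias)
err = y * (1 - y) * (t - y)               # sigmoid derivative times error
w = [wi + l * err * xi for wi, xi in zip(w, x)]
bias += l * err
print(y, unit_output(w, x, bias))         # output moves toward the target t
```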
Efficiency and Interpretability
Classification: Advanced Methods
Classification: A Mathematical Mapping
[Figure: training tuples of two classes, plotted as 'x' and 'o' points in a 2-D feature space]
Discriminative Classifiers
SVM—Support Vector Machines
SVM—History and Applications
SVM—General Philosophy
[Figure: two possible separating hyperplanes, one with a small margin and one with a large margin; the tuples lying on the margin boundaries are the support vectors]
SVM—Margins and Support Vectors
SVM—When Data Is Linearly Separable
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple with its associated class label yi
There are an infinite number of lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH)
SVM—Linearly Separable
A separating hyperplane can be written as
W · X + b = 0
where W = {w1, w2, …, wn} is a weight vector and b a scalar (bias)
For 2-D it can be written as w0 + w1x1 + w2x2 = 0 (absorbing b into w0)
The hyperplanes defining the sides of the margin:
H1: w0 + w1x1 + w2x2 ≥ 1 for yi = +1, and
H2: w0 + w1x1 + w2x2 ≤ −1 for yi = −1
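A minimal sketch (not the book's SVM solver): subgradient descent on the soft-margin hinge loss finds a large-margin W and b on made-up toy data; tuples with yi(W·Xi + b) ≈ 1 play the role of support vectors on H1/H2. The data, C, and learning rate below are illustrative assumptions.

```python
import numpy as np

# Sketch: large-margin separator via subgradient descent on the
# soft-margin hinge loss  ||W||^2 / 2 + C * sum max(0, 1 - y (W.X + b)).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

W, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = y * (X @ W + b)
    viol = margins < 1                     # tuples inside the margin
    W -= lr * (W - C * (y[viol, None] * X[viol]).sum(axis=0))
    b -= lr * (-C * y[viol].sum())

print("margin width ~", 2 / np.linalg.norm(W))
print("training accuracy:", (np.sign(X @ W + b) == y).mean())
```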
Why Is SVM Effective on High Dimensional Data?
SVM—Linearly Inseparable
SVM: Different Kernel functions
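The kernel functions typically listed here can be written down directly; a minimal Python sketch, where the degree h, width σ, and sigmoid parameters κ and δ are user-chosen.

```python
import numpy as np

def poly_kernel(xi, xj, h=3):
    """Polynomial kernel of degree h: K(Xi, Xj) = (Xi . Xj + 1)^h."""
    return (np.dot(xi, xj) + 1) ** h

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian radial basis function: exp(-||Xi - Xj||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(np.asarray(xi) - np.asarray(xj)) ** 2
                  / (2 * sigma ** 2))

def sigmoid_kernel(xi, xj, kappa=1.0, delta=0.0):
    """Sigmoid kernel: tanh(kappa Xi . Xj - delta)."""
    return np.tanh(kappa * np.dot(xi, xj) - delta)

print(poly_kernel([1, 2], [2, 1]), rbf_kernel([1, 2], [2, 1]))
```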
Scaling SVM by Hierarchical Micro-Clustering
CF-Tree: Hierarchical Micro-cluster
Selective Declustering: Ensure High Accuracy
CB-SVM Algorithm: Outline
Accuracy and Scalability on Synthetic Dataset
SVM vs. Neural Network
SVM Related Links
Chapter 9. Classification: Advanced Methods
Associative Classification
p1 ∧ p2 ∧ … ∧ pl ⇒ "Aclass = C" (conf, sup)
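A minimal sketch of how one such rule, stored as a set of attribute-value conditions, fires on a tuple; the rule and tuple below are made up for illustration.

```python
# Sketch: one rule p1 ^ p2 ^ ... ^ pl => "Aclass = C" (conf, sup),
# stored as a frozenset of attribute-value conditions. Data is made up.
rule = {
    "pattern": frozenset({("age", "young"), ("credit", "OK")}),
    "class": "buys_computer = yes",
    "conf": 0.93,
    "sup": 0.20,
}

def rule_fires(rule, tuple_):
    """The rule applies if every condition p_i holds in the tuple."""
    return rule["pattern"] <= set(tuple_.items())

t = {"age": "young", "credit": "OK", "income": "high"}
if rule_fires(rule, t):
    print(rule["class"], "(conf %.2f, sup %.2f)" % (rule["conf"], rule["sup"]))
```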
Typical Associative Classification Methods
Frequent Pattern-Based Classification
Frequent Pattern vs. Single Feature
[Fig. 1. Information Gain vs. Pattern Length, on the (a) Austral, (b) Cleve, and (c) Sonar datasets]
The discriminative power of some frequent patterns is higher than that of single features.
Empirical Results
[Fig. 2. Information Gain vs. Pattern Frequency, on the (a) Austral, (b) Breast, and (c) Sonar datasets]
Feature Selection
Experimental Results
Scalability Tests
DDPMine: Branch-and-Bound Search
Association between information gain and frequency
[Figure: a pattern enumeration tree in which a is a constant (a parent node) and b is a variable (a descendant of a)]
DDPMine Efficiency: Runtime
[Figure: runtime comparison of PatClass, Harmony, and DDPMine; PatClass is the ICDE'07 pattern classification algorithm]
Chapter 9. Classification: Advanced Methods
Lazy vs. Eager Learning
Lazy Learner: Instance-Based Methods
The k-Nearest Neighbor Algorithm
[Figure: a query point xq among training examples labeled '+' and '−'; xq is assigned the majority class of its k nearest neighbors]
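A minimal sketch of the algorithm on made-up 2-D data, using Euclidean distance and majority vote.

```python
import math
from collections import Counter

def knn_classify(xq, data, k=5):
    """Classify query xq by majority vote among its k nearest neighbors.
    data is a list of (point, label) pairs; distance is Euclidean."""
    neighbors = sorted(data, key=lambda p: math.dist(xq, p[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Made-up 2-D training tuples labeled '+' and '-'
data = [((1, 1), "+"), ((1, 2), "+"), ((2, 1), "+"),
        ((5, 5), "-"), ((6, 5), "-"), ((5, 6), "-")]
print(knn_classify((1.5, 1.5), data, k=3))   # '+'
```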
Discussion on the k-NN Algorithm
Case-Based Reasoning (CBR)
Chapter 9. Classification: Advanced Methods
Genetic Algorithms (GA)
Rough Set Approach
Fuzzy Set Approaches
Chapter 9. Classification: Advanced Methods
Multiclass Classification
Error-Correcting Codes for Multiclass Classification
| Class | Error-Corr. Codeword |
| C1    | 1 1 1 1 1 1 1 |
| C2    | 0 0 0 0 1 1 1 |
| C3    | 0 0 1 1 0 0 1 |
| C4    | 0 1 0 1 0 1 0 |
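Decoding is by nearest codeword in Hamming distance; a minimal sketch using the table above.

```python
# Sketch: error-correcting-code decoding for the 4-class table above.
# Each base classifier predicts one bit; the class whose codeword has
# the smallest Hamming distance to the predicted bit vector wins.
CODEWORDS = {
    "C1": [1, 1, 1, 1, 1, 1, 1],
    "C2": [0, 0, 0, 0, 1, 1, 1],
    "C3": [0, 0, 1, 1, 0, 0, 1],
    "C4": [0, 1, 0, 1, 0, 1, 0],
}

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def decode(bits):
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], bits))

# A predicted vector with one flipped bit still decodes to C1
print(decode([1, 1, 1, 0, 1, 1, 1]))   # C1
```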
Semi-Supervised Classification
Active Learning
Transfer Learning: Conceptual Framework
[Figure: the traditional learning framework vs. the transfer learning framework]
Transfer Learning: Methods and Applications
Chapter 9. Classification: Advanced Methods
Summary
References
Surplus Slides
What Is Prediction?
Linear Regression
y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients
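The coefficients can be estimated from training data by the method of least squares; a minimal sketch on made-up (x, y) pairs.

```python
# Least-squares estimates for y = w0 + w1 x:
#   w1 = sum (xi - x_mean)(yi - y_mean) / sum (xi - x_mean)^2
#   w0 = y_mean - w1 * x_mean
xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up sample
ys = [1.9, 4.1, 6.0, 8.2, 9.8]

x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
w1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
      / sum((x - x_mean) ** 2 for x in xs))
w0 = y_mean - w1 * x_mean
print(f"y = {w0:.3f} + {w1:.3f} x")
```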
Nonlinear Regression
y = w0 + w1x + w2x² + w3x³
Convertible to linear with new variables: x2 = x², x3 = x³
y = w0 + w1x + w2x2 + w3x3
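Because the model is linear in the new variables, ordinary least squares applies; a minimal sketch using numpy's least-squares solver on made-up data.

```python
import numpy as np

# y = w0 + w1 x + w2 x^2 + w3 x^3, fit as a linear model in (1, x, x2, x3)
x = np.linspace(-2, 2, 30)
y = (1 + 2 * x - 0.5 * x**2 + 0.3 * x**3
     + np.random.default_rng(0).normal(0, 0.1, 30))   # made-up noisy sample

A = np.column_stack([np.ones_like(x), x, x**2, x**3])  # new variables x2, x3
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print("w0..w3 ~", np.round(w, 2))
```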
Other Regression-Based Models
Regression Trees and Model Trees
Predictive Modeling in Multidimensional Databases
Prediction: Numerical Data
Prediction: Categorical Data
SVM—Introductory Literature
Notes about SVM—Introductory Literature
Associative Classification Can Achieve High Accuracy and Efficiency (Cong et al., SIGMOD'05)
A Closer Look at CMAR
Perceptron & Winnow
Input: {(x1, y1), …}
Output: classification function f(x)
f(xi) > 0 for yi = +1
f(xi) < 0 for yi = −1
f(x) gives the decision boundary w · x + b = 0, i.e., w1x1 + w2x2 + b = 0 in 2-D
[Figure: a separating line in the (x1, x2) plane]
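A minimal sketch of the classic perceptron mistake-driven update on made-up, linearly separable data: on each misclassified tuple, add yi xi to w and yi to b, repeating until no mistakes remain.

```python
import numpy as np

# Sketch of the perceptron update on made-up linearly separable data:
# on each mistake (y * (w.x + b) <= 0), set w += y x and b += y.
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([+1, +1, -1, -1])

w, b = np.zeros(2), 0.0
for _ in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:
            w, b = w + yi * xi, b + yi
            mistakes += 1
    if mistakes == 0:            # converged: every f(xi) on the correct side
        break
print("decision boundary: %.1f x1 + %.1f x2 + %.1f = 0" % (w[0], w[1], b))
```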