Data Science for All 2023
Lecture 1: Classification
July 17, 2023

Heart Data: logistic estimation


  • There are various ways to fit a logistic model to this data set in Python. The most straightforward route in sklearn is linear_model.LogisticRegression, as in the sketch below.
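  • A minimal sketch (assuming the Heart data is in a pandas DataFrame with a MaxHR column and an AHD column coded Yes/No; the file and column names are assumptions, so adjust them to your copy of the data):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    heart = pd.read_csv("Heart.csv")              # hypothetical file name
    X = heart[["MaxHR"]]                          # single predictor
    y = (heart["AHD"] == "Yes").astype(int)       # 1 = heart disease, 0 = none

    # A large C means very weak regularization, close to a plain maximum-likelihood fit.
    model = LogisticRegression(C=1e6).fit(X, y)
    print(model.intercept_, model.coef_)          # estimated intercept and slope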


Using Logistic Regression for Classification


Multiple Logistic Regression

  • It is simple to illustrate logistic regression when there is just one predictor variable.
  • But the approach generalizes 'easily' to the situation where there are multiple predictors.
  • Many of the same considerations as in linear regression apply to logistic regression: interactions can be included, overfitting is still a concern, and so on.
  • So how do we correct for such problems?
  • Regularization, together with checking performance on train/test splits and via cross-validation (a sketch follows this list).
  • We will get into the details of this, along with other extensions of logistic regression.
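  • A hedged sketch of such a check, using sklearn's regularization strength C together with a train/test split and cross-validation (X and y are the predictors and binary response from the earlier sketch):

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    for C in [0.01, 0.1, 1, 10, 100]:             # smaller C = stronger regularization
        model = LogisticRegression(C=C, max_iter=1000)
        cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
        test_acc = model.fit(X_train, y_train).score(X_test, y_test)
        print(C, round(cv_acc, 3), round(test_acc, 3))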


Classifier with two predictors

  • How can we estimate a classifier, based on logistic regression, for the following plot?
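  • One possible answer, as a sketch: fit a logistic regression on both predictors; the points where the predicted probability equals 0.5 form a straight-line decision boundary. (X2 and y2 are hypothetical names for the plotted two-column feature array and its binary labels.)

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression().fit(X2, y2)        # X2 has two columns (two predictors)
    b0, (b1, b2) = clf.intercept_[0], clf.coef_[0]

    # Boundary: b0 + b1*x1 + b2*x2 = 0, i.e. x2 = -(b0 + b1*x1) / b2
    x1_grid = np.linspace(X2[:, 0].min(), X2[:, 0].max(), 100)
    x2_boundary = -(b0 + b1 * x1_grid) / b2       # line to overlay on the scatter plot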


Fitting Multiple Logistic Regression


Interpreting Multiple Logistic Regression: an Example


  • Let's get back to the Heart Data. We are attempting to predict whether someone has heart disease (HD) based on MaxHR and on whether the person is female or male. The simultaneous effect of these two predictors can be captured in a single model.

  • Fitting a separate simple (single-predictor) logistic regression to each predictor gives the following estimated models:


  • The results for the multiple logistic regression model are:
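  • The fitted coefficients are shown in the slide figure and are not reproduced here. As a hedged sketch of how such a fit could be produced (assuming the DataFrame from the earlier sketch, with MaxHR and Sex columns):

    import statsmodels.formula.api as smf

    heart["HD"] = (heart["AHD"] == "Yes").astype(int)     # numeric binary response
    fit = smf.logit("HD ~ MaxHR + C(Sex)", data=heart).fit()
    print(fit.summary())      # one coefficient per predictor, plus the intercept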


Classification boundaries


Geometry of Data


  • Recall: logistic regression works best for building classification boundaries when:
    • the classes are well-separated in the feature space, and
    • the classification boundary has a simple (e.g., linear) geometry.


  • Question: How about these?


  • Notice that in all of the datasets the classes are still well-separated in the feature space, but the decision boundaries cannot easily be described by single equations:


Interpretable Models


  • People have long used interpretable models to differentiate between classes of objects and phenomena:

Decision Trees

  • It turns out that the simple flow charts in our examples can be formulated as mathematical models for classification, and these models have the properties we desire. They are:
    • interpretable by humans,
    • able to express sufficiently complex decision boundaries,
    • built from locally linear pieces, so each component of the decision boundary is simple to describe mathematically.

Tree-based Models

  • Use trees to partition the data into different regions and make predictions

[Figure: decision tree on the customer data]

    age?
    ├── <=30   → student?        ── no → no;  yes → yes
    ├── 31..40 → yes
    └── >40    → credit rating?  ── excellent → no;  fair → yes

    (root node: age?; internal nodes: student?, credit rating?; leaf nodes: the class labels)


The Geometry of Flow Charts

  • Flow charts whose graph is a tree (connected and with no cycles) represent a model called a decision tree.
  • Formally, a decision tree model is one in which the final outcome of the model is based on a series of comparisons of the values of predictors against threshold values.
  • In a graphical representation (flow chart),
    • the internal nodes of the tree represent attribute tests,
    • branching at the next level is determined by the attribute value (e.g., yes/no),
    • terminal leaf nodes represent class assignments.


  • A path from the root to a leaf node corresponds to a rule.
    • E.g., if age <= 30 and student = no, then target value = no.


  • Every flow chart tree corresponds to a partition of the feature space by axis-aligned lines or (hyper)planes. Conversely, every such partition can be written as a flow chart tree.


  • Each comparison and branching represents splitting a region in the feature space on a single feature. Typically, at each iteration, we split once along one dimension (one predictor). Why?


Learning the Model

  • Given a training set, learning a decision tree model for binary classification means:
    • producing an optimal partition of the feature space with axis-aligned linear boundaries (very interpretable!), where
    • each region is assigned the class label of the majority class among the training points it contains, and that label is used when making predictions.


  • Learning the smallest 'optimal' decision tree for any given data set is computationally very hard. Instead, we seek a reasonably good model using a greedy algorithm:
    • Start with an empty decision tree (undivided feature space).
    • Choose the 'optimal' predictor on which to split and the 'optimal' threshold value for the split.
    • Recurse on each new node until a stopping condition is met.
  • Now we need only define our splitting criterion and stopping condition (a sketch of a single greedy split is given after this list).
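  • A minimal sketch of a single greedy split, scoring candidate (predictor, threshold) pairs by weighted Gini impurity (the function names are illustrative, not from the lecture code):

    import numpy as np

    def gini_impurity(y):
        """Gini impurity of a set of binary labels y."""
        if len(y) == 0:
            return 0.0
        p = y.mean()
        return 2 * p * (1 - p)

    def best_split(X, y):
        """Return the (feature index, threshold) with the lowest weighted impurity."""
        best_j, best_t, best_score = None, None, np.inf
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left = X[:, j] <= t
                score = (left.sum() * gini_impurity(y[left]) +
                         (~left).sum() * gini_impurity(y[~left])) / len(y)
                if score < best_score:
                    best_j, best_t, best_score = j, t, score
        return best_j, best_t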


The key is to partition data into smaller regions

  • Numerical variable
    • Use a threshold to partition the data
  • Categorical variable
    • Use the possible values to partition the data


Guideline of Splitting

  • While there is no single 'correct' way to define an optimal split, there are some commonsense guidelines for every splitting criterion:
    • the regions in the feature space should grow progressively more pure with the number of splits; that is, each region should 'specialize' towards a single class.
    • we shouldn't end up with empty regions - regions containing no training points.


Which attribute to choose?

[Figure: two candidate splits on 14 labeled records]

    Ages:          <=30 → {Yes, Yes, No, No, No}    31…40 → {Yes, Yes, Yes, Yes}    >40 → {Yes, Yes, Yes, No, No}
    vs.
    Credit_Rating: Excellent → {Yes, Yes, Yes, No, No, No}    Fair → {Yes, Yes, Yes, Yes, Yes, Yes, No, No}

Q: Which attribute is better for the classification task?


How to define purity of each region?

  • Classification error
  • Gini Index
  • Entropy
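  • The formulas themselves appear on the next slides as figures; as an illustrative sketch (not lecture code), here are the three measures for a two-class region whose class-1 proportion is p:

    import numpy as np

    def classification_error(p):
        """Error rate if the region predicts its majority class."""
        return 1 - max(p, 1 - p)

    def gini(p):
        """Gini index of a two-class region."""
        return 2 * p * (1 - p)

    def entropy(p):
        """Entropy (in bits) of a two-class region."""
        if p in (0.0, 1.0):
            return 0.0
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    for p in (0.0, 0.25, 0.5):
        print(p, classification_error(p), gini(p), entropy(p))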


Classification Error


Stopping Conditions & Pruning

  • If we don't terminate the decision tree learning algorithm, the tree will continue to grow until each region defined by the model contains exactly one training point (and the model attains 100% training accuracy).
  • To prevent this from happening, we can simply stop the algorithm at a particular depth.
  • But how do we determine the appropriate depth?


Variance vs Bias

  • We make some observations about our models:
  • (High Bias) A tree of depth 4 is not a good fit for the training data - it’s unable to capture the nonlinear boundary separating the two classes.
  • (Low Bias) With an extremely high depth, we can obtain a model that correctly classifies all points on the boundary (by zig-zagging around each point).
  • (Low Variance) The tree of depth 4 is robust to slight perturbations in the training data - the square carved out by the model is stable if you move the boundary points a bit.
  • (High Variance) Trees of high depth are sensitive to perturbations in the training data, especially to changes in the boundary points.
  • Not surprisingly, complex trees have low bias (able to capture more complex geometry in the data) but high variance (can overfit). Complex trees are also harder to interpret and more computationally expensive to train.


Stopping Conditions

  • Common simple stopping conditions:
    • Don't split a region if all instances in the region belong to the same class.
    • Don't split a region if the number of instances in a sub-region would fall below a pre-defined threshold (min_samples_leaf).
    • Don't split a region if the total number of leaves in the tree would exceed a pre-defined threshold.
  • The appropriate thresholds can be determined by evaluating the model on a held-out data set or, better yet, via cross-validation (a sketch follows this list).
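  • A hedged sketch of how these stopping conditions map onto sklearn's DecisionTreeClassifier, with the thresholds chosen by cross-validation (X and y are assumed to be the training features and labels):

    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    # max_depth, min_samples_leaf and max_leaf_nodes mirror the conditions above.
    grid = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [2, 4, 8, None],
                    "min_samples_leaf": [1, 5, 20],
                    "max_leaf_nodes": [None, 10, 50]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)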


Alternative to Using Stopping Conditions

  • What is the major issue with pre-specifying a stopping condition?
    • You may stop too early or stop too late.
  • How can we fix this issue?
    • Choose several stopping criteria and cross-validate to find which is best.
  • What is an alternative approach to this issue?
    • Don't stop. Instead, grow the full tree and prune it back (a sketch follows this list).
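  • sklearn exposes this grow-then-prune idea as cost-complexity pruning; a hedged sketch (the alpha grid is arbitrary, and X, y are as before):

    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Larger ccp_alpha prunes the fully grown tree back more aggressively.
    for alpha in [0.0, 0.001, 0.01, 0.05]:
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        print(alpha, cross_val_score(tree, X, y, cv=5).mean())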


The Scalable Analytics Institute (ScAI)
Department of Computer Science
University of California, Los Angeles (UCLA)

Instructor: Jeehyun Hwang

Lecture 2: Clustering

Data Science for All 2023


Image Segmentation


Unsupervised Learning


WHAT IS CLUSTERING ANALYSIS?


What is Cluster Analysis?

  • Cluster: A collection of data objects
    • similar (or related) to one another within the same group
    • dissimilar (or unrelated) to the objects in other groups
  • Cluster analysis (or clustering, data segmentation, …)
    • Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
  • Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by examples: supervised)
  • Typical applications
    • As a stand-alone tool to get insight into data distribution
    • As a preprocessing step for other algorithms


What is Cluster Analysis? - KNN


Applications of Cluster Analysis

  • Prediction based on groups
    • Cluster & find characteristics/patterns for each group
  • Finding K-nearest Neighbors
    • Localizing search to one or a small number of clusters
  • Outlier detection: Outliers are often viewed as those “far away” from any cluster

[Figures: clustering application examples - species taxonomy; marketing customer segments]

Clustering: Application Examples

  • Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
  • Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
  • Information retrieval: document clustering
  • Land use: identification of areas of similar land use in an earth observation database
  • City planning: identifying groups of houses according to their house type, value, and geographical location
  • Earthquake studies: observed earthquake epicenters should be clustered along continental faults
  • Climate: understanding the Earth's climate by finding patterns in atmospheric and ocean data


K-means objective

  • Partitioning method: partition a dataset D of n objects into a set of k clusters such that the sum of squared distances from each object to its cluster centroid is minimized (where cj is the centroid of cluster Cj; the objective is written out below)

  • Given k, find a partition into k clusters that optimizes the chosen partitioning criterion
    • Global optimum: exhaustively enumerate all partitions
    • Heuristic method: k-means (MacQueen '67, Lloyd '57/'82), where each cluster is represented by its center
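  • Written out (a standard rendering of the objective described above, using the notation in the bullet):

    E = Σ_{j=1..k} Σ_{x in Cj} dist(x, cj)² ,   where cj is the centroid of cluster Cj.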


  • Q: suppose there are 5 data points in the data set, and we want to cluster them into 2 clusters. In total, how many ways are there for us to partition the data into two clusters?
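  • A worked answer, as a small counting argument: each of the 5 points can go to one of the 2 clusters, giving 2^5 = 32 assignments; dropping the 2 assignments that leave a cluster empty and dividing by 2 (the cluster labels are interchangeable) gives (32 − 2) / 2 = 15 distinct partitions.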


K-means objective – Iterative Process

  • Step-1: Select the number K of clusters.
  • Step-2: Select K random points as the initial centroids (they need not be points from the input dataset).
  • Step-3: Assign each data point to its closest centroid, forming the K clusters.
  • Step-4: Recompute the centroid of each cluster and reassign each data point to the new closest centroid.
  • Step-5: If any reassignment occurred, go back to Step-4; otherwise go to FINISH.
  • Step-6: The model is ready (a sketch of these steps follows this list).
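  • A minimal NumPy sketch of these steps (illustrative only, and it assumes no cluster ever becomes empty; in practice sklearn.cluster.KMeans does this with better initialization):

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 2
        for _ in range(n_iter):
            # Step 3: assign each point to its closest centroid
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 4: recompute centroids; Step 5: stop when nothing changes
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return labels, centroids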

K-means Example

[Figures: step-by-step k-means example, and k-means applied to image segmentation]

Hierarchical Clustering

  • Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it does need a termination condition - WHY?

[Figure: agglomerative (AGNES) vs. divisive (DIANA) hierarchical clustering of points a, b, c, d, e]

    agglomerative (AGNES), bottom-up: step 0 {a} {b} {c} {d} {e} → {a,b}, {d,e} → {c,d,e} → step 4 {a,b,c,d,e}
    divisive (DIANA), top-down: the same steps in reverse, starting from {a,b,c,d,e}


Pseudo Code

  • Initialization: Place each data point into its own cluster and compute distance matrix between clusters
  • Repeat:
    • Merge the two closest clusters
    • Update the distance matrix for the affected entries
  • Until: all the data are merged into a single cluster
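  • In practice the loop above is handled by library routines; a hedged sketch using SciPy (X is assumed to be an n-by-d data matrix):

    from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

    Z = linkage(X, method="single")                    # repeatedly merge the two closest clusters
    labels = fcluster(Z, t=3, criterion="maxclust")    # cut the hierarchy into 3 clusters
    dendrogram(Z)                                      # visualize the merge history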

Hierarchical Clustering

[Figures: the agglomerative merging process shown step by step on an example dataset]

Density-Based Clustering Methods

  • Clustering based on density (a local cluster criterion), such as density-connected points
  • Major features:
    • Discovers clusters of arbitrary shape
    • Handles noise
    • One scan over the data
    • Needs density parameters as a termination condition
  • Several interesting studies:
    • DBSCAN: Ester et al. (KDD '96)
    • OPTICS*: Ankerst et al. (SIGMOD '99)
    • DENCLUE*: Hinneburg & Keim (KDD '98)
    • CLIQUE*: Agrawal et al. (SIGMOD '98) (more grid-based)


DBSCAN: Basic Concepts

  • Two parameters:
    • Eps: maximum radius of the neighborhood
    • MinPts: minimum number of points in an Eps-neighborhood of a point
  • Eps-neighborhood: NEps(q) = {p in D | dist(p, q) ≤ Eps}
  • q is a core point if |NEps(q)| ≥ MinPts
  • Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
    • q is a core point, and
    • p belongs to NEps(q)
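  • A small sketch that applies these two definitions directly (illustrative; X is assumed to be an n-by-d array of points):

    import numpy as np

    def core_points(X, eps, min_pts):
        """Boolean mask: True where |N_Eps(q)| >= MinPts (each point counts itself)."""
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        return (dists <= eps).sum(axis=1) >= min_pts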

[Figure: p is directly density-reachable from core point q; Eps = 1 cm, MinPts = 5]

[Figures: core, border, and noise points illustrated with Eps = 1 cm, MinPts = 5]

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

  • Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
  • Noise: an object not contained in any cluster is noise
  • Discovers clusters of arbitrary shape in spatial databases with noise (a usage sketch follows this list)
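  • A hedged usage sketch with sklearn (the parameter values are placeholders; sklearn labels noise points as -1):

    from sklearn.cluster import DBSCAN

    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(n_clusters, "clusters;", (labels == -1).sum(), "noise points")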


DBSCAN: The Algorithm


DBSCAN: can detect clusters with arbitrary shapes


DBSCAN: sensitive to the minimum number of points (MinPts)


DBSCAN: sensitive to Eps


K-means vs DBSCAN

[Figures: side-by-side comparison of K-means and DBSCAN clustering results on several datasets]

Summary

  • Cluster analysis groups objects based on their similarity and has wide applications.
  • The k-means algorithm is a popular partitioning-based clustering algorithm.
  • Hierarchical clustering algorithms build a nested hierarchy of clusters (agglomerative or divisive).
  • DBSCAN is a density-based clustering algorithm.
