Multivariate Analysis Practical STA701
Performing Classification Analysis
Copyright © 2023 by Retha Luus. All rights reserved.
Introduction
First two topics of block week 2:
Discriminant analysis
Classification analysis
Both are concerned with the separation of observations into groups.
Introduction: Estimating the Misclassification Error Rate
Resubstitution method
Validation set method
LOOCV method
Agenda
Two-Group Classification Analysis
Performing Two-Group Classification Analysis
Hands-on Practice
Several-Group Classification: Equal Covariances
Several-Group Classification: Unequal Covariances
Performing Several-Group Classification Analysis
Hands-on Practice
Summary
Two-Group Classification Analysis
Data for Demo
Performing Two-Group Classification Analysis
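# Box's M-test of the null hypothesis that the group covariance matrices are equal
library(heplots)
boxM(cbind(y1,y2,y3,y4) ~ Group, data=dat5.1)
## 
## Box's M-test for Homogeneity of Covariance Matrices
## 
## data: Y
## Chi-Sq (approx.) = 13.551, df = 10, p-value = 0.1945
# p-value > 0.05: equality of the covariance matrices is not rejected, so linear classification functions are used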
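Performing Two-Group Classification Analysis: Resubstitution
library(MASS)
# Fit the linear classification rule on all observations
dat5.1.fit <- lda(Group~., data=dat5.1)
# Classify the same observations that were used to fit the rule
dat5.1.pred <- predict(dat5.1.fit)
# Confusion table: rows = actual group, columns = predicted group
dat5.1.conf <- table(dat5.1$Group, dat5.1.pred$class)
dat5.1.conf
##      1  2
##   1 28  4
##   2  4 28
# Apparent error rate = number misclassified / total number of observations
dat5.1.apperr <- (dat5.1.conf[1,2]+dat5.1.conf[2,1])/sum(dat5.1.conf)
dat5.1.apperr
## [1] 0.125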
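The apparent error rate is the share of observations that fall off the diagonal of the confusion table, so the same calculation can be written once for any number of groups. The sketch below uses a hypothetical helper named `error_rate` (not part of the original slides) and rebuilds the two-group table from the counts shown above rather than from `dat5.1` itself.

```r
# Hypothetical helper (not from the slides): misclassification rate from a
# square confusion table with actual groups in rows and predicted groups
# in columns.
error_rate <- function(conf) {
  stopifnot(nrow(conf) == ncol(conf))
  1 - sum(diag(conf)) / sum(conf)
}

# Counts copied from the resubstitution table for dat5.1
conf <- matrix(c(28,  4,
                  4, 28),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("1", "2"), predicted = c("1", "2")))

apperr <- error_rate(conf)
print(apperr)
```

Because the helper relies on `diag()`, it should apply unchanged to the three-group tables later in the practical.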
Performing Two-Group Classification Analysis: Validation Set
library(caret)
set.seed(1)
# Stratified split: about 70% of each group for training, the rest for testing
inds <- createDataPartition(dat5.1$Group, times=1, p=0.7)
dat5.1.train <- dat5.1[inds[[1]],]
dat5.1.test <- dat5.1[-inds[[1]],]
library(MASS)
# Fit on the training set only, then classify the held-out test set
dat5.1.fit <- lda(Group~., data=dat5.1.train)
dat5.1.pred <- predict(dat5.1.fit, newdata = dat5.1.test)
dat5.1.conf <- table(dat5.1.test$Group, dat5.1.pred$class)
dat5.1.conf
##     1 2
##   1 9 0
##   2 3 6
# Validation-set error rate
dat5.1.vserr <- (dat5.1.conf[1,2]+dat5.1.conf[2,1])/sum(dat5.1.conf)
dat5.1.vserr
## [1] 0.1666667
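`createDataPartition` samples within each group, so the training and test sets keep the original group proportions. The base-R sketch below illustrates that idea for cases where caret is not installed. It is an approximation of the idea, not caret's actual algorithm, and both the helper name `stratified_split` and the toy grouping variable are assumptions made for illustration.

```r
# Hypothetical helper: sample a fraction p of the row indices within each
# group, so the training set keeps the original group proportions.
# This mimics the idea behind caret::createDataPartition, not its exact code.
stratified_split <- function(groups, p = 0.7) {
  rows_by_group <- split(seq_along(groups), groups)
  train <- lapply(rows_by_group, function(rows) {
    # Index with sample.int() to avoid sample()'s single-number pitfall
    rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
  })
  sort(unlist(train, use.names = FALSE))
}

set.seed(1)
# Toy grouping variable: two groups of 32, the same layout as dat5.1
groups <- factor(rep(1:2, each = 32))
train_rows <- stratified_split(groups, p = 0.7)

table(groups[train_rows])   # training-set size per group
table(groups[-train_rows])  # test-set size per group
```

With 32 observations per group and `p = 0.7`, this gives 23 training and 9 test observations per group, matching the test-set row totals in the confusion table above.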
Performing Two-Group Classification Analysis: LOOCV
library(MASS)
# CV=TRUE returns the leave-one-out classifications in $class instead of a fitted model
dat5.1.fit <- lda(Group~., data=dat5.1, CV=TRUE)
dat5.1.conf <- table(dat5.1$Group, dat5.1.fit$class)
dat5.1.conf
##      1  2
##   1 28  4
##   2  5 27
# Leave-one-out cross-validation error rate
dat5.1.loocverr <- (dat5.1.conf[1,2]+dat5.1.conf[2,1])/sum(dat5.1.conf)
dat5.1.loocverr
## [1] 0.140625
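`CV=TRUE` produces the leave-one-out classifications in a single call. To show what that estimate means, the sketch below writes the loop out explicitly: each observation is classified by a rule fitted to all the other observations. Two species from the built-in `iris` data stand in for `dat5.1` (an assumption, since the course data is not reproduced here), and the priors are fixed at 0.5 each so that every refit uses the same priors.

```r
library(MASS)

# Stand-in data (assumption): two iris species, giving a two-group problem
dat <- droplevels(subset(iris, Species != "setosa"))

n <- nrow(dat)
loo_class <- factor(rep(NA, n), levels = levels(dat$Species))

for (i in seq_len(n)) {
  # Fit without observation i, then classify observation i
  fit_i <- lda(Species ~ ., data = dat[-i, ], prior = c(0.5, 0.5))
  loo_class[i] <- as.character(predict(fit_i, newdata = dat[i, ])$class)
}

# Confusion table and leave-one-out error rate
conf <- table(dat$Species, loo_class)
loocv_err <- 1 - sum(diag(conf)) / sum(conf)
conf
loocv_err
```

In practice `lda(..., CV=TRUE)` is the faster route; the loop is only meant to make the procedure visible.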
Hands-on Practice
Several-Group Classification: Equal Covariances
Several-Group Classification: Unequal Covariances
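Several-group classification analysis
Are the covariance matrices equal?
Yes: linear classification functions (LDA)
No: quadratic classification functions (QDA)
Assess the error rate by:
Resubstitution
Validation Set
LOOCV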
Data for Demo
## Data
dat8.3 <- read.table("Your path//T8_3_FOOTBALL.DAT", header=FALSE)
colnames(dat8.3) <- c("Group","WDIM","CIRCUM","FBEYE","EYEHD","EARHD","JAW")
dat8.3$Group <- factor(dat8.3$Group)
Performing Several-Group Classification Analysis
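# Box's M-test of the null hypothesis that the three group covariance matrices are equal
library(heplots)
boxM(cbind(WDIM,CIRCUM,FBEYE,EYEHD,EARHD,JAW) ~ Group, data=dat8.3)
## 
## Box's M-test for Homogeneity of Covariance Matrices
## 
## data: Y
## Chi-Sq (approx.) = 57.472, df = 42, p-value = 0.05622
# p-value > 0.05: equality is not rejected at the 5% level, so LDA is fitted first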
Performing Several-Group Classification Analysis: Resubstitution
library(MASS)
# Fit the linear classification rule on all observations and classify them
dat8.3.lda <- lda(Group~., data=dat8.3)
dat8.3.lda.pred <- predict(dat8.3.lda)
# Confusion table: rows = actual group, columns = predicted group
dat8.3.lda.conf <- table(dat8.3$Group, dat8.3.lda.pred$class)
dat8.3.lda.conf
##      1  2  3
##   1 26  1  3
##   2  1 20  9
##   3  2  8 20
# Apparent error rate = (total - correctly classified) / total
dat8.3.lda.apperr <- (sum(dat8.3.lda.conf)-dat8.3.lda.conf[1,1]-dat8.3.lda.conf[2,2]-dat8.3.lda.conf[3,3])/sum(dat8.3.lda.conf)
dat8.3.lda.apperr
## [1] 0.2666667
Performing Several-Group Classification Analysis: Validation Set
library(caret)
set.seed(1)
# Stratified split: about 70% of each group for training, the rest for testing
inds <- createDataPartition(dat8.3$Group, times=1, p=0.7)
dat8.3.train <- dat8.3[inds[[1]],]
dat8.3.test <- dat8.3[-inds[[1]],]
library(MASS)
# Fit on the training set only, then classify the held-out test set
dat8.3.lda <- lda(Group~., data=dat8.3.train)
dat8.3.lda.pred <- predict(dat8.3.lda, newdata = dat8.3.test)
dat8.3.lda.conf <- table(dat8.3.test$Group, dat8.3.lda.pred$class)
dat8.3.lda.conf
##     1 2 3
##   1 7 1 1
##   2 0 6 3
##   3 1 2 6
# Validation-set error rate
dat8.3.lda.vserr <- (sum(dat8.3.lda.conf)-dat8.3.lda.conf[1,1]-dat8.3.lda.conf[2,2]-dat8.3.lda.conf[3,3])/sum(dat8.3.lda.conf)
dat8.3.lda.vserr
## [1] 0.2962963
Performing Several-Group Classification Analysis: LOOCV
library(MASS)
# CV=TRUE returns the leave-one-out classifications in $class instead of a fitted model
dat8.3.lda <- lda(Group~., data=dat8.3, CV=TRUE)
dat8.3.lda.conf <- table(dat8.3$Group, dat8.3.lda$class)
dat8.3.lda.conf
##      1  2  3
##   1 26  1  3
##   2  1 18 11
##   3  2  9 19
# Leave-one-out cross-validation error rate
dat8.3.lda.loocverr <- (sum(dat8.3.lda.conf)-dat8.3.lda.conf[1,1]-dat8.3.lda.conf[2,2]-dat8.3.lda.conf[3,3])/sum(dat8.3.lda.conf)
dat8.3.lda.loocverr
## [1] 0.3
Hands-on Practice
For Block Week 2 Practical Question 3:
Submit a summary table of the LDA and QDA error rates. Which method would you choose?
QDA?
library(heplots)
boxM(cbind(WDIM,CIRCUM,FBEYE,EYEHD,EARHD,JAW) ~ Group, data=dat8.3)
## 
## Box's M-test for Homogeneity of Covariance Matrices
## 
## data: Y
## Chi-Sq (approx.) = 57.472, df = 42, p-value = 0.05622
# The p-value is only just above 0.05, so QDA is worth comparing with LDA
QDA: Resubstitution
library(MASS)
# Fit the quadratic classification rule on all observations and classify them
dat8.3.qda <- qda(Group~., data=dat8.3)
dat8.3.qda.pred <- predict(dat8.3.qda)
# Confusion table: rows = actual group, columns = predicted group
dat8.3.qda.conf <- table(dat8.3$Group, dat8.3.qda.pred$class)
dat8.3.qda.conf
##      1  2  3
##   1 27  1  2
##   2  2 21  7
##   3  1  4 25
# Apparent error rate
dat8.3.qda.apperr <- (sum(dat8.3.qda.conf)-dat8.3.qda.conf[1,1]-dat8.3.qda.conf[2,2]-dat8.3.qda.conf[3,3])/sum(dat8.3.qda.conf)
dat8.3.qda.apperr
## [1] 0.1888889
QDA: Validation Set
library(MASS)
# Reuses dat8.3.train and dat8.3.test from the LDA validation split
dat8.3.qda <- qda(Group~., data=dat8.3.train)
dat8.3.qda.pred <- predict(dat8.3.qda, newdata = dat8.3.test)
dat8.3.qda.conf <- table(dat8.3.test$Group, dat8.3.qda.pred$class)
dat8.3.qda.conf
##     1 2 3
##   1 6 2 1
##   2 0 5 4
##   3 2 1 6
# Validation-set error rate
dat8.3.qda.vserr <- (sum(dat8.3.qda.conf)-dat8.3.qda.conf[1,1]-dat8.3.qda.conf[2,2]-dat8.3.qda.conf[3,3])/sum(dat8.3.qda.conf)
dat8.3.qda.vserr
## [1] 0.3703704
QDA: LOOCV
library(MASS)
# CV=TRUE returns the leave-one-out classifications in $class instead of a fitted model
dat8.3.qda <- qda(Group~., data=dat8.3, CV=TRUE)
dat8.3.qda.conf <- table(dat8.3$Group, dat8.3.qda$class)
dat8.3.qda.conf
##      1  2  3
##   1 26  2  2
##   2  3 16 11
##   3  4  9 17
# Leave-one-out cross-validation error rate
dat8.3.qda.loocverr <- (sum(dat8.3.qda.conf)-dat8.3.qda.conf[1,1]-dat8.3.qda.conf[2,2]-dat8.3.qda.conf[3,3])/sum(dat8.3.qda.conf)
dat8.3.qda.loocverr
## [1] 0.3444444
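The `qda` calls above differ from `lda` in that each group gets its own covariance matrix. A minimal sketch of that rule written by hand is given below, assuming equal priors and the standard quadratic score, -0.5 log|S_i| - 0.5 (y - ybar_i)' S_i^{-1} (y - ybar_i). The helper name `quadratic_classify` is hypothetical, and the built-in `iris` data stands in for the football data, so this is an illustration rather than the derivation in the course notes.

```r
library(MASS)  # used only to cross-check the hand-written rule against qda()

# Hypothetical helper: quadratic classification scores with equal priors.
# Each observation is assigned to the group with the largest score.
quadratic_classify <- function(X, groups) {
  X <- as.matrix(X)
  groups <- factor(groups)
  scores <- sapply(levels(groups), function(g) {
    Xg <- X[groups == g, , drop = FALSE]
    S_g <- cov(Xg)  # separate covariance matrix for each group
    log_det <- as.numeric(determinant(S_g, logarithm = TRUE)$modulus)
    # mahalanobis() returns the squared distance to the group mean
    -0.5 * log_det - 0.5 * mahalanobis(X, colMeans(Xg), S_g)
  })
  factor(levels(groups)[max.col(scores, ties.method = "first")],
         levels = levels(groups))
}

own  <- quadratic_classify(iris[, 1:4], iris$Species)
mass <- predict(qda(Species ~ ., data = iris))$class

mean(own == mass)          # share of observations classified identically
table(iris$Species, own)   # resubstitution confusion table for the sketch
```

The groups in `iris` are equally sized, so the default priors in `qda` are also equal and the two sets of classifications should agree.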
Classification Analysis Summary
Misclassification error rate, by estimation method (columns):
Classification Method | Resubstitution | Validation Set | LOOCV |
Two-group | 0.125 | 0.167 | 0.141 |
Several-group LDA | 0.267 | 0.296 | 0.300 |
Several-group QDA | 0.189 | 0.370 | 0.344 |
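For the practical question, it can help to hold the error rates in one data frame and look at the gap between the resubstitution and LOOCV estimates. The sketch below copies the rounded values from the summary table above; the column names and the `optimism` column are illustrative choices, not part of the original slides.

```r
# Error rates copied from the summary table (rounded to 3 decimals)
errs <- data.frame(
  method         = c("Two-group", "Several-group LDA", "Several-group QDA"),
  resubstitution = c(0.125, 0.267, 0.189),
  validation_set = c(0.167, 0.296, 0.370),
  loocv          = c(0.141, 0.300, 0.344)
)

# How much lower the resubstitution estimate is than the LOOCV estimate
errs$optimism <- errs$loocv - errs$resubstitution
print(errs)
```

On these figures QDA has the lowest resubstitution error of the two several-group methods but the highest validation-set and LOOCV errors, which suggests its apparent error rate is the most optimistic of the three.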
Summary
Decide whether to use linear or quadratic classification functions
Are the group covariance matrices equal? (Box's M-test)
Perform two-group and several-group classification analysis by:
writing your own classification functions (see notes)
using the lda and qda functions in MASS
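As an illustration of the first option, the sketch below writes linear classification functions by hand and checks them against `lda`. It assumes equal priors and the standard form L_i(y) = ybar_i' S_pl^{-1} y - 0.5 ybar_i' S_pl^{-1} ybar_i with a pooled covariance matrix S_pl. The course notes are not reproduced here, so the helper name `linear_classify` and the use of the built-in `iris` data are assumptions.

```r
library(MASS)  # used only to cross-check the hand-written rule against lda()

# Hypothetical helper: linear classification functions with equal priors.
# Each observation is assigned to the group with the largest L_i(y).
linear_classify <- function(X, groups) {
  X <- as.matrix(X)
  groups <- factor(groups)
  g_levels <- levels(groups)
  # Group mean vectors, one row per group
  means <- t(sapply(g_levels, function(g) colMeans(X[groups == g, , drop = FALSE])))
  # Pooled within-group covariance matrix S_pl
  W <- Reduce(`+`, lapply(g_levels, function(g) {
    Xg <- X[groups == g, , drop = FALSE]
    (nrow(Xg) - 1) * cov(Xg)
  }))
  S_pl <- W / (nrow(X) - length(g_levels))
  coefs  <- means %*% solve(S_pl)           # row i holds ybar_i' S_pl^{-1}
  consts <- -0.5 * rowSums(coefs * means)   # -0.5 * ybar_i' S_pl^{-1} ybar_i
  scores <- X %*% t(coefs) +
    matrix(consts, nrow(X), length(g_levels), byrow = TRUE)
  factor(g_levels[max.col(scores, ties.method = "first")], levels = g_levels)
}

own  <- linear_classify(iris[, 1:4], iris$Species)
mass <- predict(lda(Species ~ ., data = iris))$class

mean(own == mass)          # share of observations classified identically
table(iris$Species, own)   # resubstitution confusion table for the sketch
```

With unequal priors, a term ln(p_i) would be added to each score; it is omitted here because it is the same constant for every group when the priors are equal.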
Calculate apparent error rate of classification functions
Calculate improved estimates of error rate