1 of 35

CSCI-SHU 376: Natural Language Processing

Hua Shen

2026-01-29

Spring 2026

Lecture 3: Text Classification

2 of 35

Today’s Plan

  • Text Classification (Chapter 4)
  • Naïve Bayes (Chapter 4)
  • Logistic Regression (Chapter 5)

4 of 35

Why Text Classification

5 of 35

Text Classification

 

6 of 35

Rule-based Text Classification

 

  • Rule-based systems can be very accurate
  • But rules are hard to write
  • Expensive to build and maintain
  • Not easily generalizable to new data or domains

7 of 35

Supervised Learning

 

8 of 35

Types of Supervised Learning

9 of 35

Today’s Plan

  • Text Classification (Chapter 4)
  • Naïve Bayes (Chapter 4)
  • Logistic Regression (Chapter 5)

10 of 35

Naïve Bayes Classifier

  • A simple classification model based on Bayes’ rule
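The decision rule this implies: pick the class with the highest posterior probability, where the denominator can be dropped because it does not depend on the class (a standard formulation):

```latex
\hat{c} = \arg\max_{c \in C} P(c \mid d)
        = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}
        = \arg\max_{c \in C} P(d \mid c)\,P(c)
```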

12 of 35

How to represent P(d | c)

 

13 of 35

Bag of Words
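As a minimal sketch of the idea (the lowercase-and-split tokenizer here is a simplifying assumption; real pipelines use a proper tokenizer), a document becomes an unordered multiset of word counts:

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document by its word counts, discarding word order."""
    return Counter(text.lower().split())

bow = bag_of_words("great movie great acting")
# bow["great"] == 2, bow["movie"] == 1
```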

14 of 35

Predicting with Naïve Bayes
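With the bag-of-words independence assumption, the standard prediction rule multiplies per-word likelihoods; in practice this is done in log space to avoid floating-point underflow:

```latex
\hat{c} = \arg\max_{c \in C}\left[\log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c)\right]
```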

15 of 35

Estimate probabilities
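The standard maximum-likelihood estimates come straight from counts over the training data (with $V$ the vocabulary):

```latex
\hat{P}(c) = \frac{N_c}{N_{doc}}, \qquad
\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c)}{\sum_{w \in V} \mathrm{count}(w, c)}
```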

16 of 35

Smoothing

  • What if count(“fantastic”, positive) = 0?
  • Laplace Smoothing
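Add-one (Laplace) smoothing pads every count by 1 so that no word is assigned zero probability:

```latex
\hat{P}(w_i \mid c)
  = \frac{\mathrm{count}(w_i, c) + 1}{\sum_{w \in V}\left(\mathrm{count}(w, c) + 1\right)}
  = \frac{\mathrm{count}(w_i, c) + 1}{\left(\sum_{w \in V}\mathrm{count}(w, c)\right) + |V|}
```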

17 of 35

Naïve Bayes: Overall Process
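The whole pipeline can be sketched end to end. This is a minimal illustration, not a production implementation: the whitespace tokenizer and the toy sentiment data are made up for the example, and Laplace smoothing is applied as above.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train multinomial Naive Bayes with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = {w for cnt in counts.values() for w in cnt}
    loglik = {c: {w: math.log((counts[c][w] + 1) /
                              (sum(counts[c].values()) + len(vocab)))
                  for w in vocab}
              for c in classes}
    return prior, loglik, vocab

def predict_nb(doc, prior, loglik, vocab):
    """Score each class as log P(c) + sum of log P(w|c); skip unseen words."""
    words = [w for w in doc.lower().split() if w in vocab]
    return max(prior, key=lambda c: math.log(prior[c]) +
               sum(loglik[c][w] for w in words))

# Toy training data, invented for illustration
docs = ["fantastic great movie", "great acting fantastic",
        "boring terrible plot", "terrible boring acting"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb("fantastic movie", *model))  # prints "pos" on this toy data
```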

 

18 of 35

A Worked Example

19 of 35

A Worked Example

20 of 35

A Worked Example

21 of 35

Naïve Bayes: Pros and Cons

  • Fast, with low storage requirements
  • Works well with small amounts of training data
  • The independence assumption is too strong
  • Does not work well for highly imbalanced classes

22 of 35

Today’s Plan

  • Text Classification (Chapter 4)
  • Naïve Bayes (Chapter 4)
  • Logistic Regression (Chapter 5)

23 of 35

Logistic Regression

  • A powerful supervised model
  • A common baseline approach for many NLP tasks
  • Comes in binary and multinomial variants

24 of 35

Generative vs Discriminative models

  • Naïve Bayes is a generative model
  • Logistic Regression is a discriminative model

25 of 35

Generative Classifier

  • Build a model of what a cat image looks like
    • Knows about ears, eyes, etc.
    • Assigns a probability to any image: how cat-like is this image?

  • Also build a model for dog images

  • For a new image, run both models and see which one fits better

26 of 35

Discriminative Classifier

27 of 35

Overall Process

 

28 of 35

Feature Representation

29 of 35

Feature Representation
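One common way to represent a document for logistic regression is a small vector of hand-built features. The lexicons below (`positive_words`, `negative_words`) and the three features chosen are hypothetical, purely for illustration:

```python
# Hypothetical sentiment lexicons, invented for this sketch
positive_words = {"great", "fantastic", "love"}
negative_words = {"boring", "terrible", "hate"}

def features(text):
    """Map a document to a small dense feature vector:
    [# positive-lexicon words, # negative-lexicon words, document length]."""
    words = text.lower().split()
    return [sum(w in positive_words for w in words),
            sum(w in negative_words for w in words),
            len(words)]

features("great movie but boring plot")  # -> [1, 1, 5]
```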

30 of 35

Classification function

 

31 of 35

Classification function
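The classification function squashes the weighted feature score through the sigmoid (logistic) function; a minimal sketch:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_positive(x, w, b):
    """P(y = 1 | x) = sigmoid(w . x + b) for binary logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Note that sigmoid(0) = 0.5, so the decision boundary sits where w · x + b = 0.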

32 of 35

Loss function

  • For binary classification

Bernoulli distribution
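For binary classification the label is Bernoulli-distributed, and maximizing its likelihood is equivalent to minimizing the cross-entropy (negative log-likelihood) loss; a minimal sketch:

```python
import math

def bce_loss(y, p, eps=1e-12):
    """Cross-entropy loss for one example: -[y log p + (1 - y) log(1 - p)].

    y is the gold label (0 or 1), p the model's predicted P(y = 1 | x).
    """
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

The loss is 0 only when the model is perfectly confident and correct, and grows without bound as it becomes confidently wrong.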

33 of 35

Optimization
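Logistic regression is typically trained with (stochastic) gradient descent, repeatedly stepping the parameters against the loss gradient with learning rate $\eta$:

```latex
\theta^{(t+1)} = \theta^{(t)} - \eta \, \nabla_\theta \, L\!\left(f(x; \theta^{(t)}),\, y\right)
```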

 

34 of 35

Gradients for binary logistic regression
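For the cross-entropy loss, the gradient takes the well-known form (ŷ − y) · x. A single stochastic gradient descent step, as a sketch (the learning rate 0.1 is an arbitrary choice):

```python
import math

def sgd_step(w, b, x, y, lr=0.1):
    """One SGD update for binary logistic regression.

    Cross-entropy gradients: dL/dw_j = (y_hat - y) * x_j, dL/db = y_hat - y,
    where y_hat = sigmoid(w . x + b).
    """
    y_hat = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    err = y_hat - y
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    new_b = b - lr * err
    return new_w, new_b
```

Each step nudges the weights so the predicted probability moves toward the gold label.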

35 of 35

Multinomial Logistic Regression
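For more than two classes, the sigmoid is replaced by a softmax over per-class scores; a minimal sketch (subtracting the max score is the standard numerical-stability trick and does not change the result):

```python
import math

def softmax(scores):
    """Turn a list of per-class scores into a probability distribution."""
    m = max(scores)                          # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is then the argmax of the resulting probabilities, exactly as in the binary case.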