1 of 74

CS60050: Machine Learning

Sourangshu Bhattacharya

CSE, IIT Kharagpur

2 of 74

Course organization

Classroom: NR - 412

Slots:

Monday (8:00 – 9:55)

Tuesday (12:00 – 12:55)

Website: https://sourangshu.github.io/IITKGP-ML-Spring-2026/

Moodle (for assignment submission): https://moodlecse.iitkgp.ac.in/moodle/

3 of 74

Teaching Assistants

Saptarshi Mondal

Suman Kumar Bera

Vaishnovi Arun

Bibhudatta Bhanja

Dhyana Vardhan

4 of 74

Evaluation

Grades:

Midsem, Endsem: 50 - 60

Class Tests: 20 - 30

Term Project: 20 - 30

Class tests will be surprise tests! If you miss one, you lose the marks.

The list of term projects will be shared with you this week. You need to form groups of size 3 – 4 and select one project.

Both the term project and the assignments will require you to write code.

Assignments will not be graded.

5 of 74

Tentative Schedule (subject to change)

Week | Date-1  | Date-2  | Topic
-----|---------|---------|------------------------------------------------
1    | 5/1/26  | 6/1/26  | Introduction, classification
2    | 12/1/26 | 13/1/26 | Linear models, regression
3    | 19/1/26 | 20/1/26 | SVM, kernels
4    | 26/1/26 | 27/1/26 | Probabilistic ML: Naive Bayes, Bayesian regression
5    | 2/2/26  | 3/2/26  | DT, bagging, random forests
6    | 9/2/26  | 10/2/26 | Boosting, XGBoost
     | 16/2/26 | 17/2/26 | Mid-sem
     | 23/2/26 | 24/2/26 | Mid-sem
7    | 2/3/26  | 3/3/26  | Clustering, GMM
8    | 9/3/26  | 10/3/26 | Graphical models
9    | 16/3/26 | 17/3/26 | Neural networks
10   | 23/3/26 | 24/3/26 | Learning theory and metrics for problems
11   | 30/3/26 | 31/3/26 | Id-ul-Fitr
12   | 6/4/26  | 7/4/26  | Active, transfer, multi-task learning
13   | 13/4/26 | 14/4/26 | Explainability and trustworthiness

6 of 74

COURSE BACKGROUND

7 of 74

8 of 74

9 of 74

Turing Test

  • (Human) judge communicates with a human and a machine over text-only channel.
  • Both human and machine try to act like a human.
  • Judge tries to tell which is which.
  • Numerous variants

10 of 74

Turing Test on Unsuspecting Judges

  • It is possible to (temporarily) fool humans who do not realize they may be talking to a bot

  • ELIZA program [Weizenbaum 66] rephrases partner’s statements and questions (~psychotherapist)
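The rephrasing idea can be sketched in a few lines; the patterns below are illustrative, not Weizenbaum's original script:

```python
import re

# A toy ELIZA-style rephraser: pattern -> response template.
# These patterns are illustrative, not the original DOCTOR script.
RULES = [
    (re.compile(r"i am (.*)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"i feel (.*)", re.I), "How long have you felt {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def respond(statement: str) -> str:
    """Return a rephrased question, or a generic prompt if no rule matches."""
    for pattern, template in RULES:
        m = pattern.search(statement)
        if m:
            return template.format(m.group(1).rstrip(".!?"))
    return "Please go on."
```

With no model of meaning at all, simple surface rephrasing like this can sustain a conversation for a while, which is exactly why unsuspecting judges were fooled.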

11 of 74

Turing Test

12 of 74

Turing Test

13 of 74

What is Artificial Intelligence

“[The automation of] activities that we associate with human thinking, activities such as decision making, problem solving, learning” (Bellman 1978)

“The study of mental faculties through the use of computational models” (Charniak & McDermott, 1985)

14 of 74

Good Old AI Days

15 of 74

Representing Knowledge

  • Logic

  • Rules

  • Semantic Graphs/Nets

16 of 74

A Few Statements

  • All people who are graduating are happy.
  • All happy people smile.
  • Someone is graduating.
  • Is someone smiling? (Conclusion)

17 of 74

Predicates

  1. ∀x: graduating(x) → happy(x)
  2. ∀x: happy(x) → smiling(x)
  3. ∃x: graduating(x)

18 of 74

Rule Based Inference Example

(R1) if gas_in_engine and not(starts), then problem(spark_plugs).

(R2) if not(turns_over) and not(lights_on), then problem(battery).

(R3) if not(turns_over) and lights_on, then problem(starter).

(R4) if gas_in_tank and gas_in_carb, then gas_in_engine.
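Rules like these can be run by a small forward-chaining loop; a minimal sketch (the fact names and the negation-as-failure treatment of "not" are illustrative simplifications):

```python
# Forward chaining over two of the car-diagnosis rules above (a sketch).
# Premises are (fact, truth) pairs; a negated premise holds if the fact
# is absent from the known facts (negation as failure).
RULES = [
    ([("gas_in_tank", True), ("gas_in_carb", True)], "gas_in_engine"),      # R4
    ([("gas_in_engine", True), ("starts", False)], "problem(spark_plugs)"), # R1
]

def forward_chain(facts):
    """Repeatedly fire rules whose premises hold until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            holds = all((f in facts) == truth for f, truth in premises)
            if holds and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"gas_in_tank", "gas_in_carb"})
# R4 fires first (gas_in_engine), which then lets R1 fire.
```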

19 of 74

Semantic Nets

[Figure: a semantic net with nodes Nellie, Elephant, Animal, Africa, head and links: Nellie —is a→ Elephant; Elephant —is a→ Animal; Elephant —lives in→ Africa; Elephant —has→ head.]
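A semantic net like this can be stored as labelled edges, with "is a" links followed transitively so that nodes inherit properties; a minimal sketch (relation names taken from the figure):

```python
# The semantic net as a dict of (node, relation) -> value edges.
EDGES = {
    ("Nellie", "is a"): "Elephant",
    ("Elephant", "is a"): "Animal",
    ("Elephant", "lives in"): "Africa",
    ("Elephant", "has"): "head",
}

def lookup(node, relation):
    """Find a relation on the node itself, or inherit it via 'is a' links."""
    while node is not None:
        if (node, relation) in EDGES:
            return EDGES[(node, relation)]
        node = EDGES.get((node, "is a"))  # climb the is-a hierarchy
    return None
```

Nellie has no "lives in" edge of her own, but `lookup("Nellie", "lives in")` climbs to Elephant and returns "Africa"; this inheritance is the main payoff of the representation.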

20 of 74

21 of 74

Hit the Wall

  • Ambiguity: highly funded translation programs (Russian to English) were good at syntactic manipulation but bad at disambiguation

“The spirit is willing but the flesh is weak” becomes “The vodka is good but the meat is rotten”

  • Scalability/complexity: early examples were very small, programs could not scale to bigger instances

  • Limitations of representations used

22 of 74

AI Winter

23 of 74

Machine Learning

24 of 74

Data

25 of 74

Data

  • Collection of measurements or observations that can be used to train a model.
  • Can be categorical (cat, dog, lion, etc.), ordinal (tall, medium, short), or continuous (10–15, 15–20, 20–25, ...).
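The three kinds of measurements map to numbers differently; a minimal sketch (the category and level lists are illustrative):

```python
# How the three kinds of measurements are typically encoded as numbers.
def one_hot(value, categories):
    """Categorical: no order, so one indicator per category."""
    return [1 if value == c else 0 for c in categories]

def ordinal_rank(value, levels):
    """Ordinal: order matters, so map to the level's rank."""
    return levels.index(value)

# Continuous values are used as-is (possibly scaled or binned).
print(one_hot("dog", ["cat", "dog", "lion"]))               # [0, 1, 0]
print(ordinal_rank("medium", ["short", "medium", "tall"]))  # 1
```

Note the design choice: one-hot avoids imposing a spurious order on categories (dog is not "between" cat and lion), while ordinal ranks deliberately keep the order.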

26 of 74

Data

27 of 74

Machine Learning

  • A field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from seen data and generalize to unseen data, and thus perform tasks without explicit instructions.
  • “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” - Tom Mitchell, 1997

Emergence of Supervised Learning

28 of 74

Supervised Learning Human Perspective

  • Scenario [How a child learns]
    • Below are the images of cats and dogs

Cats Dogs

29 of 74

Supervised Learning Human Perspective

  • Scenario [How a child learns]
    • Now, we provide the child with some images and ask the child which is what?

30 of 74

Supervised Learning Human Perspective

  • Scenario [How a child learns]
    • The child guesses which images are cats and which are dogs, as shown below.

Cats Dogs

How did the child know?

31 of 74

Supervised Learning Human Perspective

  • Scenario [How a child learns]
    • The child understood the features which distinguish dogs from cats!

Cats Dogs

Eyes

Facial Features

32 of 74

Supervised Learning Model Perspective

  • Model Perspective
    • If we want a machine learning model to decide/classify whether an image is a cat or a dog, we train the model on labelled images as shown below.

Cats

Dogs

Train

Machine Learning Model

33 of 74

Supervised Learning Model Perspective

  • Procedure :
    • Divide Dataset into Train and Test

Training Dataset

Testing Dataset
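The split step can be sketched in plain Python (in practice a library routine such as scikit-learn's `train_test_split` is used; the data here is illustrative):

```python
import random

# A minimal train/test split: shuffle a copy, cut off the last fraction as test.
def train_test_split(examples, test_fraction=0.2, seed=0):
    rng = random.Random(seed)        # fixed seed for reproducibility
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = [("img%d" % i, "cat" if i % 2 else "dog") for i in range(10)]
train, test = train_test_split(data)
# len(train) == 8, len(test) == 2; together they cover all 10 examples
```

Shuffling before splitting matters: if the data were sorted by class, a tail split would put all of one class in the test set.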

34 of 74

Supervised Learning Model Perspective

  • Procedure
    • Predict the classes of the test images with the trained model in a similar manner.

Machine

Learning Model

Feature Extractor

Cats

Dogs

Feed

Predict

35 of 74

Problems with Data Annotation

  • Supervised learning requires labelled data.
  • Huge amounts of data are available (in zettabytes).
  • Almost all of it is unlabelled.
  • We cannot label every piece of data, be it image, text, audio, video, molecules, etc. It is labour-intensive.
  • What can we do?

36 of 74

Unsupervised Learning

When data is unlabelled

37 of 74

Unsupervised Learning

  • Definition:

A type of learning where the data is unlabelled and models learn hidden patterns or groupings from it.

  • Types:
    • Clustering: discover groupings in unlabelled data. Example: spam emails.
    • Association: find rules that describe your data. Example: recommendation systems.
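Clustering can be illustrated with a tiny 1-D k-means (Lloyd's algorithm); the data points, k = 2, and the deterministic initialization are illustrative choices:

```python
# 1-D k-means: discover two groupings with no labels given.
def kmeans_1d(points, k=2, iters=20):
    """Lloyd's algorithm: assign each point to the nearest centre,
    then recompute each centre as the mean of its cluster."""
    centers = sorted(points)[:k]          # crude but deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.8])
# Two well-separated groups emerge: {0.8, 1.0, 1.2} and {9.8, 10.0, 10.5}
```

No labels were supplied; the grouping falls out of the geometry of the data alone, which is the essence of unsupervised learning.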

38 of 74

Unsupervised Learning

  • Scenario [Human Behavior]
    • Suppose your class is getting ready for a group photograph, taking positions in order of height.
    • You do not have any prior knowledge of your classmates’ heights.

39 of 74

Unsupervised Learning

  • Scenario [Human Behavior]
    • You take a position according to height.
    • You figure it out without being told where to stand.

40 of 74

Unsupervised Learning

Recommender Systems

Example of Unsupervised Learning

Cluster of Detective Novels

Those who bought this

also bought these

41 of 74

Semi Supervised Learning

  • Definition:
    • Supervised + Unsupervised
    • Uses both labelled and unlabelled data for classification and regression tasks.
    • The amount of labelled data is usually much smaller than the amount of unlabelled data.
    • Primarily deals with unlabelled data.
  • One such technique is pseudo-labelling.
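Pseudo-labelling can be sketched with a 1-nearest-neighbour rule: each unlabelled point borrows the label of the closest labelled one (the book "positions" and labels below are illustrative):

```python
# Pseudo-labelling sketch: 1-NN label propagation on 1-D features.
def pseudo_label(labelled, unlabelled):
    """labelled: list of (x, y) pairs; returns (x, guessed_y) per unlabelled x."""
    out = []
    for x in unlabelled:
        # borrow the label of the nearest labelled neighbour
        nearest_x, nearest_y = min(labelled, key=lambda xy: abs(xy[0] - x))
        out.append((x, nearest_y))
    return out

labelled = [(1.0, "history"), (9.0, "chemistry")]   # few labelled books
unlabelled = [1.5, 2.0, 8.5]                        # many unlabelled ones
# -> [(1.5, 'history'), (2.0, 'history'), (8.5, 'chemistry')]
```

The pseudo-labelled set can then be merged with the labelled set and used for ordinary supervised training, exactly as the book example on the following slides does with clustering.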

42 of 74

Semi Supervised Learning

Books are present; we know the labels of some but not of others.

43 of 74

Semi Supervised Learning

History Chemistry

We know the labels of these two books, but not of the others.

How can we find out the classes of the other books?

44 of 74

Semi Supervised Learning

History

Chemistry

We use a clustering algorithm (unsupervised learning) to cluster similar books (based on content) and label them!


46 of 74

Semi Supervised Learning

History Chemistry

Model

Learned Parameters

We then use these pseudo-labelled data to train our model in a supervised learning manner!

47 of 74

Types of Learning

  • Supervised Learning
  • Unsupervised Learning
  • Semi-Supervised Learning
  • Ensemble Learning [combining models together]
  • Self-Supervised Learning [how we learn language]
  • Reinforcement Learning [how a robot learns]

48 of 74

Discriminative And Generative Tasks

  • Discriminative tasks: classification
    • Identify whether an image shows a cat or a dog.
    • Identify whether the next word will be a noun or a pronoun.
  • Generative tasks:
    • Generate sentences based on an instruction.
    • ChatGPT

49 of 74

Generative Task is a Discriminative Task!

Consider the well-known sentence containing all the letters of the English alphabet:

“The quick brown fox jumps over the lazy dog”

Suppose, given

“The quick brown”,

the model needs to generate the entire sentence.

50 of 74

Generative Task is a Discriminative Task!

Task: given “The quick brown”, classify which of the following words will be the next word.

[fox, ox, tiger, ant, duck]: the model classifies “fox” as 1 and the rest as 0.

Recursively, given “The quick brown fox”, classify which word will be the next word: the model classifies “jumps”.

Given “The quick brown fox jumps”, classify which word will be the next word: the model classifies “over”, and so on.

We observe that if a series of such prediction (classification, i.e. discriminative) tasks is performed, and each predicted word is appended to the phrase before predicting again, we obtain a sentence.
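This loop can be written down directly; a minimal sketch where a toy lookup table stands in for the trained next-word classifier (the table and its entries are illustrative):

```python
# "Classify the next word, append it, repeat" as code.
NEXT_WORD = {  # last word of context -> most probable next word (toy model)
    "brown": "fox", "fox": "jumps", "jumps": "over",
    "over": "the", "the": "lazy", "lazy": "dog",
}

def generate(prefix, steps):
    """Run the discriminative step repeatedly, appending each prediction."""
    words = prefix.split()
    for _ in range(steps):
        nxt = NEXT_WORD.get(words[-1])   # the discriminative (classification) step
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# generate("The quick brown", 6)
# -> "The quick brown fox jumps over the lazy dog"
```

Real language models condition on the whole prefix rather than just the last word, but the generate-by-repeated-classification structure is the same.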

51 of 74

Generative Task is a Discriminative Task!

Et voila! We find that a generative task is a sequence of discriminative tasks!

52 of 74

Concept Learning Example

Version Spaces

53 of 74

Concept learning

Example / Instance: an atomic (real-life) situation/object over which we want to learn.

Instance space: the set of all possible instances.

Attributes: observable quantities which describe a situation.

Concept: a Boolean-valued function over the set of instances.

Hypothesis space: a subset of all Boolean-valued functions over the instance space.

54 of 74

Concept Learning - example

Attributes: Sky, Air temp, Humidity, Wind, Weather, Forecast.

Instance space X. What is its size?

Hypothesis space: conjunctions of literals (conditions over attributes).

Conditions are of the form (attr = val), (attr = ?), or (attr = φ).

What is the size of the hypothesis space?
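The two size questions can be answered by counting; the sketch below assumes the standard EnjoySport cardinalities from Mitchell's example (Sky has 3 values, the other five attributes have 2 each):

```python
from math import prod

# Assumed attribute cardinalities: Sky=3, the remaining five attributes=2.
n_values = [3, 2, 2, 2, 2, 2]

instance_space = prod(n_values)               # 3*2*2*2*2*2 = 96
# Syntactically distinct hypotheses: each position is a value, '?', or 'φ'.
syntactic = prod(v + 2 for v in n_values)     # 5*4*4*4*4*4 = 5120
# Semantically distinct: any 'φ' yields the same empty concept, counted once.
semantic = 1 + prod(v + 1 for v in n_values)  # 1 + 4*3^5 = 973
```

The gap between 2^96 possible concepts and 973 representable hypotheses is the point: the conjunctive hypothesis space is a very strong bias.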

55 of 74

Concept Learning - example

56 of 74

Inductive learning problem

Training examples: D = { (x1, c(x1)), …, (xn, c(xn)) }

Problem: given D, learn h ∈ H such that for all x ∈ X, h(x) = c(x).

Inductive learning assumption:

Any hypothesis found to approximate the target concept well over a sufficiently large training set will also approximate it well over unseen examples.

57 of 74

General to specific ordering

Example x is said to be positive if c(x) = 1, else negative.

Hypothesis h “satisfies” x if h(x) = 1.

Hypothesis h2 is said to be “more general than or equal to” h1 if
for all x: h1(x) = 1 implies h2(x) = 1.
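The ordering can be checked directly by testing the implication over a set of instances; a minimal sketch where hypotheses are tuples of constraints and '?' matches any value (names are illustrative):

```python
def satisfies(h, x):
    """h(x) = 1 iff every constraint is '?' or equals the attribute value."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h2, h1, instances):
    """h2 >= h1: over the given instances, h1(x)=1 implies h2(x)=1."""
    return all(satisfies(h2, x) for x in instances if satisfies(h1, x))

instances = [("Sunny", "Warm"), ("Sunny", "Cold"),
             ("Rainy", "Warm"), ("Rainy", "Cold")]
# ("Sunny", "?") covers everything ("Sunny", "Warm") covers, not vice versa.
```

Enumerating instances is exponential in general; for conjunctions the ordering can also be decided attribute-by-attribute, but the enumeration version makes the definition transparent.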

58 of 74

General to specific ordering

59 of 74

Find - S

Finding maximally specific hypothesis
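Find-S itself is short; a minimal sketch for conjunctive hypotheses (the example data below is illustrative):

```python
# Find-S: start from the most specific hypothesis and minimally
# generalize on each positive example; negatives are ignored.
def find_s(examples):
    """examples: list of (attribute_tuple, label); returns the learned hypothesis."""
    h = None  # stands for the all-'φ' (most specific) hypothesis
    for x, label in examples:
        if label != 1:
            continue                  # Find-S ignores negative examples
        if h is None:
            h = list(x)               # first positive example: copy it exactly
        else:
            # keep agreeing constraints, relax disagreements to '?'
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return tuple(h) if h else None

data = [
    (("Sunny", "Warm", "Normal", "Strong"), 1),
    (("Sunny", "Warm", "High", "Strong"), 1),
    (("Rainy", "Cold", "High", "Strong"), 0),
]
# find_s(data) -> ('Sunny', 'Warm', '?', 'Strong')
```

Note how the negative example has no effect at all, which is precisely the weakness the next slide lists.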

60 of 74

Find – S Example

61 of 74

Find – S Problems

Can’t tell whether it has learned the concept.

Can’t tell whether the data is inconsistent.

Picks a maximally specific hypothesis.

There might be several maximally specific hypotheses.

62 of 74

Version Space

63 of 74

Version space representation

64 of 74

Version space

65 of 74

Candidate Elimination

66 of 74

Candidate Elimination

If d is a negative example:
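The full algorithm can be sketched for conjunctive hypotheses; in this sketch S is kept as a single hypothesis (which suffices for conjunctions), handling of inconsistent data is omitted, and all names and the example data are illustrative:

```python
def satisfies(h, x):
    """h(x) = 1 iff each constraint is '?' or equals the attribute value."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_eq(h2, h1):
    """h2 covers every instance h1 covers (conjunctive hypotheses)."""
    return all(a == "?" or a == b for a, b in zip(h2, h1))

def candidate_elimination(examples, values):
    """values[i] lists the possible values of attribute i. Returns (S, G)."""
    n = len(values)
    s = None                               # most specific boundary (all-'φ')
    G = [tuple("?" for _ in range(n))]     # most general boundary
    for x, label in examples:
        if label == 1:
            G = [g for g in G if satisfies(g, x)]      # drop inconsistent g
            s = x if s is None else tuple(             # minimally generalize S
                a if a == b else "?" for a, b in zip(s, x))
        else:
            new_G = []
            for g in G:
                if not satisfies(g, x):
                    new_G.append(g)                    # already rejects x
                    continue
                for i in range(n):                     # minimally specialize g
                    if g[i] != "?":
                        continue
                    for v in values[i]:
                        if v != x[i]:
                            spec = g[:i] + (v,) + g[i + 1:]
                            # keep only specializations still covering S
                            if s is not None and more_general_eq(spec, s):
                                new_G.append(spec)
            G = sorted(set(new_G))
    return s, G
```

On a two-attribute toy problem with one positive and one negative example, S converges to the positive example itself while G splits into the two minimal ways of excluding the negative.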

67 of 74

Example Problem

68 of 74

Example

Workout …

69 of 74

Convergence

Candidate elimination will converge to the target concept if:

Training data doesn’t have errors.

Target concept lies in the hypothesis space.

Otherwise:

The G and S sets become null.

70 of 74

Partially learned concept

71 of 74

What next training example ?

<Sunny, Warm, Normal, Light, Warm, Same>

72 of 74

Observations

The hypothesis space is biased.

Example: the XOR concept cannot be expressed.

Unbiased learner: disjunctions of conjunctions.

Learned version space:

S set: all positive examples

G set: complement of all negative examples

Can we use the partially learned concept from above?

There is complete ambiguity for all examples not in the training set.

73 of 74

Unbiased learning

Learning in an unbiased hypothesis space is futile as it cannot generalize to examples other than training examples.

74 of 74

End of Slides