1 of 136

Machine Learning

Prof. Seungtaek Choi

2 of 136

Do You Know Nano Banana? (1)

Google’s image generation AI (namely Gemini 2.5 Flash Image)

Prompt: Show me a man who is 33 years old, actually a professor at Hankuk University of Foreign Studies, and give a lecture about Linear Regression.

3 of 136

Do You Know Nano Banana? (2)

https://www.instagram.com/p/DOCZxPtCbdJ/

Nano Banana + Kling

4 of 136

Do You Know Nano Banana? (3)

https://x.com/GoogleAIStudio/status/1964024315638403231

5 of 136

Last Time

Course Overview
Introduction to AI

6 of 136

Today

Announcement: 1st Assignment!
Git/GitHub Basics
Introduction to ML

Definition
Supervised Learning vs. Unsupervised Learning
Classification, Regression, Dimension Reduction, Clustering, Model Selection

Linear Regression
Gradient Descent Algorithm

7 of 136

1st Assignment!

Assignment #1: Write your introduction and submit a PR

Deadline: 11:59 PM on Sep 17th (1 week)
Practice using Git and GitHub
Feel free to react to each other’s introductions (that’s also a review!)
Repository: https://github.com/HUFS-LAI-Seungtaek/HUFS-LAI-ML-2025-2
Follow the instructions in https://github.com/HUFS-LAI-Seungtaek/HUFS-LAI-ML-2025-2/tree/main/assignments/assignment1

8 of 136

Git/GitHub Basics (Or, How to Submit Assignment)

9 of 136

What is Git?

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

10 of 136

What is GitHub?

GitHub is a cloud-based platform built on the "Git" version control system that provides tools for developers to store, manage, share, and collaborate on code and other files.

11 of 136

Git & GitHub?

Git: version control & code management (local)
GitHub: code storage (cloud)
push: upload code from your local repository to GitHub

12 of 136

Components of Git: Repository

A repository is a version-controlled project space that stores your files, branches, and full change history.

13 of 136

Components of Git: Branch

A branch is an independent line of development: a named timeline of commits within a repo.

14 of 136

Components of Git: Commit

A commit is a saved snapshot of changes with a message, author, and timestamp.

15 of 136

Git Branching Structure (Dev)

src: https://docs.symbol.dev/handbook/git-branching.html

branch

commit

16 of 136

Git Branching Structure (Ours)

Figure: the lecture repo HUFS-LAI-Seungtaek/HUFS-LAI-OOP-2025-2:main alongside two student forks, hufs-student-1/HUFS-LAI-OOP-2025-2:main and hufs-student-2/HUFS-LAI-OOP-2025-2:main. Commits that touch assignment1.py move each main forward (main → main+1 → main+2).

18 of 136

Git Workflow

Repository structure:

upstream (professor:main): original repo
origin (student:main): your repo
local: your computer

19 of 136

Git Workflow: fork

20 of 136

Git Workflow: clone

21 of 136

Git Workflow: add & commit

26 of 136

Git Workflow: push

27 of 136

Git Workflow: PR & merge

28 of 136

Git Workflow: branch

For your assignment, there is no need to use `branch`.

professor:main (remote) → fork
student:main (remote) → clone
student:main (local) → commit
student:main+1 (local) → commit
student:main+2 (local) → push
student:main+2 (remote) → PR
professor:main+1 (remote)

It’s not about remote vs. local.
It’s about w/ permission vs. w/o permission.

29 of 136

Git Workflow: branch

In your team’s repo, if you are not allowed to push to the `main` branch, …

student:main (remote) → clone
student:main (local) → checkout (in this case, $ git checkout -b feature)
student:feature (local) → commit
student:feature+1 (local) → commit
student:feature+2 (local) → push
student:feature+2 (remote) → PR
student:main+1 (remote)

30 of 136

GitHub Web Shortcuts (Or, How to Submit Assignment #1)

31 of 136

Fork repository

33 of 136

Repo is copied under your account.

36 of 136

Add a file

members/{student name}.md

Example: members/seungtaek.md

Example: members/yeachan.md

Example: members/gildong.md

Don’t include {}
Don’t use uppercase
Don’t use Korean

38 of 136

Introduce yourself

You can see the actual “code” at

https://github.com/HUFS-LAI-Seungtaek/HUFS-LAI-OOP-2025-2/blob/main/members/seungtaek.md?plain=1

Feel free to introduce yourself more! (But PLEASE FOLLOW THE OVERALL FORMAT!)

39 of 136

Commit the change (your file)

40 of 136

Back to “your” repo

43 of 136

Submit PR to lecture repository

Please follow the format:
n-th Assignment by {student ID} ({Full name})

It’s important to check your submission status.

Do not use ‘ or “

Please use ` (the backtick, on the same key as ~)

You can see the preview.

44 of 136

Introduction to ML

45 of 136

What is Machine Learning?

What is machine?
What is learning?

H. Simon: Any process by which a system improves its performance
M. Minsky: Learning is making useful changes in our minds
R. Michalski: Learning is constructing or modifying representations of what is being experienced
L. Valiant: Learning is the process of knowledge acquisition in the absence of explicit programming

46 of 136

What is Machine Learning?

Machine Learning

“[the] field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur L. Samuel (1959)
“A learning machine, broadly defined, is any device whose actions are influenced by past experiences.” – Nils J. Nilsson (1965)
“Pattern recognition – the act of taking in raw data and taking an action based on the ‘category’ of the pattern.” – Duda & Hart (1973)
“The study and computer modeling of learning processes in their multiple manifestations constitutes the subject matter of machine learning” – Carbonell, Michalski & Mitchell (1983)
“A computer program is said to learn from experience E … if its performance at tasks in T, as measured by P, improves with experience E.” – Tom M. Mitchell (1997)
“The goal of machine learning is to program computers to use example data or past experience to solve a given problem.” – Ethem Alpaydin (2004)
“[We] define machine learning as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data.” – Kevin P. Murphy (2012)
“ML models use big data to learn and improve predictability and performance automatically … without being programmed … by humans.” – OECD (2020s)

48 of 136

What is Machine Learning?

https://velog.io/@ddizzang/LG-Aimers-5%EA%B8%B0-module2-ML%EA%B0%9C%EB%A1%A0

49 of 136

Why study machine learning?

Easier to build a learning system than to hand-code a working program!

Robot that learns a map of environment by exploring
Programs that learn to play games by playing against themselves

Improving on existing programs

Instruction scheduling and register allocation in compilers
Combinatorial optimization problems

Discover knowledge and patterns in high-dimensional, complex data

Sky surveys
Sequence analysis in bioinformatics
Social network analysis
Ecosystem analysis

50 of 136

Very brief history

Studied ever since computers were invented (e.g., Samuel’s checkers player)
Very active in the 1960s (neural networks)
Died down in the 1970s
Revival in the early 1980s (decision trees, backpropagation, temporal-difference learning); the name “machine learning” took hold, although Samuel had already used the term in 1959
Exploded starting in the 1990s
Now: very active research field, several yearly conferences (e.g., ICML, NeurIPS), major journals (e.g., Machine Learning, Journal of Machine Learning Research)
The time is right to study in the field!

Lots of recent progress in algorithms and theory
Flood of data to be analyzed
Computational power is available
Growing demand for industrial applications

51 of 136

Very brief history

Ref: https://arxiv.org/abs/2109.01517

52 of 136

What are good machine learning tasks?

There is no human expert

E.g., DNA analysis, video recommendation

Humans can perform the task but cannot explain how

E.g., character recognition

Desired function changes frequently

E.g., predicting stock prices based on recent trading data

Each user needs a customized function

E.g., news filtering

53 of 136

Important application areas…

Bioinformatics: sequence alignment, analyzing microarray data, information integration, …
Computer vision: object recognition, tracking, segmentation, active vision, …
Robotics: state estimation, map building, decision making
Graphics: building realistic simulations
Speech: recognition, speaker identification
Financial analysis: option pricing, portfolio allocation
E-commerce: automated trading agents, data mining, spam, …
Medicine: diagnosis, treatment, drug design, …
Computer games: building adaptive opponents
Multimedia: retrieval across diverse databases

54 of 136

Supervised Learning vs. Unsupervised Learning

55 of 136

Supervised Learning

56 of 136

Supervised Learning Example 0: Linear function
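A minimal sketch of this example, assuming a hypothetical target function y = 2x + 1 and five made-up training pairs (neither comes from the slide). The learner only sees labeled (x, y) pairs and recovers the line by least squares; `predict` is an illustrative helper name.

```python
import numpy as np

# Hypothetical labeled training data: inputs x with targets y = 2x + 1.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * x_train + 1.0

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns coefficients from the highest power down.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    """Predict y for a new input x using the learned line."""
    return slope * x + intercept

print(slope, intercept, predict(10.0))
```

Because the data are noise-free, the fitted slope and intercept should match the generating function up to floating-point error.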

58 of 136

Supervised Learning Example 1: Housing price prediction

You can practice on Kaggle:

House Prices - Advanced Regression Techniques | Kaggle

59 of 136

Supervised Learning Example 2: Breast cancer (malignant, benign)

61 of 136

Supervised Learning Example 3: Spam Detection

62 of 136

Unsupervised Learning

Training experience: unlabeled data
What to learn: interesting associations in the data
E.g., image segmentation, clustering
Often there is no single correct answer

63 of 136

Unsupervised Learning Example 1: Gene Clustering

64 of 136

Unsupervised Learning Example 2: Customer Segmentation

src: https://medium.com/analytics-vidhya/customer-segmentation-for-differentiated-targeting-in-marketing-using-clustering-analysis-3ed0b883c18b

https://www.kaggle.com/code/emineyetm/customer-segmentation-with-unsupervised-learning

65 of 136

Supervised Learning vs. Unsupervised Learning

66 of 136

Reinforcement Learning

Problems involving an agent interacting with an environment which provides numeric reward signals
Goal: Learn how to take actions in order to maximize reward

src: https://www.youtube.com/watch?v=qv6UVOQ0F44

67 of 136

Machine Learning – Taxonomy of Problems

Classification
Regression
Density Estimation
Dimension Reduction
Clustering
Model Selection

68 of 136

Classification

69 of 136

Regression

70 of 136

Density Estimation

71 of 136

Dimension Reduction

In dimension reduction, one attempts to learn a low-dimensional manifold to represent complex data.
e.g. PCA (Principal Component Analysis), ICA (Independent Component Analysis)
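As a rough illustration of PCA, the sketch below projects synthetic 2-D points that lie close to a line onto their first principal component. The data, the random seed, and the variable names are all assumptions made for the example; PCA is computed here via the SVD of the centered data, which is one standard way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data lying close to a line (a 1-D manifold) plus small noise.
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t]) + 0.05 * rng.normal(size=(200, 2))

# PCA: center the data, then take the top right-singular vector.
X_centered = X - X.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
first_component = Vt[0]            # unit vector along the direction of largest variance
Z = X_centered @ first_component   # 1-D representation of each point, shape (200,)

# Fraction of total variance captured by each component.
explained = singular_values**2 / np.sum(singular_values**2)
print(first_component, explained[0])
```

Here one coordinate per point (`Z`) keeps almost all of the variance, which is what "low-dimensional representation" means in practice.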

72 of 136

Clustering

Clustering refers to techniques for segmenting data into coherent “clusters”.
e.g. k-means, Mixtures-of-Gaussians, mean-shift
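To make k-means concrete, here is a minimal sketch of the usual alternating procedure (assign each point to its nearest center, then move each center to the mean of its points). The function name `kmeans`, its parameters, and the two synthetic blobs are assumptions for illustration, not a reference implementation.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster (keep the old center if a cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated blobs around (0, 0) and (10, 10).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

centers, labels = kmeans(X, k=2)
print(centers)
```

No labels are given to the algorithm; the grouping comes only from distances between points.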

73 of 136

Model Selection

“Given a choice of two models, which one is more appropriate to the data?”
“How big should the model be?”
e.g. A model with more parameters will fit the training data better, but it may also overfit and generalize worse to new data.
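One common way to compare models of different sizes is to hold out validation data. The sketch below is an illustration under assumed settings (a synthetic sine-shaped target, noise level 0.1, polynomial degrees 1, 3, and 9): it fits each degree on a small training set and reports training and validation error.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical data: y = sin(3x) plus Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + 0.1 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(20)
x_val, y_val = make_data(200)

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))

# Select the degree with the lowest validation error, not the lowest training error.
best_degree = min(results, key=lambda d: results[d][1])
for degree, (train_err, val_err) in results.items():
    print(degree, train_err, val_err)
print("selected degree:", best_degree)
```

Training error can only go down as the degree grows, because each larger model contains the smaller ones; validation error is the quantity that reveals overfitting.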

74 of 136

Thus far…

Supervised vs. unsupervised learning

Different types of machine learning problems

Classification
Regression
Density Estimation
Dimension Reduction
Clustering
Model Selection

75 of 136

Supervised Learning / Regression

76 of 136

Supervised Learning Problem

77 of 136

Housing price prediction (again)

95 of 136

Gradient Descent Algorithm
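As a minimal sketch of the general update rule w ← w − α·∇f(w), the code below minimizes a hypothetical one-dimensional objective f(w) = (w − 3)². The objective, the step size α = 0.1, and the function name `gradient_descent` are assumptions chosen for illustration.

```python
def gradient_descent(grad, w0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```

With this step size the distance to the minimizer w = 3 shrinks by a factor of 0.8 per step; a step size that is too large would make the iterates diverge instead.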

103 of 136

Gradient Descent Algorithm for Linear Regression
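A minimal sketch of gradient descent applied to single-feature linear regression. It assumes the hypothesis h(x) = θ0 + θ1·x and the squared-error cost J = (1/2m)·Σ(h(xᵢ) − yᵢ)²; this notation, the learning rate, and the toy data are assumptions rather than values taken from the slides.

```python
import numpy as np

def fit_linear_regression_gd(x, y, learning_rate=0.05, n_steps=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_steps):
        error = theta0 + theta1 * x - y       # h(x_i) - y_i for every example
        grad0 = error.sum() / m               # dJ/dtheta0
        grad1 = (error * x).sum() / m         # dJ/dtheta1
        # Simultaneous update of both parameters.
        theta0, theta1 = (theta0 - learning_rate * grad0,
                          theta1 - learning_rate * grad1)
    return theta0, theta1

# Hypothetical noise-free data generated by y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

theta0, theta1 = fit_linear_regression_gd(x, y)
print(theta0, theta1)
```

Both gradients are computed from the same `error` vector before either parameter changes, which is what a simultaneous update requires.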

115 of 136

Linear Regression with Multiple Features
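With several features, the same gradient descent idea can be written in vectorized form. The sketch below assumes a design matrix with a prepended column of ones for the intercept and a parameter vector θ; the function name `fit_gd`, the hyperparameters, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_gd(X, y, learning_rate=0.1, n_steps=2000):
    """Batch gradient descent for linear regression with several features."""
    m = X.shape[0]
    Xb = np.column_stack([np.ones(m), X])    # prepend a bias column of ones
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        gradient = Xb.T @ (Xb @ theta - y) / m
        theta -= learning_rate * gradient
    return theta

# Hypothetical data with two features: y = 1 + 2*x1 - 3*x2 (no noise).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

theta = fit_gd(X, y)
print(theta)
```

The features here already share a similar range; when they do not, gradient descent typically needs a smaller learning rate or rescaled features to converge.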

120 of 136

Linear Regression with Multiple Variables - Gradient Descent in Practice

124 of 136

Linear Regression with Multiple Variables - Normal Equation
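The normal equation gives the least-squares parameters in closed form, θ = (XᵀX)⁻¹Xᵀy, with no iterations or learning rate. A minimal sketch, assuming a design matrix with a prepended column of ones and synthetic data; the function name `normal_equation` is hypothetical, and the linear system is solved directly instead of forming the inverse.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (Xb^T Xb) theta = Xb^T y, where Xb is X with a bias column."""
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Hypothetical data with two features: y = 1 + 2*x1 - 3*x2 (no noise).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

theta = normal_equation(X, y)
print(theta)
```

This approach assumes XᵀX is invertible; with redundant features or fewer examples than features, a least-squares solver such as `np.linalg.lstsq` is the safer choice.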

130 of 136

Linear Regression with Multiple Variables - Features and Polynomial Regression
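Polynomial regression can be treated as linear regression on engineered features: powers of x become extra columns, and the model stays linear in its parameters. A minimal sketch, assuming a hypothetical quadratic target and an illustrative helper `polynomial_features`.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack x, x^2, ..., x^degree as columns."""
    return np.column_stack([x**p for p in range(1, degree + 1)])

# Hypothetical data generated by y = 1 + 0.5*x - 2*x^2 (no noise).
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 0.5 * x - 2.0 * x**2

# Design matrix: bias column plus the polynomial features.
Phi = np.column_stack([np.ones_like(x), polynomial_features(x, 2)])
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)
```

Higher powers grow quickly in magnitude, so feature scaling matters more here if the fit is done by gradient descent rather than by a direct solver.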