Clustering for Image Analysis
Kanika Chopra, Nicholas Vadivelu
Introduction
KANIKA CHOPRA (she/her)
4A Math. Finance & Stats
kanikadatt@gmail.com
NICHOLAS VADIVELU (he/him)
4A Computer Science & Stats
nicholas.vadivelu@gmail.com
How can you get involved?
This slide deck: bit.ly/uwdsc_wistem_w21
Facebook Page: facebook.com/uwdsc
Email: waterloodatascience@gmail.com
Discord: bit.ly/uwdsc-discord
Contents
What is Data Science?
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, machine learning and big data.
Examples of Data Science
What is Machine Learning?
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
Examples of Machine Learning
Artificial Intelligence
Data Science
Machine Learning
What are career options in this field?
Disclaimer: These are our categorizations! Often these titles are mixed, and definitely vary by organization.
Python “Crash Course”
What is Python?
Your First Python Program
Program:
print('Hello, World!')
Output:
Hello, World!
Your First Python Program
Program:
print('Hello')
print('World')
Output:
Hello
World
Variables
Program:
x = 'Hello, World!'  # assignment
print(x)
y = x  # assignment
print(y)
Output:
Hello, World!
Hello, World!
Variables 2
Program:
x = 2
print(x)
x = x + 2
print(x)
x *= 3  # same as x = x * 3
print(x)
Output:
2
4
12
Conditionals
Program:
string = 'great job!'
if string == 'great job!':
    print(':)')
else:
    print(':(')
Output:
:)
Conditionals 2
Program:
x = 4.0
if x < 3:
    print(x, 'is less than 3.')  # print() already inserts a space between arguments
elif x >= 7:
    print(x, 'is >= 7.')
else:
    print(x, 'is between 3 and 6.')
Output:
4.0 is between 3 and 6.
While Loops
Program:
# compute floor(log_2(x))
x = 13
value = -1
while x > 0:
    x = x // 2  # integer division
    value += 1
    print('x =', x)
    print('value =', value)
    print()
print('floor(log_2(x)) =', value)
Output:
x = 6
value = 0

x = 3
value = 1

x = 1
value = 2

x = 0
value = 3

floor(log_2(x)) = 3
Lists
Program:
var = [1, 2, 3, 4]
print(var[0])  # indexing
print(var[2])
var.append(5)
print(var)
var.append('nice')
print(var)
Output:
1
3
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 'nice']
Lists: Advanced Indexing
Program:
var = [10, 11, 12, 13, 14, 15]
print(var[-1])
print(var[-2])
print(var[0:4])  # slice
print(var[0:4:2])
Output:
15
14
[10, 11, 12, 13]
[10, 12]
For Loops
Program:
var = [10, 11, 12, 13, 14, 15]
for elem in var:
    print(elem + 10)
Output:
20
21
22
23
24
25
For Loops
Program:
print(list(range(10)))
print(list(range(4, 10)))
print(list(range(4, 10, 3)))
for i in range(3):
    print(i)
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[4, 5, 6, 7, 8, 9]
[4, 7]
0
1
2
Functions
Program:
def func(x, y):
    print('Called func')
    return x + y

z = func(1, 2)
print(z)

# could do func(x=1, y=2)
# or func(1, y=2)
# or func(y=2, x=1)
Output:
Called func
3
NumPy
import numpy as np # load library
x = np.array([1, 2, 3])
print(x)
print(x + 1)
y = np.array([4, 5, 6])
print(x + y)

[1 2 3]
[2 3 4]
[5 7 9]
NumPy
import numpy as np # load library
x = np.array([[1, 2, 3],
              [4, 5, 6]])
print(x)
print()
print(np.sum(x))
print()
print(np.sum(x, axis=0))
print()
print(np.sum(x, axis=1))
print()

[[1 2 3]
 [4 5 6]]

21

[5 7 9]

[ 6 15]
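The same axis argument works for other reductions such as np.mean. As a small sketch (the array name points is only illustrative), averaging over axis=0 gives the center of a set of points stored one per row, which is the calculation a cluster center relies on:

```python
import numpy as np

points = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])  # three 2-D points, one per row

center = np.mean(points, axis=0)  # average down the rows: one value per column
print(center)  # [3. 4.]
```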
NumPy Attributes & Methods
import numpy as np # load library
x = np.array([[1, 2, 3],
              [4, 5, 6]])
print(x.shape)  # shape is an attribute

y = x.reshape(2, 3)  # reshape is a method
print(y)
print(y.shape)

z = y.reshape(6)
print(z)
print(z.shape)

(2, 3)
[[1 2 3]
 [4 5 6]]
(2, 3)
[1 2 3 4 5 6]
(6,)
Exercises
Any questions so far?
What is Clustering?
Clustering: Example
Applications of Clustering
Applications of Clustering: Marketing and Sales
Applications of Clustering: Document Classification
Applications of Clustering: Image Analysis
Detecting cancer in scans
Object detection
Image segmentation
Pic Credit: omicsonline.org
K-Means Clustering: Introduction
Definitions:
Goal: Segment the data into K non-overlapping clusters by minimizing the distance between data points within each cluster and maximizing the distance between points in different clusters
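One common way to write this goal down, assuming squared Euclidean distance (the slide does not name a distance measure), is the within-cluster sum of squares. The symbols below are introduced here for illustration: C_k is the set of points in cluster k, mu_k is its mean, and x-bar is the mean of all n points.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i

\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2
  = J + \sum_{k=1}^{K} |C_k| \, \lVert \mu_k - \bar{x} \rVert^2
```

The left-hand side of the second line is fixed by the data, so making J (spread within clusters) smaller makes the last term (spread between cluster centers) larger. Under this assumption, the two halves of the goal are the same optimization.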
K-Means Clustering: Calculations
K-Means Clustering: Process
Input: Data and the number of clusters (K)
NOTE: We do not know the group labels
Process:
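A minimal sketch of the standard K-means loop (often called Lloyd's algorithm), assuming Euclidean distance and initial centroids drawn at random from the data. The function name kmeans, its parameters, and the toy data are illustrative and not taken from the slides.

```python
import numpy as np

def kmeans(data, k, n_iters=100, seed=0):
    """Illustrative K-means sketch; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign every point to its nearest centroid.
        distances = np.linalg.norm(
            data[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = np.argmin(distances, axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (a cluster with no points keeps its previous centroid).
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (made up for this sketch): three blobs of 20 points each.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(loc=center, scale=0.1, size=(20, 2))
    for center in [(0, 0), (5, 5), (0, 5)]
])
labels, centroids = kmeans(data, k=3)
print(labels.shape, centroids.shape)
```

Because the starting centroids are random, different seeds can give different final clusters, which is why K-means is often run several times and the best result kept.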
K-Means Clustering: Demo
K = 3, Iteration = 0
K = 3, Iteration = 1
K-Means Clustering: Demo
K = 3, Iteration = 2
K = 3, Iteration = 3
K-Means Clustering: Demo
K = 3, Iteration = 4
K = 3, Iteration = 5
K-Means Clustering: Demo
This link shows a visualization of the different iterations: https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/
K-Means Clustering: Image Data
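To cluster an image, each pixel can be treated as one data point whose features are its color values. The sketch below assumes an RGB image stored as a NumPy array of shape (height, width, 3). The tiny 4x4 image, the choice of K = 2, and the fixed starting centroids are all made up for illustration; a real photo would first be loaded into an array of the same layout.

```python
import numpy as np

# Hypothetical 4x4 RGB image: left half reddish, right half bluish.
image = np.zeros((4, 4, 3), dtype=float)
image[:, :2] = [200, 30, 30]
image[:, 2:] = [30, 30, 200]
image[0, 0] = [220, 40, 20]   # small color variations
image[3, 3] = [20, 40, 220]

# Flatten the grid of pixels into a list of points: one row per pixel.
pixels = image.reshape(-1, 3)          # shape (16, 3)

# K-means with K = 2, starting from the first and last pixel (an assumption
# made so the example is deterministic).
centroids = pixels[[0, -1]].copy()
for _ in range(10):
    distances = np.linalg.norm(
        pixels[:, None, :] - centroids[None, :, :], axis=2
    )
    labels = distances.argmin(axis=1)
    centroids = np.array([pixels[labels == j].mean(axis=0) for j in range(2)])

# Reshape the labels back into the image grid to get a segmentation map,
# and recolor every pixel with the color of its cluster center.
label_image = labels.reshape(image.shape[:2])
segmented = centroids[labels].reshape(image.shape)
print(label_image)
```

The label map marks the left two columns as cluster 0 and the right two as cluster 1, and segmented is the same picture drawn with only two colors.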
K-Means Clustering: FAQ
If K is too large: data that should belong in the same group may be divided into sub-groups, creating less meaningful groups
If K is too small: the differences between groups may not be captured by the clusters; data with different features may end up in the same cluster
Choosing K: common methods include the “Elbow” method, the Silhouette method, and the Sum of Squares method
K-Means Clustering: The “Elbow Method”
K = 3 optimal!
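A rough sketch of how the numbers behind an elbow plot can be produced: run K-means for several values of K and record the within-cluster sum of squares (WCSS) each time. The helper best_wcss, the number of restarts, and the three-blob toy data are assumptions made for this example, not the presenters' code.

```python
import numpy as np

def best_wcss(data, k, n_restarts=5, n_iters=50, seed=0):
    """Lowest within-cluster sum of squares over a few random K-means runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _restart in range(n_restarts):
        centroids = data[rng.choice(len(data), size=k, replace=False)]
        for _step in range(n_iters):
            dists = np.linalg.norm(
                data[:, None, :] - centroids[None, :, :], axis=2
            )
            labels = dists.argmin(axis=1)
            centroids = np.array([
                data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
        # Score this run: squared distance of each point to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        best = min(best, float((dists.min(axis=1) ** 2).sum()))
    return best

# Toy data (made up): three blobs, so the bend is expected near K = 3.
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(30, 2))
    for center in [(0, 0), (5, 5), (0, 5)]
])

wcss = [best_wcss(data, k) for k in range(1, 7)]
for k, value in zip(range(1, 7), wcss):
    print(k, round(value, 2))
```

Plotting these values against K and looking for the point where the curve stops dropping sharply gives the "elbow".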
K-Means Clustering: Pros and Cons
Pros
Cons
Resources
Recap
Any questions?