Partitioning Clustering Technique
INTRODUCTION
The two algorithms are:
K-Means
K-Medoids
K-Means Algorithm
Input:
D: A dataset containing N objects
K: The number of clusters into which the dataset is to be divided
Output:
A set of K clusters
Method:
Suppose we want to group the visitors to a website using just their age as follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16
Centroid(C2) = 22
Iteration-1:
[16, 16, 17]
C1 = 16.33
[20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
C2 = 37.25
Iteration-2:
[16, 16, 17, 20, 20, 21, 21, 22, 23]
C1 = 19.56
[29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
C2 = 46.90
Iteration-3:
[16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C1 = 20.50
[36, 41, 42, 43, 44, 45, 61, 62, 66]
C2 = 48.89
Iteration-4:
[16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C1 = 20.50
[36, 41, 42, 43, 44, 45, 61, 62, 66]
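C2 = 48.89
The clusters and centroids in Iteration-4 are the same as in Iteration-3, so the algorithm has converged and stops.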
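The iterations above can be reproduced with a short Python sketch. This is a minimal illustration for one-dimensional data, not a reference implementation; the function name kmeans_1d and its parameters are assumptions introduced here.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Minimal k-means for 1-D data (illustrative helper, not from the slides)."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


ages = [16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
clusters, centroids = kmeans_1d(ages, [16, 22])
print(clusters[0])                        # ages grouped around the first centroid
print(clusters[1])                        # ages grouped around the second centroid
print([round(c, 2) for c in centroids])   # [20.5, 48.89]
```

Starting from the centroids 16 and 22, the loop stops when an iteration leaves both centroids unchanged, which matches the point where the worked example ends.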
K-Medoids (also called Partitioning Around Medoids, PAM)
Algorithm
Step 1: Select k = 2 and randomly choose two medoids. Let C1 = (4, 5) and C2 = (8, 5) be the two medoids.
Step 2: Calculate the cost. The dissimilarity of each non-medoid point to each of the medoids C1 = (4, 5) and C2 = (8, 5) is calculated and tabulated. Each point is assigned to the medoid with the smaller dissimilarity.
Distance = |X1-X2| + |Y1-Y2|
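The distance formula above is the Manhattan distance. A minimal Python sketch follows; the function name manhattan is an assumption, and it is applied only to the two medoids given in Step 1.

```python
def manhattan(p, q):
    """Manhattan distance |X1 - X2| + |Y1 - Y2| (illustrative helper name)."""
    return sum(abs(a - b) for a, b in zip(p, q))


c1 = (4, 5)  # medoid C1 from Step 1
c2 = (8, 5)  # medoid C2 from Step 1
print(manhattan(c1, c2))  # 4
```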
The Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
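Step 3: A non-medoid point is selected as a candidate to replace one of the medoids, and the cost is recalculated.
Points 1, 2, and 5 go to cluster C1 and points 0, 3, 6, 7, 8 go to cluster C2.
New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = New cost - Previous cost = 22 - 20 = 2 > 0, so the swap is undone and the previous medoids are kept.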
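The cost comparison can be checked with a few lines of Python. This sketch uses only the per-point distances listed in the two cost expressions above; the helper name accept_swap is an assumption, and it encodes the usual PAM rule that a swap is kept only if it lowers the total cost.

```python
# Per-point distances copied from the two cost expressions in the example.
old_cost = sum([3, 4, 4]) + sum([3, 1, 1, 2, 2])  # cost with the initial medoids
new_cost = sum([3, 4, 4]) + sum([2, 2, 1, 3, 3])  # cost after the candidate swap


def accept_swap(old, new):
    """Keep a medoid swap only when it reduces the total cost (illustrative helper)."""
    return new - old < 0


print(old_cost, new_cost, accept_swap(old_cost, new_cost))  # 20 22 False
```

Because the cost rises from 20 to 22, the candidate swap is rejected in this example.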
Comparing k-Means with k-Medoids:
Applicability of PAM: