Data Mining_Anoop Chaturvedi
Swayam Prabha
Course Title
Multivariate Data Mining- Methods and Applications
Lecture 31
Self-Organizing Map
By
Anoop Chaturvedi
Department of Statistics, University of Allahabad
Prayagraj (India)
Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha
Self-Organizing Map (SOM), also called Kohonen Self-Organizing Feature Map or Kohonen Neural Network
An artificial neural network that is trained using unsupervised learning to produce a low-dimensional representation of the input space.
Advantages of SOM:
Applications
Training Steps:
Initialization ⇒ Weights are randomly initialized.
Training ⇒ Present input samples to the SOM and adjust the weights to better match the input patterns using the following steps:
(i) Neighborhood Selection ⇒ A neighborhood around the best-matching unit (BMU) is selected. The BMU is the node whose weight vector is closest to the input sample in the input space.
(ii) Weight Update ⇒ The weights of the nodes within the selected neighborhood are updated to be more similar to the input sample.
(iii) Iterations ⇒ The training process continues for a fixed number of iterations or until convergence is achieved.
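The training steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `train_som` and `bmu_of`, the linear decay of the learning rate and radius, the Gaussian neighborhood weighting (used here in place of a hard neighborhood cutoff), and the toy data are all assumptions made for the example.

```python
import math
import random

def train_som(data, rows, cols, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initialization: random weights in [0, 1) for every grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # assumed: linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed: shrinking neighborhood radius
        x = rng.choice(data)                     # present one input sample
        # BMU: node whose weight vector is closest to x in the input space.
        bmu = min(weights, key=lambda n: math.dist(weights[n], x))
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # assumed: Gaussian neighborhood
            # Weight update: move the node toward x, scaled by lr and h.
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])
    return weights

def bmu_of(weights, x):
    """Grid position of the best-matching unit for sample x."""
    return min(weights, key=lambda n: math.dist(weights[n], x))

# Hypothetical toy data: two tight groups near (0, 0) and (1, 1).
data = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
        [1.0, 1.0], [1.0, 0.9], [0.9, 1.0]]
som = train_som(data, rows=3, cols=3)
```

After training, `bmu_of(som, [0.0, 0.0])` and `bmu_of(som, [1.0, 1.0])` should land on different grid nodes, since the two groups are far apart in the input space.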
Figure panels: Inverse, Power, Linear
Unified Distance Matrix (U-Matrix)
Constructed by calculating the distances between the weight vectors of neighboring neurons in the SOM grid.
These distances are visualized as a grid of values, using a grayscale or color scale.
Useful in visualizing the clustering structure and the smoothness of the SOM.
Areas of low values (dark regions) mark neurons that are close together in the input space, which signals the presence of clusters or similar patterns.
Areas of high values (light regions) indicate regions where neurons are far apart, representing transitions between different clusters or patterns.
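A minimal sketch of one way to compute a U-matrix, assuming a rectangular grid, 4-connected neighbors, and the mean neighbor distance as the value of each cell (other conventions exist). The function name `u_matrix` and the 1 x 4 example map are hypothetical.

```python
import math

def u_matrix(weights, rows, cols):
    """Mean input-space distance from each node to its 4-connected grid neighbors.

    weights[r][c] is the weight vector of the node at grid position (r, c).
    """
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u

# Hypothetical 1 x 4 map: two pairs of similar nodes separated by a gap.
weights = [[[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]]
u = u_matrix(weights, 1, 4)
```

In this example the two outer cells get low values (about 0.1), while the two inner cells, which sit on either side of the gap, get high values (about 0.5) and so mark the transition between the two groups.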
Hierarchical SOM
SOMs are used for dimensionality reduction and visualization of high-dimensional data.
HSOMs provide a more structured organization of the data into multiple levels of abstraction.
Tree of maps. Lower maps act as a pre-processing stage.
Nodes in each level of the hierarchy are themselves SOMs.
This makes it possible to capture complex relationships within the data at different levels of granularity.
The data is clustered and organized into smaller and more manageable groups at each level of the hierarchy.
Example: iris data (aweSOM package of R)
Quality Measures for SOM
Quantization error ⇒ Average squared distance between the data points and the map’s prototypes to which they are mapped. Lower is better.
Percentage of explained variance ⇒ Share of total variance that is explained by the clustering (= 1 − (quantization error)/(total variance)). Higher is better.
Topographic error ⇒ Share of observations for which the best-matching node is not a neighbor of the second-best matching node. Lower is better. 0 indicates excellent topographic representation (all best and second-best matching nodes are neighbors), 1 is the maximum error (best and second-best nodes are never neighbors).
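The three measures above can be computed directly from their definitions. The sketch below is an illustration under stated assumptions, not the aweSOM implementation: the function name `som_quality` is hypothetical, total variance is taken as the mean squared distance of the data to their centroid, and two nodes count as neighbors when they are one step apart on a rectangular grid.

```python
import math

def som_quality(data, prototypes):
    """prototypes: {(row, col): vector} on a rectangular grid."""
    n, dim = len(data), len(data[0])
    mean = [sum(x[k] for x in data) / n for k in range(dim)]
    # Assumed definition: mean squared distance to the data centroid.
    total_var = sum(math.dist(x, mean) ** 2 for x in data) / n
    sq_err, topo_misses = 0.0, 0
    for x in data:
        ranked = sorted(prototypes, key=lambda node: math.dist(prototypes[node], x))
        best, second = ranked[0], ranked[1]
        sq_err += math.dist(prototypes[best], x) ** 2
        # Assumed neighbor rule: Manhattan grid distance of exactly 1.
        if abs(best[0] - second[0]) + abs(best[1] - second[1]) != 1:
            topo_misses += 1
    qe = sq_err / n
    return {"quantization_error": qe,
            "explained_variance": 1.0 - qe / total_var,
            "topographic_error": topo_misses / n}

# Hypothetical 1 x 3 map whose grid order matches the order of the data.
prototypes = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0], (0, 2): [2.0, 0.0]}
data = [[0.1, 0.0], [0.9, 0.0], [2.1, 0.0], [1.0, 0.0]]
quality = som_quality(data, prototypes)
```

Here every point lies close to its prototype and the grid order follows the data, so the quantization error is small, the explained variance is close to 1, and the topographic error is 0.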
Kaski-Lagus error = (mean distance between points and their best-matching prototypes)+(mean geodesic distance between the points and their second-best matching prototype)
Geodesic distance ⇒ Pairwise prototype distances following the SOM grid
U-plot ⇒ Darker cells are close to their neighbors
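The Kaski-Lagus error can be sketched as follows, under one reading of the definition above: for each point, add the distance to its best-matching prototype and the shortest-path (geodesic) distance along the grid from that prototype to the second-best one. The function names `geodesic` and `kaski_lagus`, the 4-connected rectangular grid, and the example maps are assumptions made for illustration.

```python
import heapq
import math

def geodesic(prototypes, start, goal):
    """Shortest path from start to goal through 4-connected grid edges,
    each edge weighted by the input-space distance between its prototypes."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > best.get(node, math.inf):
            continue  # stale heap entry
        r, c = node
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in prototypes:
                nd = d + math.dist(prototypes[node], prototypes[nb])
                if nd < best.get(nb, math.inf):
                    best[nb] = nd
                    heapq.heappush(heap, (nd, nb))
    return math.inf

def kaski_lagus(data, prototypes):
    total = 0.0
    for x in data:
        ranked = sorted(prototypes, key=lambda n: math.dist(prototypes[n], x))
        bmu, second = ranked[0], ranked[1]
        total += math.dist(x, prototypes[bmu]) + geodesic(prototypes, bmu, second)
    return total / len(data)

# Hypothetical 1 x 3 maps: one ordered along the grid, one scrambled.
ordered = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0], (0, 2): [2.0, 0.0]}
scrambled = {(0, 0): [0.0, 0.0], (0, 1): [2.0, 0.0], (0, 2): [1.0, 0.0]}
kl_ordered = kaski_lagus([[0.1, 0.0]], ordered)
kl_scrambled = kaski_lagus([[0.1, 0.0]], scrambled)
```

The scrambled map gets the larger error because the path from the best to the second-best prototype must detour through a distant node on the grid, which is how this measure combines quantization and topographic quality.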
Super-classes of SOM ⇒ Cluster the cells of the map into super-classes, i.e., groups of cells with similar profiles.
Use classic clustering algorithms on the map’s prototypes.
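As an illustration of clustering the prototypes, the sketch below uses a small complete-linkage agglomerative procedure. The choice of linkage, the function name `superclasses`, and the 2 x 2 example map are assumptions; any classic clustering algorithm could be substituted.

```python
import math

def superclasses(prototypes, k):
    """Agglomerative (complete-linkage) clustering of SOM prototypes into k groups.

    Returns {(row, col): super-class label}.
    """
    clusters = [[node] for node in prototypes]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between the two groups.
        return max(math.dist(prototypes[p], prototypes[q]) for p in a for q in b)

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest linkage distance.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return {node: label for label, members in enumerate(clusters) for node in members}

# Hypothetical 2 x 2 map: the top row and bottom row have very different profiles.
prototypes = {(0, 0): [0.0, 0.0], (0, 1): [0.1, 0.0],
              (1, 0): [5.0, 5.0], (1, 1): [5.1, 5.0]}
labels = superclasses(prototypes, k=2)
```

With `k=2`, the two top-row cells end up in one super-class and the two bottom-row cells in the other.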