1 of 22

Swayam Prabha

Course Title

Multivariate Data Mining- Methods and Applications

Lecture 40

Plaid Models for Block Clustering

By

Anoop Chaturvedi

Department of Statistics, University of Allahabad

Prayagraj (India)

Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha

2 of 22

Plaid Model: Generalization of Biclustering

  • Plaid Models are a form of two-sided cluster analysis (biclustering) that allows clusters to overlap.
  • They incorporate additive two-way ANOVA models within the two-sided clusters.
  • Each layer is formed by a subset of rows and columns and can be viewed as a two-way clustering of elements of the data matrix.
  • An application is the search for interpretable biological structure in gene expression microarray data.
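
The layered structure described in these points can be written compactly. The following is a sketch in the notation commonly used for plaid models (following Lazzeroni and Owen); the symbols $Y_{ij}$, $\mu_k$, $\alpha_{ik}$, $\beta_{jk}$, $\rho_{ik}$ and $\kappa_{jk}$ are assumptions introduced here and may differ from the notation used elsewhere in the lecture.

```latex
Y_{ij} = \mu_0 + \sum_{k=1}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik}\,\kappa_{jk} + \varepsilon_{ij},
\qquad \rho_{ik},\ \kappa_{jk} \in \{0, 1\}
```

Here $\rho_{ik} = 1$ if row $i$ belongs to layer $k$ and $\kappa_{jk} = 1$ if column $j$ belongs to layer $k$. Each layer carries a two-way ANOVA term $\mu_k + \alpha_{ik} + \beta_{jk}$, and because no constraint such as $\sum_k \rho_{ik} = 1$ is imposed, layers may overlap. In fuller formulations the background term $\mu_0$ may also carry its own row and column effects.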

Data Mining_Anoop Chaturvedi


3 of 22

  • Genes can be members of different layers or of none of them. Overlapping clusters (layers) are allowed.
  • The sum of the layers is then fitted to the gene-expression data.
  • The model also allows a cluster of genes to be defined for only a subset of samples, not necessarily for all of them.
  • For instance, certain yeast genes may cluster together based on their expression patterns during spore formation, while clustering differently under other conditions. The Plaid Model can help identify and characterize these condition-specific gene clusters.
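
The idea that rows can sit in several layers, or in none, can be illustrated with a small noise-free construction. This is a minimal sketch, assuming NumPy; the function `build_plaid_matrix`, its dictionary keys and the toy layer values are hypothetical names chosen for illustration, not part of the lecture.

```python
import numpy as np

def build_plaid_matrix(n_rows, n_cols, layers, background=0.0):
    """Return the noise-free sum of a background level and plaid layers.

    Each layer is a dict with hypothetical keys:
      rows, cols  : index lists giving layer membership
      mu          : layer mean
      alpha, beta : optional row and column effects (default zero)
    """
    Y = np.full((n_rows, n_cols), background, dtype=float)
    for layer in layers:
        rows = np.asarray(layer["rows"])
        cols = np.asarray(layer["cols"])
        alpha = np.asarray(layer.get("alpha", np.zeros(len(rows))), dtype=float)
        beta = np.asarray(layer.get("beta", np.zeros(len(cols))), dtype=float)
        # Two-way additive block: mu + alpha_i + beta_j on the layer's cells
        block = layer["mu"] + alpha[:, None] + beta[None, :]
        Y[np.ix_(rows, cols)] += block
    return Y

# Two overlapping layers: row 2 and column 1 belong to both,
# rows 4 and 5 belong to neither.
layers = [
    {"rows": [0, 1, 2], "cols": [0, 1], "mu": 2.0},
    {"rows": [2, 3], "cols": [1, 2, 3], "mu": -1.0},
]
Y = build_plaid_matrix(6, 5, layers)
print(Y)
```

The cell in row 2, column 1 receives contributions from both layers, which is what distinguishes a plaid model from a partition-style biclustering.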


4 of 22

5 of 22

6 of 22

7 of 22

8 of 22

9 of 22

10 of 22

11 of 22

12 of 22

13 of 22

14 of 22

15 of 22

Example: Plaid Model Biclustering for randomly generated data

Data generated from normal distribution

Used plaid model for biclustering

Algorithm used: BCPlaid (from the R package biclust)
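
The experiment on this slide can be imitated in Python. The sketch below is not BCPlaid: it is a simplified single-layer fit in which the layer has a constant mean only (no row or column effects) and memberships are updated in alternation to reduce squared error. The function name `fit_constant_layer`, the matrix size, the implanted block and the shift of 3.0 are all assumptions made for illustration.

```python
import numpy as np

def fit_constant_layer(Z, n_iter=20):
    """Fit one constant-mean layer to Z by alternating membership updates."""
    # Initial memberships from the leading singular vectors of Z
    u, _, vt = np.linalg.svd(Z, full_matrices=False)
    rows = np.abs(u[:, 0]) > np.abs(u[:, 0]).mean()
    cols = np.abs(vt[0]) > np.abs(vt[0]).mean()
    for _ in range(n_iter):
        if not rows.any() or not cols.any():
            break
        # Least-squares layer mean given the current memberships
        mu = Z[np.ix_(rows, cols)].mean()
        # A row joins if fitting mu on the current columns lowers its squared error
        sub = Z[:, cols]
        new_rows = ((sub - mu) ** 2).sum(axis=1) < (sub ** 2).sum(axis=1)
        # A column joins by the same rule, using the updated rows
        sub = Z[new_rows, :]
        new_cols = ((sub - mu) ** 2).sum(axis=0) < (sub ** 2).sum(axis=0)
        converged = np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols)
        rows, cols = new_rows, new_cols
        if converged:
            break
    mu = Z[np.ix_(rows, cols)].mean() if rows.any() and cols.any() else 0.0
    return rows, cols, mu

# Standard normal data with one implanted block (assumed setup)
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 50))
Y[10:30, 5:15] += 3.0

Z = Y - Y.mean()  # remove the background level
rows, cols, mu = fit_constant_layer(Z)
print(np.flatnonzero(rows), np.flatnonzero(cols), round(float(mu), 2))
```

Further layers could be sought by subtracting the fitted layer from `Z` and repeating the fit on the residual, which mirrors the layer-by-layer strategy of plaid algorithms.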


16 of 22

17 of 22

18 of 22

19 of 22

Example: Plaid Model Bicluster algorithm BCPlaid

BicatYeast gene expression dataset ⇒ obtained from experiments measuring gene expression levels in yeast cells under various conditions or treatments.

Microarray data matrix for 80 experiments.

Expression levels of 419 probe-sets over 70 conditions.

BCPlaid performs plaid model biclustering: the algorithm models the data matrix as a sum of layers.

The model is fitted to the data by minimizing the squared error.
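
The error being minimized can be stated explicitly. The following is a sketch of the usual least-squares criterion for a plaid model with $K$ layers fitted to an $n \times p$ data matrix; the symbols are assumptions introduced here, with $\rho_{ik}$ and $\kappa_{jk}$ denoting binary row and column memberships of layer $k$.

```latex
Q = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{p}
    \Bigl( Y_{ij} - \mu_0 - \sum_{k=1}^{K} \theta_{ijk}\,\rho_{ik}\,\kappa_{jk} \Bigr)^{2},
\qquad \theta_{ijk} = \mu_k + \alpha_{ik} + \beta_{jk}
```

In practice the layers are typically added one at a time: each new layer is fitted to the residuals left by the layers already found.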


20 of 22

21 of 22

22 of 22