Group | Protein family | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 | Cluster 9 | Total

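Group 1

a) Agglomerative clustering of the Lysozyme CGCh dataset (FMI=0.88)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| Lysozyme C | 52 | 0 | 52 |
| Lysozyme G | 17 | 0 | 17 |
| Lysozyme Ch | 0 | 54 | 54 |
| Total | 69 | 54 | 123 |

Clusters 2 to 9: not applicable (only 2 clusters).
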
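b) Agglomerative clustering of the Xylanase dataset (FMI=0.98)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| GH10 | 351 | 1 | 352 |
| GH11 | 7 | 340 | 347 |
| Total | 358 | 341 | 699 |

Clusters 2 to 9: not applicable (only 2 clusters).
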
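c) Spectral clustering of the Chitinase dataset (FMI=0.85)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| GH18 | 439 | 23 | 462 |
| GH19 | 43 | 151 | 194 |
| Total | 482 | 174 | 656 |

Clusters 2 to 9: not applicable (only 2 clusters).
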
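Group 2

d) K-means, Gaussian mixture model, agglomerative, and spectral clustering of the Lysozyme CaLA dataset (FMI=1.00)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| α-Lactalbumin | 22 | 0 | 22 |
| Lysozyme C | 0 | 52 | 52 |
| Total | 22 | 52 | 74 |

Clusters 2 to 9: not applicable (only 2 clusters).
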
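e) Agglomerative and spectral clustering of the Protease dataset (FMI=0.70)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| Trypsin | 56 | 11 | 67 |
| Chymotrypsin | 16 | 0 | 16 |
| Total | 72 | 11 | 83 |

Clusters 2 to 9: not applicable (only 2 clusters).
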
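f) Spectral clustering of the Ferredoxin dataset (FMI=0.75)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| FDX1 | 1 | 8 | 9 |
| FDX2 | 6 | 1 | 7 |
| Total | 7 | 9 | 16 |

Clusters 2 to 9: not applicable (only 2 clusters).
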
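Group 3

g) Agglomerative clustering of the β-Glucosidase dataset (FMI=0.78)

| Protein family | Cluster 0 | Cluster 1 | Cluster 2 | Total |
|---|---|---|---|---|
| GH1 | 287 | 1 | 60 | 348 |
| GH3 | 21 | 153 | 30 | 204 |
| Total | 308 | 154 | 90 | 552 |

Clusters 3 to 9: not applicable (only 3 clusters).
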
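h) K-means and Gaussian mixture model of the Lysozyme dataset (FMI=0.44)

| Protein family | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 | Cluster 9 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GH24 | 6 | 0 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 31 |
| GH22 | 0 | 40 | 26 | 0 | 30 | 13 | 7 | 12 | 0 | 7 | 135 |
| GH23 | 0 | 0 | 0 | 39 | 0 | 1 | 0 | 0 | 4 | 3 | 47 |
| Total | 6 | 40 | 26 | 64 | 30 | 14 | 7 | 12 | 4 | 10 | 213 |
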
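i) Gaussian mixture model of the β-Galactosidase dataset (FMI=0.86)

| Protein family | Cluster 0 | Cluster 1 | Cluster 2 | Total |
|---|---|---|---|---|
| GH2 | 74 | 0 | 43 | 117 |
| GH35 | 5 | 283 | 7 | 295 |
| GH42 | 39 | 0 | 44 | 83 |
| Total | 118 | 283 | 94 | 495 |

Clusters 3 to 9: not applicable (only 3 clusters).
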
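Group 4

j) K-means, Gaussian mixture model, and agglomerative clustering of the GH2 dataset (FMI=0.37)

| Protein family | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 | Cluster 9 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β-Glucuronidase | 27 | 29 | 5 | 0 | 4 | 2 | 1 | 4 | 5 | 8 | 85 |
| β-Galactosidase | 0 | 4 | 15 | 24 | 20 | 20 | 14 | 12 | 6 | 2 | 117 |
| Total | 27 | 33 | 20 | 24 | 24 | 22 | 15 | 16 | 11 | 10 | 202 |
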
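k) Spectral clustering of the GH3 dataset (FMI=0.62)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| β-Glucosidase | 162 | 42 | 204 |
| 1,4-β-Xylosidase | 89 | 10 | 99 |
| Total | 251 | 52 | 303 |

Clusters 2 to 9: not applicable (only 2 clusters).
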
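l) Spectral clustering of the GH5 dataset (FMI=0.59)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| Cellulase | 112 | 230 | 342 |
| Endo-β-Mannanase | 63 | 64 | 127 |
| Total | 175 | 294 | 469 |

Clusters 2 to 9: not applicable (only 2 clusters).
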
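Group 5

m) K-means and Gaussian mixture model of the SNARE dataset (FMI=0.86)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| SNARE | 1 | 56 | 57 |
| non-SNARE | 53 | 8 | 61 |
| Total | 54 | 64 | 118 |

Clusters 2 to 9: not applicable (only 2 clusters).
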
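n) Gaussian mixture model of the GPCR1 dataset (FMI=0.658)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| GPCR | 1692 | 5 | 1697 |
| non-GPCR (TM) | 1635 | 272 | 1907 |
| Total | 3327 | 277 | 3604 |

Clusters 2 to 9: not applicable (only 2 clusters).
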
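o) Gaussian mixture model of the GPCR2 dataset (FMI=0.94)

| Protein family | Cluster 0 | Cluster 1 | Total |
|---|---|---|---|
| GPCR | 1650 | 47 | 1697 |
| non-GPCR (no TM) | 55 | 1806 | 1861 |
| Total | 1705 | 1853 | 3558 |

Clusters 2 to 9: not applicable (only 2 clusters).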
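
Each panel title reports an FMI (Fowlkes-Mallows index) next to its contingency counts. As a minimal sketch, assuming the standard pair-counting definition of the index (the source does not state how its values were computed), the FMI can be recovered from the protein family by cluster counts alone. The function name `fowlkes_mallows` and the variable names are illustrative, not taken from the source; the counts are those of panels a) and d).

```python
from math import comb, sqrt


def fowlkes_mallows(contingency):
    """FMI from a contingency table (rows: protein families, columns: clusters).

    Assumes the standard pair-counting definition:
    FMI = TP / sqrt((TP + FP) * (TP + FN)).
    """
    # TP: sequence pairs that share both a family and a cluster.
    tp = sum(comb(n, 2) for row in contingency for n in row)
    # TP + FN: sequence pairs that share a family (from row totals).
    same_family = sum(comb(sum(row), 2) for row in contingency)
    # TP + FP: sequence pairs that share a cluster (from column totals).
    same_cluster = sum(comb(sum(col), 2) for col in zip(*contingency))
    if same_family == 0 or same_cluster == 0:
        return 0.0
    return tp / sqrt(same_family * same_cluster)


# Counts from panel a) (Lysozyme CGCh) and panel d) (Lysozyme CaLA).
lysozyme_cgch = [[52, 0], [17, 0], [0, 54]]
lysozyme_cala = [[22, 0], [0, 52]]

print(round(fowlkes_mallows(lysozyme_cgch), 2))  # 0.88
print(round(fowlkes_mallows(lysozyme_cala), 2))  # 1.0
```

Under this assumption the two values match the FMI reported for panels a) and d), which also serves as a consistency check on the row and column counts.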