A Parallel and Distributed Semantic Genetic Programming System
Bernardo Galvão & Leonardo Vanneschi
The basics
Genetic Programming (GP)
Genetic Programming (GP)
Standard Crossover
Genetic Programming (GP)
Standard Mutation
Geometric Semantic Genetic Programming (GSGP)
Semantics: the vector of outputs of an individual (i.e. a function), where each element is the individual's output on one training instance.
GSGP differs from standard GP only in its crossover and mutation operators!
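As a minimal sketch of the definition of semantics, an individual can be evaluated on every training instance and the outputs collected into a vector. The individual `example_individual` and the data `X_train` are hypothetical and only serve as illustration.

```python
from typing import Callable, List, Sequence

# An individual is modelled as a plain function from a feature vector to a number.
Individual = Callable[[Sequence[float]], float]


def semantics(individual: Individual, training_inputs: Sequence[Sequence[float]]) -> List[float]:
    """Return the vector of outputs of `individual`, one element per training instance."""
    return [individual(x) for x in training_inputs]


def example_individual(x: Sequence[float]) -> float:
    """Hypothetical individual encoding f(x) = x0 * x1 + 2."""
    return x[0] * x[1] + 2.0


# Hypothetical training inputs: three instances with two features each.
X_train = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.0]]
sem = semantics(example_individual, X_train)
```

The semantics vector has exactly as many elements as there are training instances, regardless of the size of the individual.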
Geometric Semantic Genetic Programming (GSGP)
GS Crossover
T_XO = (T1 · R) + ((1 - R) · T2), where T1 and T2 are the parents and R is a random real function with codomain [0,1]
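A minimal sketch of geometric semantic crossover, assuming the usual GSGP definition in which the offspring is (T1 · R) + ((1 - R) · T2). Individuals are represented here as plain Python callables rather than trees, and `random_real_function` is a hypothetical stand-in for a random tree whose output is squashed into [0,1] with a logistic function.

```python
import math
import random


def random_real_function(n_features, rng):
    """Hypothetical random real function R with outputs in [0,1].

    Assumption: a random linear combination of the inputs passed through a
    logistic function; a real system would generate a random tree instead.
    """
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]

    def r(x):
        z = sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    return r


def gs_crossover(parent1, parent2, r):
    """Return the offspring T1*R + (1 - R)*T2 as a new callable."""

    def child(x):
        rx = r(x)
        return rx * parent1(x) + (1.0 - rx) * parent2(x)

    return child


# Toy parents and evaluation points (made up for illustration).
rng = random.Random(0)
parent_a = lambda x: x[0] + x[1]
parent_b = lambda x: x[0] * x[1]
child = gs_crossover(parent_a, parent_b, random_real_function(2, rng))
points = [[1.0, 2.0], [3.0, -1.0], [0.0, 5.0]]
child_semantics = [child(x) for x in points]
```

Because R(x) lies in [0,1], each offspring output is a convex combination of the parents' outputs on the same instance, so the offspring's semantics lies between those of its parents.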
Geometric Semantic Genetic Programming (GSGP)
GS Mutation
T_M = T + ms · (R1 - R2), where T is the parent, ms is the mutation step, and R1 and R2 are random real functions with codomain [0,1]
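A minimal sketch of geometric semantic mutation, assuming the usual GSGP definition T + ms · (R1 - R2). The mutation step value `MS = 0.1` is an arbitrary illustrative choice, individuals are plain callables rather than trees, and `random_real_function` is a hypothetical stand-in for a random tree with outputs bounded to [0,1].

```python
import math
import random

MS = 0.1  # mutation step; illustrative value, not taken from the slides


def random_real_function(n_features, rng):
    """Hypothetical random real function with outputs in [0,1] (logistic of a random linear map)."""
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]

    def r(x):
        z = sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    return r


def gs_mutation(parent, r1, r2, ms):
    """Return the mutant T + ms * (R1 - R2) as a new callable."""

    def mutant(x):
        return parent(x) + ms * (r1(x) - r2(x))

    return mutant


# Toy parent and evaluation points (made up for illustration).
rng = random.Random(0)
parent = lambda x: x[0] * x[1]
mutant = gs_mutation(parent, random_real_function(2, rng), random_real_function(2, rng), MS)
points = [[1.0, 2.0], [3.0, -1.0], [0.0, 5.0]]
shifts = [mutant(x) - parent(x) for x in points]
```

Since R1 - R2 is bounded by [-1, 1], every element of the mutant's semantics moves by at most ms, which is what makes the perturbation small and controllable.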
Island Model
What if each subpopulation ran a different flavour of GP?
Multi-Population Hybrid Genetic Programming (MPHGP)
Proposition
The model
2-Population Hybrid Genetic Programming (MPHGP-2)
[Diagram: a Multi-Objective GP (MOGP) subpopulation and a Geometric Semantic GP (GSGP) subpopulation, linked by arrows labelled "MOGP migrants" and "GSGP migrants".]
Dataset | # Features | # Instances |
Bioavailability (%F) | 241 | 206 |
Protein Plasma Binding (PPB) | 626 | 131 |
Toxicity (LD50) | 626 | 234 |
Concrete | 8 | 1029 |
Energy | 8 | 768 |
The results presented are:
MPHGP: 400 individuals, split into sub-GSGP (200 individuals) and sub-MOGP (200 individuals)
MOGP: 400 individuals
GSGP: 400 individuals
The results
Cosine operator
Cosine Operator - Bioavailability dataset
Legend: *-β: without the cosine operator
Rank-Sum Test | Train | Unseen | Size |
MOGP-β vs MOGP | 0.74987 | 0.22102 | 0.50378 |
GSGP-β vs GSGP | 2.84E-05 | 0.13591 | 1.92E-06 |
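To show how a p-value of this kind can be obtained, here is a plain-Python sketch of a Wilcoxon rank-sum test using the normal approximation, with midranks for ties but no tie or continuity correction. It illustrates the technique only and is not the analysis script behind the table; the two samples are made-up numbers.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Ties receive midranks; the variance is not tie-corrected.
    """
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of equal values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(a), len(b)
    w = sum(rank for rank, (_, group) in zip(ranks, combined) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Made-up final errors of two hypothetical configurations.
with_operator = [0.42, 0.40, 0.45, 0.39, 0.41, 0.43, 0.38, 0.44]
without_operator = [0.47, 0.49, 0.46, 0.50, 0.48, 0.52, 0.51, 0.53]
z, p = rank_sum_test(with_operator, without_operator)
```

A small p-value suggests that the two sets of results come from different distributions; a large one means the test found no evidence of a difference.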
2-Population Hybrid Genetic Programming (MPHGP-2)
MPHGP-2
Except where noted, migration frequency is 50 and migration rate is 0.15.
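To make the two migration parameters concrete, here is a minimal sketch of a migration step. It assumes that the frequency counts generations between migrations and that the rate is the fraction of each subpopulation sent out; the ring topology, the random choice of migrants and of replaced individuals, and the names `migrate` and `evolve` are illustrative assumptions, not details taken from the slides.

```python
import random

MIGRATION_FREQUENCY = 50  # from the slides; assumed to mean "every 50 generations"
MIGRATION_RATE = 0.15     # from the slides; assumed to be a fraction of each subpopulation


def migrate(subpops, rate, rng):
    """Copy a fraction `rate` of each subpopulation into the next one in a ring."""
    # Choose all emigrants first so that nobody is forwarded twice in one step.
    emigrants = [rng.sample(pop, round(rate * len(pop))) for pop in subpops]
    for i, migrants in enumerate(emigrants):
        target = subpops[(i + 1) % len(subpops)]
        # Assumption: migrants overwrite randomly chosen individuals in the target.
        slots = rng.sample(range(len(target)), len(migrants))
        for slot, migrant in zip(slots, migrants):
            target[slot] = migrant


def evolve(subpops, generations, step, rng):
    """Run `step` on every subpopulation each generation and migrate periodically."""
    for generation in range(1, generations + 1):
        for pop in subpops:
            step(pop)
        if generation % MIGRATION_FREQUENCY == 0:
            migrate(subpops, MIGRATION_RATE, rng)


# Toy run: individuals are just labels and the evolutionary step does nothing.
rng = random.Random(1)
sub_gsgp = ["gsgp"] * 200
sub_mogp = ["mogp"] * 200
evolve([sub_gsgp, sub_mogp], generations=50, step=lambda pop: None, rng=rng)
```

Under these assumptions, two subpopulations of 200 individuals exchange 30 individuals each at every migration, and subpopulation sizes stay constant. How a migrant is adapted when it moves between the GSGP and MOGP subpopulations is not covered by this sketch.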
MPHGP-2 - Bioavailability dataset
Rank-Sum Test | Train | Unseen | Size |
MPHGP vs MOGP | 1.73E-06 | 0.00666 | 1.73E-06 |
MPHGP vs GSGP | 0.64352 | 0.25364 | 2.13E-06 |
MPHGP-2 - Introspection
What really happened
[Diagram: the sub-GSGP and sub-MOGP subpopulations, with annotations "here" and "and here".]
MPHGP-2
MPHGP-2 Introspection - Bioavailability dataset
MPHGP-2 Introspection - Concrete dataset
MPHGP-2 Introspection - Energy dataset
f = 25
Increasing number of subpopulations
MPHGP-{2, 4, 8, 20, 40}
Increasing number of subpopulations - Energy dataset
#subpops | 2 | 4 | 8 | 20 | 40 |
2 | - | 0.27116 | 0.00072 | 3.72E-05 | 3.11E-05 |
4 | 0.00045 | - | 0.00277 | 2.84E-05 | 2.13E-06 |
8 | 1.92E-06 | 1.49E-05 | - | 0.02183 | 0.00196 |
20 | 1.73E-06 | 1.73E-06 | 0.02431 | - | 0.09777 |
40 | 1.73E-06 | 1.73E-06 | 3.72E-05 | 0.00822 | - |
Right, running times!
Closing remarks
Caveats
What was learned
Future Work
Allow me to slide this here: nodevo, an implementation in Rust
Thank you! Questions?
bgalvao
burnie093
Appendix
Aiding slides (mainly boxplots) are here
Cosine Operator - Bioavailability dataset
Legend: *-β: without the cosine operator
Rank-Sum Test | Train | Unseen | Size |
MOGP-β vs MOGP | 0.74987 | 0.22102 | 0.50378 |
GSGP-β vs GSGP | 2.84E-05 | 0.13591 | 1.92E-06 |
Cosine Operator - PPB dataset
Legend: *-β: without the cosine operator
Rank-Sum Test | Train | Unseen | Size |
MOGP-β vs MOGP | 0.00057 | 0.44052 | 0.13779 |
GSGP-β vs GSGP | 0.06871 | 0.17138 | 1.73E-06 |
Cosine Operator - Toxicity dataset
Legend: *-β: without the cosine operator
Rank-Sum Test | Train | Unseen | Size |
MOGP-β vs MOGP | 0.97539 | 0.81302 | 0.11080 |
GSGP-β vs GSGP | 0.00148 | 0.29894 | 1.73E-06 |
Cosine Operator - Concrete dataset
Legend: *-β: without the cosine operator
Rank-Sum Test | Train | Unseen | Size |
MOGP-β vs MOGP | 0.01852 | 0.02564 | 0.06408 |
GSGP-β vs GSGP | 0.01752 | 0.65833 | 1.73E-06 |
Cosine Operator - Energy dataset
Legend: *-β: without the cosine operator
Rank-Sum Test | Train | Unseen | Size |
MOGP-β vs MOGP | 0.95899 | 0.71889 | 0.03222 |
GSGP-β vs GSGP | 1.73E-06 | 3.11E-05 | 1.73E-06 |
MPHGP-2 - Bioavailability dataset
Rank-Sum Test | Train | Unseen | Size |
MPHGP vs MOGP | 1.73E-06 | 0.00666 | 1.73E-06 |
MPHGP vs GSGP | 0.64352 | 0.25364 | 2.13E-06 |
MPHGP-2 - PPB dataset
Rank-Sum Test | Train | Unseen | Size |
MPHGP vs MOGP | 1.73E-06 | 0.00385 | 1.73E-06 |
MPHGP vs GSGP | 0.00171 | 0.18462 | 2.97E-05 |
MPHGP-2 - Toxicity dataset
Rank-Sum Test | Train | Unseen | Size |
MPHGP vs MOGP | 0.53044 | 0.31849 | 1.53E-05 |
MPHGP vs GSGP | 1.73E-06 | 0.03501 | 1.73E-06 |
MPHGP-2 - Concrete dataset
Rank-Sum Test | Train | Unseen | Size |
MPHGP vs MOGP | 1.73E-06 | 1.73E-06 | 1.73E-06 |
MPHGP vs GSGP | 0.38203 | 0.87740 | 0.00080 |
MPHGP-2 - Energy dataset
Rank-Sum Test | Train | Unseen | Size |
MPHGP vs MOGP | 1.73E-06 | 2.13E-06 | 1.73E-06 |
MPHGP vs GSGP | 0.22102 | 0.64352 | 8.47E-06 |
Note: f = 25
Increasing number of subpopulations - Bioavailability dataset
#subpops | 2 | 4 | 8 | 20 | 40 |
2 | - | 0.15286 | 0.49080 | 0.46528 | 0.86121 |
4 | 0.00014 | - | 0.18462 | 0.61431 | 0.37094 |
8 | 0.00003 | 0.05984 | - | 0.29894 | 0.97539 |
20 | 0.00007 | 0.10201 | 0.46528 | - | 0.61431 |
40 | 0.00057 | 0.18462 | 0.55774 | 0.39333 | - |
Increasing number of subpopulations - PPB dataset
#subpops | 2 | 4 | 8 | 20 | 40 |
2 | - | 0.26230 | 0.03327 | 0.00277 | 0.02703 |
4 | 1.80E-05 | - | 0.73433 | 0.11093 | 0.22102 |
8 | 1.92E-06 | 1.49E-05 | - | 0.22888 | 0.36004 |
20 | 1.73E-06 | 5.75E-06 | 0.15286 | - | 0.81140 |
40 | 1.73E-06 | 1.73E-06 | 1.24E-05 | 0.00773 | - |
Increasing number of subpopulations - Toxicity dataset
#subpops | 2 | 4 | 8 | 20 | 40 |
2 | - | 0.36004 | 0.84508 | 0.73433 | 0.32857 |
4 | 5.31E-05 | - | 0.27116 | 0.64352 | 0.44052 |
8 | 4.07E-05 | 0.82901 | - | 0.28021 | 0.74987 |
20 | 6.89E-05 | 0.30861 | 0.50383 | - | 0.90993 |
40 | 0.00015 | 0.47795 | 0.55774 | 0.57165 | - |
Increasing number of subpopulations - Concrete dataset
#subpops | 2 | 4 | 8 | 20 | 40 |
2 | - | 0.00049 | 8.47E-06 | 3.18E-06 | 1.36E-05 |
4 | 2.84E-05 | - | 0.00049 | 8.47E-06 | 1.02E-05 |
8 | 5.75E-06 | 1.13E-05 | - | 0.01108 | 0.00258 |
20 | 1.73E-06 | 2.13E-06 | 0.00096 | - | 0.12044 |
40 | 1.73E-06 | 1.73E-06 | 0.00011 | 0.03160 | - |
Increasing number of subpopulations - Energy dataset
#subpops | 2 | 4 | 8 | 20 | 40 |
2 | - | 0.27116 | 0.00072 | 3.72E-05 | 3.11E-05 |
4 | 0.00045 | - | 0.00277 | 2.84E-05 | 2.13E-06 |
8 | 1.92E-06 | 1.49E-05 | - | 0.02183 | 0.00196 |
20 | 1.73E-06 | 1.73E-06 | 0.02431 | - | 0.09777 |
40 | 1.73E-06 | 1.73E-06 | 3.72E-05 | 0.00822 | - |
Increasing number of subpopulations - Bioavailability dataset
Increasing number of subpopulations - PPB dataset
Increasing number of subpopulations - Toxicity dataset
Increasing number of subpopulations - Concrete dataset
Increasing number of subpopulations - Energy dataset