1 of 62

A Parallel and Distributed Semantic Genetic Programming System

Bernardo Galvão & Leonardo Vanneschi

Nova IMS

2 of 62

The basics

Island Model

2

3 of 62

Genetic Programming (GP)

Evolutionary writing of a program
Individuals can be represented as trees
A program can be a mathematical function (this case) or even computer programming instructions

3

Image Source

4 of 62

Genetic Programming (GP)

Standard Crossover

Select a random crossover point in each of the parents
Pass and substitute the resulting subtrees

4

Image Source

5 of 62

Genetic Programming (GP)

Standard Mutation

Select a random mutation point
Generate a random subtree (usually with maximum depth 6)

5

Image Source

6 of 62

Geometric Semantic Genetic Programming (GSGP)

6

Semantics: vector of outputs of an individual (i.e. function), where each element of this vector represents the output of a training instance.

7 of 62

GSGP only uses with different crossover and mutation operators!

7

8 of 62

Geometric Semantic Genetic Programming (GSGP)

GS Crossover

Take both parents entirely
Add nodes “around” them according to:

Equates to a geometric mean of the semantics of the parents

8

where R is a random real function with codomain [0,1]

9 of 62

Geometric Semantic Genetic Programming (GSGP)

GS Mutation

Take entirety of unique parent
Add nodes “around” it according to:

Equates to a ball mutation in the semantic space

9

where R1 and R2 are random real functions with codomain [0,1]

10 of 62

Island Model

It’s a parallel and distributed form of genetic programming in which the population is distributed over subpopulations.
Subpopulations evolve in parallel and synchronize in prefixed migration instants, in which they exchange individuals.
Migration takes place in a ring configuration.
The best individuals in the origin subpopulation replace the worst in the destination subpopulation (best-to-worst)

10

Image Source

11 of 62

Island Model

Traditionally a single flavour of GP is ran for all subpopulations.

11

Image Source

What if each subpopulation ran a different flavour of GP?

12 of 62

Multi-Population

Hybrid

Genetic Programming (MPHGP)

Proposition
The model
Evaluation methodology

12

13 of 62

Background

GSGP

Great optimization power and generalization ability
Strict growth of individuals given that parents are taken for their entirety

Linearly if using GS Mutation
Exponentially if using GS Crossover

Great sizes of these solutions impact negatively readability and interpretability

GP

Ability to find more parsimonious solutions
Sometimes faces the bloat problem
Relatively less optimization power

13

14 of 62

Proposition

By means of a hybrid multi-population system,
bring down the size of GSGP solutions.

14

15 of 62

The model

15

16 of 62

2-Population Hybrid Genetic Programming (MPHGP-2)

16

Multi-Objective GP

(MOGP)

Geometric Semantic GP

(GSGP)

Optimizes for size and training error (NSGA-II)
Uses standard variation operators

Optimizes for training error only
Uses geometric semantic variation operators

MOGP migrants

GSGP migrants

17 of 62

The datasets

17

Dataset	# Features	# Instances
Bioavailability (%F)	241	206
Protein Plasma Binding (PPB)	626	131
Toxicity (LD50)	626	234
Concrete	8	1029
Energy	8	768

18 of 62

18

MPHGP	MOGP	GSGP

sub-GSGP

sub-MOGP

200 indivs

400 individuals

MOGP

400 individuals

GSGP

400 individuals

The results presented are: Median training error, unseen error and size From 30 independent runs From 5 datasets

19 of 62

The results

Cosine operator
2-subpopulation MPHGP (MPHGP-2)
Introspection of MPHGP-2
Increasing number of subpopulations

19

20 of 62

Cosine operator

20

21 of 62

The cosine operator is included among addition, subtraction, multiplication and protected division.
It generally helps in optimizing the fitness function while reducing size of GSGP.

21

22 of 62

Cosine Operator - Bioavailability dataset

22

Legend: *-β: without the cosine operator

Rank-Sum Test	Train	Unseen	Size
MOGP-ß vs MOGP	0.74987	0.22102	0.50378
GSGP-ß vs GSGP	2.84E-05	0.13591	1.92E-06

Next subsection

toBoxplot()

23 of 62

2-Population Hybrid Genetic Programming

(MPHGP-2)

23

24 of 62

MPHGP-2

24

Next subsection

25 of 62

MPHGP-2

25

Next subsection

Migration Rate: Percentage of subpopulation size determining how many individuals migrate.

Migration Frequency: How many generations before migration taking place.

Except where noted, migration frequency is 50 and migration rate is 0.15.

26 of 62

MPHGP-2 - Bioavailability dataset

26

Next subsection

toBoxplot()

Rank-Sum Test	Train	Unseen	Size
MPHGP vs MOGP	1.73E-06	0.00666	1.73E-06
MPHGP vs GSGP	0.64352	0.25364	2.13E-06

27 of 62

MPHGP-2 - Introspection

27

28 of 62

What really happened

28

sub-GSGP

sub-MOGP

here

and here

Migration Rate: Percentage of subpopulation size determining how many individuals migrate.

Migration Frequency: How many generations before migration takes place.

29 of 62

MPHGP-2

Migration rates have almost no impact.

Tested across migration rates {0.05, 0.15, 0.25, 0.35} and migration frequencies {25, 50, 100}.
Across 5 datasets for 30 independent runs. Thus, at least 4*3*5*30 = 1800 runs were performed just for MPHGP-2.

Migration frequency matters. But why?

Both subpopulations need to be on par in terms of fitness.
Migration frequency determines the moment of the first migration at which the subpopulations may be on par.

29

30 of 62

MPHGP-2 Introspection - Bioavailability dataset

30

Next subsection

31 of 62

MPHGP-2 Introspection - Concrete dataset

31

Next subsection

32 of 62

MPHGP-2 Introspection - Energy dataset

32

Next subsection

f = 25

33 of 62

Increasing number of subpopulations

MPHGP- {2, 4, 8, 20, 40}

33

34 of 62

Increasing number of subpopulations

34

Next section

35 of 62

Increasing number of subpopulations - Energy dataset

35

Next section

toBoxplot()

#subpops	2	4	8	20	40
2	-	0.27116	0.00072	3.72E-05	3.11E-05
4	0.00045	-	0.00277	2.84E-05	2.13E-06
8	1.92E-06	1.49E-05	-	0.02183	0.00196
20	1.73E-06	1.73E-06	0.02431	-	0.09777
40	1.73E-06	1.73E-06	3.72E-05	0.00822	-

36 of 62

Right, running times!

36

37 of 62

Closing remarks

Caveats
What was learned
Future work

37

38 of 62

Caveats

Only Geometric Semantic Mutation was performed and tested. This greatly prevents exponential growth of individuals, but was also what allowed for more experiments.
Number of elite survivors was too high, especially in subpopulations of lower size. This was to maintain a clean inference without any other parametric interference.

It is worth a re-run in the cases of number of subpopulations = {8, 20, 40} with only one elite survivor

38

39 of 62

What was learned

While new, less size-growth dependent, geometric semantic operators are yet to be discovered, it is still possible to find comparable and parsimonious solutions.
A hybrid system, especially when running different selection methods, need special attention to put the subpopulations on par, in terms of training error.

39

40 of 62

Future Work

Implement more parsimoniousness-oriented GP algorithms into the hybrid system
Different migration directions
Implement mathematical simplification algorithms.

Allow me to slide this here: nodevo, an implementation in Rust

40

41 of 62

Thank you! Questions?

bgalvao

burnie093

41

42 of 62

Appendix

42

Aiding slides (mainly boxplots) are here

43 of 62

Cosine Operator - Bioavailability dataset

43

Legend: *-β: without the cosine operator

Rank-Sum Test	Train	Unseen	Size
MOGP-ß vs MOGP	0.74987	0.22102	0.50378
GSGP-ß vs GSGP	2.84E-05	0.13591	1.92E-06

Next subsection

toLines()

44 of 62

Cosine Operator - PPB dataset

44

Legend: *-β: without the cosine operator

Rank-Sum Test	Train	Unseen	Size
MOGP-ß vs MOGP	0.00057	0.44052	0.13779
GSGP-ß vs GSGP	0.06871	0.17138	1.73E-06

Next subsection

toLines()

45 of 62

Cosine Operator - Toxicity dataset

45

*-β: without the cosine operator

Rank-Sum Test	Train	Unseen	Size
MOGP-ß vs MOGP	0.97539	0.81302	0.11080
GSGP-ß vs GSGP	0.00148	0.29894	1.73E-06

Next subsection

toLines()

46 of 62

Cosine Operator - Concrete dataset

46

Legend: *-β: without the cosine operator

Rank-Sum Test	Train	Unseen	Size
MOGP-ß vs MOGP	0.01852	0.02564	0.06408
GSGP-ß vs GSGP	0.01752	0.65833	1.73E-06

Next subsection

toLines()

47 of 62

Cosine Operator - Energy dataset

47

Legend: *-β: without the cosine operator

Rank-Sum Test	Train	Unseen	Size
MOGP-ß vs MOGP	0.95899	0.71889	0.03222
GSGP-ß vs GSGP	1.73E-06	3.11E-05	1.73E-06

Next subsection

toLines()

48 of 62

MPHGP-2 - Bioavailability dataset

48

Next subsection

toLines()

Rank-Sum Test	Train	Unseen	Size
MPHGP vs MOGP	1.73E-06	0.00666	1.73E-06
MPHGP vs GSGP	0.64352	0.25364	2.13E-06

49 of 62

MPHGP-2 - PPB dataset

49

Next subsection

toLines()

Rank-Sum Test	Train	Unseen	Size
MPHGP vs MOGP	1.73E-06	0.00385	1.73E-06
MPHGP vs GSGP	0.00171	0.18462	2.97E-05

50 of 62

MPHGP-2 - Toxicity dataset

50

Next subsection

toLines()

Rank-Sum Test	Train	Unseen	Size
MPHGP vs MOGP	0.53044	0.31849	1.53E-05
MPHGP vs GSGP	1.73E-06	0.03501	1.73E-06

51 of 62

MPHGP-2 - Concrete dataset

51

Next subsection

toLines()

Rank-Sum Test	Train	Unseen	Size
MPHGP vs MOGP	1.73E-06	1.73E-06	1.73E-06
MPHGP vs GSGP	0.38203	0.87740	0.00080

52 of 62

MPHGP-2 - Energy dataset

52

Next subsection

toLines()

Rank-Sum Test	Train	Unseen	Size
MPHGP vs MOGP	1.73E-06	2.13E-06	1.73E-06
MPHGP vs GSGP	0.22102	0.64352	8.47E-06

Note: f = 25

53 of 62

Increasing number of subpopulations - Bioavailability dataset

53

Next section

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

toLines()

#subpops	2	4	8	20	40
2	-	0.15286	0.49080	0.46528	0.86121
4	0.00014	-	0.18462	0.61431	0.37094
8	0.00003	0.05984	-	0.29894	0.97539
20	0.00007	0.10201	0.46528	-	0.61431
40	0.00057	0.18462	0.55774	0.39333	-

54 of 62

Increasing number of subpopulations - PPB dataset

54

Next section

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

toLines()

#subpops	2	4	8	20	40
2	-	0.26230	0.03327	0.00277	0.02703
4	1.80E-05	-	0.73433	0.11093	0.22102
8	1.92E-06	1.49E-05	-	0.22888	0.36004
20	1.73E-06	5.75E-06	0.15286	-	0.81140
40	1.73E-06	1.73E-06	1.24E-05	0.00773	-

55 of 62

Increasing number of subpopulations - Toxicity dataset

55

Next section

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

toLines()

#subpops	2	4	8	20	40
2	-	0.36004	0.84508	0.73433	0.32857
4	5.31E-05	-	0.27116	0.64352	0.44052
8	4.07E-05	0.82901	-	0.28021	0.74987
20	6.89E-05	0.30861	0.50383	-	0.90993
40	0.00015	0.47795	0.55774	0.57165	-

56 of 62

Increasing number of subpopulations - Concrete dataset

56

Next section

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

toLines()

#subpops	2	4	8	20	40
2	-	0.00049	8.47E-06	3.18E-06	1.36E-05
4	2.84E-05	-	0.00049	8.47E-06	1.02E-05
8	5.75E-06	1.13E-05	-	0.01108	0.00258
20	1.73E-06	2.13E-06	0.00096	-	0.12044
40	1.73E-06	1.73E-06	0.00011	0.03160	-

57 of 62

Increasing number of subpopulations - Energy dataset

57

Next section

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

toLines()

#subpops	2	4	8	20	40
2	-	0.27116	0.00072	3.72E-05	3.11E-05
4	0.00045	-	0.00277	2.84E-05	2.13E-06
8	1.92E-06	1.49E-05	-	0.02183	0.00196
20	1.73E-06	1.73E-06	0.02431	-	0.09777
40	1.73E-06	1.73E-06	3.72E-05	0.00822	-

58 of 62

Increasing number of subpopulations - Bioavailability dataset

58

59 of 62

Increasing number of subpopulations - PPB dataset

59

60 of 62

Increasing number of subpopulations - Toxicity dataset

60

61 of 62

Increasing number of subpopulations - Concrete dataset

61

62 of 62

Increasing number of subpopulations - Energy dataset