1 of 62

A Parallel and Distributed Semantic Genetic Programming System

Bernardo Galvão & Leonardo Vanneschi

2 of 62

The basics

  • Island Model

2

3 of 62

Genetic Programming (GP)

  • Evolutionary writing of a program
  • Individuals can be represented as trees
  • A program can be a mathematical function (this case) or even computer programming instructions

3

4 of 62

Genetic Programming (GP)

Standard Crossover

  • Select a random crossover point in each of the parents
  • Pass and substitute the resulting subtrees

4

5 of 62

Genetic Programming (GP)

Standard Mutation

  • Select a random mutation point
  • Generate a random subtree (usually with maximum depth 6)

5

6 of 62

Geometric Semantic Genetic Programming (GSGP)

6

Semantics: vector of outputs of an individual (i.e. function), where each element of this vector represents the output of a training instance.

7 of 62

GSGP only uses with different crossover and mutation operators!

7

8 of 62

Geometric Semantic Genetic Programming (GSGP)

GS Crossover

  • Take both parents entirely
  • Add nodes “around” them according to:
  • Equates to a geometric mean of the semantics of the parents

8

where R is a random real function with codomain [0,1]

9 of 62

Geometric Semantic Genetic Programming (GSGP)

GS Mutation

  • Take entirety of unique parent
  • Add nodes “around” it according to:
  • Equates to a ball mutation in the semantic space

9

where R1 and R2 are random real functions with codomain [0,1]

10 of 62

Island Model

  • It’s a parallel and distributed form of genetic programming in which the population is distributed over subpopulations.
  • Subpopulations evolve in parallel and synchronize in prefixed migration instants, in which they exchange individuals.
  • Migration takes place in a ring configuration.
  • The best individuals in the origin subpopulation replace the worst in the destination subpopulation (best-to-worst)

10

11 of 62

Island Model

  • Traditionally a single flavour of GP is ran for all subpopulations.

11

What if each subpopulation ran a different flavour of GP?

12 of 62

Multi-Population

Hybrid

Genetic Programming (MPHGP)

  • Proposition
  • The model
  • Evaluation methodology

12

13 of 62

  • Background
    • GSGP
      • Great optimization power and generalization ability
      • Strict growth of individuals given that parents are taken for their entirety
        • Linearly if using GS Mutation
        • Exponentially if using GS Crossover
      • Great sizes of these solutions impact negatively readability and interpretability
    • GP
      • Ability to find more parsimonious solutions
      • Sometimes faces the bloat problem
      • Relatively less optimization power

13

14 of 62

Proposition

  • By means of a hybrid multi-population system,
  • bring down the size of GSGP solutions.

14

15 of 62

The model

15

16 of 62

2-Population Hybrid Genetic Programming (MPHGP-2)

16

Multi-Objective GP

(MOGP)

Geometric Semantic GP

(GSGP)

  • Optimizes for size and training error (NSGA-II)
  • Uses standard variation operators
  • Optimizes for training error only
  • Uses geometric semantic variation operators

MOGP migrants

GSGP migrants

17 of 62

  • The datasets

17

Dataset

# Features

# Instances

Bioavailability (%F)

241

206

Protein Plasma Binding (PPB)

626

131

Toxicity (LD50)

626

234

Concrete

8

1029

Energy

8

768

18 of 62

18

MPHGP

MOGP

GSGP

sub-GSGP

sub-MOGP

200 indivs

200 indivs

400 individuals

MOGP

400 individuals

GSGP

400 individuals

The results presented are:

  • Median training error, unseen error and size
  • From 30 independent runs
  • From 5 datasets

19 of 62

The results

  • Cosine operator
  • 2-subpopulation MPHGP (MPHGP-2)
  • Introspection of MPHGP-2
  • Increasing number of subpopulations

19

20 of 62

Cosine operator

20

21 of 62

  • The cosine operator is included among addition, subtraction, multiplication and protected division.
  • It generally helps in optimizing the fitness function while reducing size of GSGP.

21

22 of 62

Cosine Operator - Bioavailability dataset

22

Legend: *-β: without the cosine operator

Rank-Sum Test

Train

Unseen

Size

MOGP-ß vs MOGP

0.74987

0.22102

0.50378

GSGP-ß vs GSGP

2.84E-05

0.13591

1.92E-06

Next subsection

toBoxplot()

23 of 62

2-Population Hybrid Genetic Programming

(MPHGP-2)

23

24 of 62

MPHGP-2

24

Next subsection

25 of 62

MPHGP-2

25

Next subsection

  • Migration Rate: Percentage of subpopulation size determining how many individuals migrate.

  • Migration Frequency: How many generations before migration taking place.

Except where noted, migration frequency is 50 and migration rate is 0.15.

26 of 62

MPHGP-2 - Bioavailability dataset

26

Next subsection

toBoxplot()

Rank-Sum Test

Train

Unseen

Size

MPHGP vs MOGP

1.73E-06

0.00666

1.73E-06

MPHGP vs GSGP

0.64352

0.25364

2.13E-06

27 of 62

MPHGP-2 - Introspection

27

28 of 62

What really happened

28

sub-GSGP

sub-MOGP

here

and here

  • Migration Rate: Percentage of subpopulation size determining how many individuals migrate.

  • Migration Frequency: How many generations before migration takes place.

29 of 62

MPHGP-2

  • Migration rates have almost no impact.
    • Tested across migration rates {0.05, 0.15, 0.25, 0.35} and migration frequencies {25, 50, 100}.
    • Across 5 datasets for 30 independent runs. Thus, at least 4*3*5*30 = 1800 runs were performed just for MPHGP-2.
  • Migration frequency matters. But why?

  • Both subpopulations need to be on par in terms of fitness.
  • Migration frequency determines the moment of the first migration at which the subpopulations may be on par.

29

30 of 62

MPHGP-2 Introspection - Bioavailability dataset

30

Next subsection

31 of 62

MPHGP-2 Introspection - Concrete dataset

31

Next subsection

32 of 62

MPHGP-2 Introspection - Energy dataset

32

Next subsection

f = 25

33 of 62

Increasing number of subpopulations

MPHGP- {2, 4, 8, 20, 40}

33

34 of 62

Increasing number of subpopulations

34

Next section

35 of 62

Increasing number of subpopulations - Energy dataset

35

Next section

toBoxplot()

#subpops

2

4

8

20

40

2

-

0.27116

0.00072

3.72E-05

3.11E-05

4

0.00045

-

0.00277

2.84E-05

2.13E-06

8

1.92E-06

1.49E-05

-

0.02183

0.00196

20

1.73E-06

1.73E-06

0.02431

-

0.09777

40

1.73E-06

1.73E-06

3.72E-05

0.00822

-

36 of 62

Right, running times!

36

37 of 62

Closing remarks

  • Caveats
  • What was learned
  • Future work

37

38 of 62

Caveats

  • Only Geometric Semantic Mutation was performed and tested. This greatly prevents exponential growth of individuals, but was also what allowed for more experiments.
  • Number of elite survivors was too high, especially in subpopulations of lower size. This was to maintain a clean inference without any other parametric interference.
    • It is worth a re-run in the cases of number of subpopulations = {8, 20, 40} with only one elite survivor

38

39 of 62

What was learned

  • While new, less size-growth dependent, geometric semantic operators are yet to be discovered, it is still possible to find comparable and parsimonious solutions.
  • A hybrid system, especially when running different selection methods, need special attention to put the subpopulations on par, in terms of training error.

39

40 of 62

Future Work

  • Implement more parsimoniousness-oriented GP algorithms into the hybrid system
  • Different migration directions
  • Implement mathematical simplification algorithms.

Allow me to slide this here: nodevo, an implementation in Rust

40

41 of 62

Thank you! Questions?

bgalvao

burnie093

41

42 of 62

Appendix

42

Aiding slides (mainly boxplots) are here

43 of 62

Cosine Operator - Bioavailability dataset

43

Legend: *-β: without the cosine operator

Rank-Sum Test

Train

Unseen

Size

MOGP-ß vs MOGP

0.74987

0.22102

0.50378

GSGP-ß vs GSGP

2.84E-05

0.13591

1.92E-06

Next subsection

toLines()

44 of 62

Cosine Operator - PPB dataset

44

Legend: *-β: without the cosine operator

Rank-Sum Test

Train

Unseen

Size

MOGP-ß vs MOGP

0.00057

0.44052

0.13779

GSGP-ß vs GSGP

0.06871

0.17138

1.73E-06

Next subsection

toLines()

45 of 62

Cosine Operator - Toxicity dataset

45

*-β: without the cosine operator

Rank-Sum Test

Train

Unseen

Size

MOGP-ß vs MOGP

0.97539

0.81302

0.11080

GSGP-ß vs GSGP

0.00148

0.29894

1.73E-06

Next subsection

toLines()

46 of 62

Cosine Operator - Concrete dataset

46

Legend: *-β: without the cosine operator

Rank-Sum Test

Train

Unseen

Size

MOGP-ß vs MOGP

0.01852

0.02564

0.06408

GSGP-ß vs GSGP

0.01752

0.65833

1.73E-06

Next subsection

toLines()

47 of 62

Cosine Operator - Energy dataset

47

Legend: *-β: without the cosine operator

Rank-Sum Test

Train

Unseen

Size

MOGP-ß vs MOGP

0.95899

0.71889

0.03222

GSGP-ß vs GSGP

1.73E-06

3.11E-05

1.73E-06

Next subsection

toLines()

48 of 62

MPHGP-2 - Bioavailability dataset

48

Next subsection

toLines()

Rank-Sum Test

Train

Unseen

Size

MPHGP vs MOGP

1.73E-06

0.00666

1.73E-06

MPHGP vs GSGP

0.64352

0.25364

2.13E-06

49 of 62

MPHGP-2 - PPB dataset

49

Next subsection

toLines()

Rank-Sum Test

Train

Unseen

Size

MPHGP vs MOGP

1.73E-06

0.00385

1.73E-06

MPHGP vs GSGP

0.00171

0.18462

2.97E-05

50 of 62

MPHGP-2 - Toxicity dataset

50

Next subsection

toLines()

Rank-Sum Test

Train

Unseen

Size

MPHGP vs MOGP

0.53044

0.31849

1.53E-05

MPHGP vs GSGP

1.73E-06

0.03501

1.73E-06

51 of 62

MPHGP-2 - Concrete dataset

51

Next subsection

toLines()

Rank-Sum Test

Train

Unseen

Size

MPHGP vs MOGP

1.73E-06

1.73E-06

1.73E-06

MPHGP vs GSGP

0.38203

0.87740

0.00080

52 of 62

MPHGP-2 - Energy dataset

52

Next subsection

toLines()

Rank-Sum Test

Train

Unseen

Size

MPHGP vs MOGP

1.73E-06

2.13E-06

1.73E-06

MPHGP vs GSGP

0.22102

0.64352

8.47E-06

Note: f = 25

53 of 62

Increasing number of subpopulations - Bioavailability dataset

53

Next section

(ノ◕ヮ◕)ノ*:・゚✧

toLines()

#subpops

2

4

8

20

40

2

-

0.15286

0.49080

0.46528

0.86121

4

0.00014

-

0.18462

0.61431

0.37094

8

0.00003

0.05984

-

0.29894

0.97539

20

0.00007

0.10201

0.46528

-

0.61431

40

0.00057

0.18462

0.55774

0.39333

-

54 of 62

Increasing number of subpopulations - PPB dataset

54

Next section

(ノ◕ヮ◕)ノ*:・゚✧

toLines()

#subpops

2

4

8

20

40

2

-

0.26230

0.03327

0.00277

0.02703

4

1.80E-05

-

0.73433

0.11093

0.22102

8

1.92E-06

1.49E-05

-

0.22888

0.36004

20

1.73E-06

5.75E-06

0.15286

-

0.81140

40

1.73E-06

1.73E-06

1.24E-05

0.00773

-

55 of 62

Increasing number of subpopulations - Toxicity dataset

55

Next section

(ノ◕ヮ◕)ノ*:・゚✧

toLines()

#subpops

2

4

8

20

40

2

-

0.36004

0.84508

0.73433

0.32857

4

5.31E-05

-

0.27116

0.64352

0.44052

8

4.07E-05

0.82901

-

0.28021

0.74987

20

6.89E-05

0.30861

0.50383

-

0.90993

40

0.00015

0.47795

0.55774

0.57165

-

56 of 62

Increasing number of subpopulations - Concrete dataset

56

Next section

(ノ◕ヮ◕)ノ*:・゚✧

toLines()

#subpops

2

4

8

20

40

2

-

0.00049

8.47E-06

3.18E-06

1.36E-05

4

2.84E-05

-

0.00049

8.47E-06

1.02E-05

8

5.75E-06

1.13E-05

-

0.01108

0.00258

20

1.73E-06

2.13E-06

0.00096

-

0.12044

40

1.73E-06

1.73E-06

0.00011

0.03160

-

57 of 62

Increasing number of subpopulations - Energy dataset

57

Next section

(ノ◕ヮ◕)ノ*:・゚✧

toLines()

#subpops

2

4

8

20

40

2

-

0.27116

0.00072

3.72E-05

3.11E-05

4

0.00045

-

0.00277

2.84E-05

2.13E-06

8

1.92E-06

1.49E-05

-

0.02183

0.00196

20

1.73E-06

1.73E-06

0.02431

-

0.09777

40

1.73E-06

1.73E-06

3.72E-05

0.00822

-

58 of 62

Increasing number of subpopulations - Bioavailability dataset

58

59 of 62

Increasing number of subpopulations - PPB dataset

59

60 of 62

Increasing number of subpopulations - Toxicity dataset

60

61 of 62

Increasing number of subpopulations - Concrete dataset

61

62 of 62

Increasing number of subpopulations - Energy dataset