1 of 13

Graph Entropy

Sergio Hernández Cerezo

2 of 13

Entropy issue #1: Cross-entropy

Gibbs cross-entropy for two distributions P, Q is not always well defined: whenever qi is zero and pi is not, H is undefined, since log(0) diverges.

We present generalised forms of cross-entropy without this limitation.

HGibbs(Q,P) = -∑(pi × loge(qi))

H1(Q,P) = ∑(1 - qi^pi)

H2(Q,P) = ∏(2 - qi^pi)

H3(Q,P) = 1 + loge(∏(2 - qi^pi))

Product-based forms of cross-entropy
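A minimal sketch of the product-based forms above (function names are mine, not from the slides). Where Gibbs cross-entropy fails on a zero qi, the generalised forms stay finite because 0^pi = 0 and each factor 2 - qi^pi stays positive:

```python
import math

def h_gibbs(q, p):
    """Gibbs cross-entropy: fails when some qi = 0 with pi > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def h1(q, p):
    """Sum-based generalised cross-entropy: sum(1 - qi^pi)."""
    return sum(1 - qi**pi for pi, qi in zip(p, q))

def h2(q, p):
    """Product-based generalised cross-entropy: prod(2 - qi^pi)."""
    prod = 1.0
    for pi, qi in zip(p, q):
        prod *= 2 - qi**pi
    return prod

def h3(q, p):
    """H3(Q,P) = 1 + loge(H2(Q,P))."""
    return 1 + math.log(h2(q, p))

p = [0.5, 0.5]
q = [1.0, 0.0]               # qi = 0 where pi > 0: Gibbs is undefined here
print(round(h3(q, p), 3))    # 1.693
```

Calling `h_gibbs(q, p)` on the same inputs raises a domain error, while `h1`, `h2` and `h3` all return finite values.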

3 of 13

Formulations for H3 cross-entropy

H3 entropy and cross-entropy can be defined through several equivalent formulations:

H3(Q,P) = 1 + loge(∏(2 - qi^pi))

H3(Q,P) = loge(e × ∏(2 - qi^pi))

H3(Q,P) = 1 + ∑(loge(2 - qi^pi))

H3(Q,P) = 1 + loge(H2(Q,P))

H3(Q,P) = loge(e × H2(Q,P))
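The formulations can be checked numerically; a quick sketch (helper names are mine):

```python
import math

def h3_via_product(q, p):
    """H3 = 1 + loge of the product of (2 - qi^pi) factors."""
    prod = 1.0
    for pi, qi in zip(p, q):
        prod *= 2 - qi**pi
    return 1 + math.log(prod)

def h3_via_e_times_product(q, p):
    """loge(e * H2) = 1 + loge(H2), since loge(e) = 1."""
    prod = 1.0
    for pi, qi in zip(p, q):
        prod *= 2 - qi**pi
    return math.log(math.e * prod)

def h3_via_sum_of_logs(q, p):
    """Log of a product equals the sum of logs."""
    return 1 + sum(math.log(2 - qi**pi) for pi, qi in zip(p, q))
```

All three return the same value for any pair of distributions.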

4 of 13

Entropy issue #2: Distributions only

Gibbs entropy can only be used on probability distributions:

  • One-level flat structure.
  • ∑ pi = 1 ({pi} represent a partition).

Instead, we will define entropy over structured objects:

[Figure: a flat distribution {p1, p2, p3} feeding a single product (∏) node]

H3(P) = 1 + loge(∏(2 - pi^pi))
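As a sketch, the H3 self-entropy of a flat distribution (function name is mine); it reproduces the deck's cars value for {0.7, 0.1, 0.2}:

```python
import math

def h3(p):
    """H3 self-entropy of a distribution: 1 + loge(prod(2 - pi^pi))."""
    prod = 1.0
    for pi in p:
        prod *= 2 - pi**pi
    return 1 + math.log(prod)

print(round(h3([0.7, 0.1, 0.2]), 2))   # 1.63
```

A degenerate distribution {1.0} gives exactly 1, the deck's minimum "leaf" entropy.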

5 of 13

Graph Entropy: Relations

A “relation” is the basic structure of graph entropy.

  • Nodes ‘A’ and ‘B’ represent properties of the system and will hold an entropy.
  • Edges represent the relation between two properties and hold a conditional probability P(A|B).

[Figure: a relation, node A linked to node B by an edge labelled P(A|B)]

H3(B) = 1 + loge(H3(A) × (2 - p^p)), with p = P(A|B)
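A single relation can be evaluated directly; a sketch where A is a leaf with the default entropy 1 (the function name is mine):

```python
import math

def relation_h3(h3_a, p):
    """H3(B) = 1 + loge(H3(A) * (2 - p^p)) for one relation A -> B."""
    return 1 + math.log(h3_a * (2 - p**p))

# Leaf node A (entropy 1) related to B with P(A|B) = 0.7
print(round(relation_h3(1.0, 0.7), 2))   # 1.2
```

With p = 1 the edge factor 2 - p^p is 1, so B simply inherits A's entropy.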

6 of 13

Graph Entropy: Relation networks

Any directed acyclic graph whose edges have assigned probabilities can be decomposed into a set of “relations”.

  • Nodes without inputs are considered “leaf nodes” with a default entropy of 1.
  • The “root” node has no outgoing connections and holds the entire graph entropy.

[Figure: an acyclic directed graph decomposed into relations]

7 of 13

Graph Entropy: Nodes

Initialization:

  • Raw entropy = H2 = 1

When a new input S arrives:

  • Update H2 = H2 × S
  • Recalculate H3
  • Output H3

[Figure: a node with inputs S1 … Sn and one output; H2 = ∏(Si), H3 = 1 + loge(H2)]
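The node rule above can be sketched as a small class (class and attribute names are mine, not from the slides):

```python
import math

class Node:
    """Graph-entropy node: accumulates inputs multiplicatively."""

    def __init__(self):
        self.h2 = 1.0                  # raw entropy H2 starts at 1

    def receive(self, s):
        self.h2 *= s                   # H2 = H2 * S
        return self.h3                 # recalculate and output H3

    @property
    def h3(self):
        return 1 + math.log(self.h2)   # H3 = 1 + loge(H2)
```

Because each input only multiplies H2, the node never needs to remember individual inputs, which is what makes the incremental edge outputs on the next slide work.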

8 of 13

Graph Entropy: Edges

Initialization:

  • P = S = ɸ = ɸ’ = 1

When P or S change:

  • ɸ is recalculated.
  • Output = (ɸ / ɸ’)
  • ɸ’ = ɸ

[Figure: an edge from node A to node B; P = P(A|B), S = input from A; ɸ = S × (2 - P^P), Output = ɸ/ɸ’]
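The edge rule can be sketched the same way (names are mine):

```python
class Edge:
    """Graph-entropy edge: emits the multiplicative change in phi."""

    def __init__(self):
        self.p = 1.0          # P = S = phi = phi' = 1 at initialization
        self.s = 1.0
        self.phi = 1.0
        self.phi_prev = 1.0

    def update(self, s=None, p=None):
        if s is not None:
            self.s = s
        if p is not None:
            self.p = p
        self.phi = self.s * (2 - self.p ** self.p)   # phi = S * (2 - P^P)
        out = self.phi / self.phi_prev               # Output = phi / phi'
        self.phi_prev = self.phi                     # phi' = phi
        return out
```

Emitting the ratio ɸ/ɸ’ rather than ɸ itself lets the downstream node stay consistent by pure multiplication: it absorbs each change without undoing the previous value.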

9 of 13

Example: Cars by engine

Cars can have a Gas engine, an Electric engine, or both if they are Hybrids.

Classically we define a partition of the space {G=gas only, E=electric only, H=hybrids} with some ‘flat’ probability distribution {0.7, 0.1, 0.2} as if they were independent and had no internal structure.

[Figure: leaf nodes G, E and H (H = 1 each) connected to the root]

Edge P = 0.7 → ɸ = 1.22 (contribution 1 × 1.22)
Edge P = 0.1 → ɸ = 1.21 (contribution 1 × 1.21)
Edge P = 0.2 → ɸ = 1.28 (contribution 1 × 1.28)

Root: H2 = 1.88, H3 = 1.63
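The slide's numbers can be reproduced in a few lines (variable names are mine):

```python
import math

# Three leaves (entropy 1) feed the root, so each contribution is 2 - P^P.
probs = [0.7, 0.1, 0.2]
phis = [2 - p**p for p in probs]
h2 = math.prod(phis)
h3 = 1 + math.log(h2)
print(round(h2, 2), round(h3, 2))   # 1.88 1.63
```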

10 of 13

Distribution vs Graph

[Figure: flat distribution {0.7, 0.2, 0.1} over {G, H, E}; H3 = 1.6297]

Considering only the distribution the experiment generated is not a complete view of the information.

Using a graph allows us to include all the internal structure and take it into account in the entropy.

[Figure: the same cars regrouped by overlapping properties “has a gas engine” (p = 0.9) and “has an electric engine” (p = 0.3), with conditional probabilities 0.22 and 0.66 between them; entropy H = ?]

11 of 13

Cars by engine: graph

  • G* = Has gas engine: p = 0.7 + 0.2 = 0.9
  • E* = Has electric engine: p = 0.1 + 0.2 = 0.3

Repeating over G* and E*:

  • P(G*|G*) = P(E*|E*) = 1
  • P(E*|G*) = 0.2/0.9 = 0.222
  • P(G*|E*) = 0.2/0.3 = 0.666

Adding structure made the entropy grow from HDist = 1.63 to HGraph = 1.77.

[Figure: the cars graph, with per-node raw entropy R = H2 and entropy H = H3, and per-edge probability P and factor ɸ = 2 - P^P]

  Node    R      H      Edge P   ɸ      Contribution
  G*|G*   1      1      1        1      1 × 1
  E*|G*   1      1      0.22     1.28   1 × 1.28
  G*      1.28   1.25   0.9      1.09   1.25 × 1.09
  G*|E*   1      1      0.66     1.24   1 × 1.24
  E*|E*   1      1      1        1      1 × 1
  E*      1.24   1.21   0.3      1.30   1.21 × 1.30
  Root    2.15   1.77
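The graph computation can be reproduced end to end; a sketch with my own variable names, using the exact conditionals 0.2/0.9 and 0.2/0.3:

```python
import math

def phi(p):
    """Edge factor: 2 - P^P."""
    return 2 - p**p

# Each starred node receives two leaf relations (leaf entropy = 1).
h3_gstar = 1 + math.log(phi(1.0) * phi(0.2 / 0.9))   # P(G*|G*)=1, P(E*|G*)=0.222
h3_estar = 1 + math.log(phi(1.0) * phi(0.2 / 0.3))   # P(E*|E*)=1, P(G*|E*)=0.666

# The root multiplies each branch's H3 by its edge factor.
h2_root = (h3_gstar * phi(0.9)) * (h3_estar * phi(0.3))
h3_root = 1 + math.log(h2_root)
print(round(h3_gstar, 2), round(h3_estar, 2), round(h3_root, 2))  # 1.25 1.21 1.77
```

The result, 1.77, is above the flat-distribution value of 1.63: adding the internal structure increased the entropy.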

12 of 13

Separation axioms (product form)

[Figure: two comparisons, the joint system P×Q against Max(H3(P), H3(Q)), labelled H3(P|Q); and independent P × Q against H3(P) × H3(Q), labelled H3(P&Q)]

Multiplicative form of the 3rd axiom.

Maximum between:

H3(P) = 1 + loge(∏(2 - pi^pi))
H3(Q) = 1 + loge(∏(2 - qi^qi))

For P, Q independent: Xij = P(pi|qj) = pi × qj

H3(P&Q) = 1 + loge(∏(2 - Xij^Xij))

compared against:

(1 + loge(∏(2 - pi^pi))) × (1 + loge(∏(2 - qi^qi)))

H3(P|Q) = 1 + loge(H3(Q) × ∏(2 - pi^pi)) = 1 + loge(H3(Q) × H2(P))
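The quantities on this slide can be evaluated numerically; a sketch (function names are mine) that also checks the identity 1 + loge(H3(Q) × H2(P)) = loge(H3(Q)) + H3(P) used on the next slide:

```python
import math

def h2(dist):
    """Self H2: prod(2 - pi^pi)."""
    prod = 1.0
    for p in dist:
        prod *= 2 - p**p
    return prod

def h3_product_form(dist):
    return 1 + math.log(h2(dist))

def h3_sum_form(dist):
    return 1 + sum(math.log(2 - p**p) for p in dist)

P = [0.7, 0.3]
Q = [0.6, 0.4]
X = [pi * qj for pi in P for qj in Q]   # Xij = pi * qj for independent P, Q

joint_a = h3_product_form(X)            # product form of H3(P&Q)
joint_b = h3_sum_form(X)                # sum form, same value

lhs = 1 + math.log(h3_product_form(Q) * h2(P))
rhs = math.log(h3_product_form(Q)) + h3_product_form(P)
```

`joint_a` and `joint_b` always agree (log of a product vs. sum of logs), and `lhs == rhs` shows the multiplicative and additive H3(P|Q) expressions are the same quantity.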

13 of 13

Separation axioms (sum form)

[Figure: the same two comparisons as the product form, P×Q against Max(H3(P), H3(Q)), labelled H3(P|Q), and P × Q against H3(P) × H3(Q), labelled H3(P&Q)]

Additive form of the 3rd axiom.

Maximum between:

H3(P) = 1 + ∑(loge(2 - pi^pi))
H3(Q) = 1 + ∑(loge(2 - qi^qi))

For P, Q independent: Xij = P(pi|qj) = pi × qj

H3(P&Q) = 1 + ∑(loge(2 - Xij^Xij))

compared against:

(1 + ∑(loge(2 - pi^pi))) × (1 + ∑(loge(2 - qi^qi)))

H3(P|Q) = loge(H3(Q)) + H3(P)