1 of 13

Graph Entropy

Sergio Hernández Cerezo

2 of 13

Entropy issue #1: Cross-entropy

Gibbs cross-entropy for two distributions P, Q is not always well defined: whenever qi is zero and pi is not, H is undefined, since log(0) diverges.

We present generalised forms of cross-entropy without this limitation.

HGibbs(Q,P) = -∑(pi × loge(qi))

H1(Q,P) = ∑(1 - qi^pi)

H2(Q,P) = ∏(2 - qi^pi)

H3(Q,P) = 1 + loge(∏(2 - qi^pi))

Product-based forms of cross-entropy
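A minimal sketch of the product-based forms above (function names are mine, not from the slides). Where Gibbs cross-entropy fails on a zero qi, the generalised forms stay finite because 0^pi = 0 and each factor 2 - qi^pi stays positive:

```python
import math

def h_gibbs(q, p):
    """Gibbs cross-entropy: fails when some qi = 0 with pi > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def h1(q, p):
    """Sum-based generalised cross-entropy: sum(1 - qi^pi)."""
    return sum(1 - qi**pi for pi, qi in zip(p, q))

def h2(q, p):
    """Product-based generalised cross-entropy: prod(2 - qi^pi)."""
    prod = 1.0
    for pi, qi in zip(p, q):
        prod *= 2 - qi**pi
    return prod

def h3(q, p):
    """H3(Q,P) = 1 + loge(H2(Q,P))."""
    return 1 + math.log(h2(q, p))

p = [0.5, 0.5]
q = [1.0, 0.0]               # qi = 0 where pi > 0: Gibbs is undefined here
print(round(h3(q, p), 3))    # 1.693
```

Calling `h_gibbs(q, p)` on the same inputs raises a domain error, while `h1`, `h2` and `h3` all return finite values.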

3 of 13

Formulations for H3 cross-entropy

H3 entropy and cross-entropy can be defined through several equivalent formulations:

H3(Q,P) = 1 + loge(∏(2 - qi^pi))

H3(Q,P) = loge(e × ∏(2 - qi^pi))

H3(Q,P) = 1 + ∑(loge(2 - qi^pi))

H3(Q,P) = 1 + loge(H2(Q,P))

H3(Q,P) = loge(e × H2(Q,P))
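The formulations can be checked numerically; a quick sketch (helper names are mine):

```python
import math

def h3_via_product(q, p):
    """H3 = 1 + loge of the product of (2 - qi^pi) factors."""
    prod = 1.0
    for pi, qi in zip(p, q):
        prod *= 2 - qi**pi
    return 1 + math.log(prod)

def h3_via_e_times_product(q, p):
    """loge(e * H2) = 1 + loge(H2), since loge(e) = 1."""
    prod = 1.0
    for pi, qi in zip(p, q):
        prod *= 2 - qi**pi
    return math.log(math.e * prod)

def h3_via_sum_of_logs(q, p):
    """Log of a product equals the sum of logs."""
    return 1 + sum(math.log(2 - qi**pi) for pi, qi in zip(p, q))
```

All three return the same value for any pair of distributions.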

4 of 13

Entropy issue #2: Distributions only

Gibbs entropy can only be used on probability distributions:

  • One-level flat structure.
  • ∑ pi = 1 ({pi} represent a partition).

Instead, we will define entropy over structured objects:

[Figure: a flat distribution {p1, p2, p3} feeding a single product (∏) node]

H3(P) = 1 + loge(∏(2 - pi^pi))
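As a sketch, the H3 self-entropy of a flat distribution (function name is mine); it reproduces the deck's cars value for {0.7, 0.1, 0.2}:

```python
import math

def h3(p):
    """H3 self-entropy of a distribution: 1 + loge(prod(2 - pi^pi))."""
    prod = 1.0
    for pi in p:
        prod *= 2 - pi**pi
    return 1 + math.log(prod)

print(round(h3([0.7, 0.1, 0.2]), 2))   # 1.63
```

A degenerate distribution {1.0} gives exactly 1, the deck's minimum "leaf" entropy.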

5 of 13

Graph Entropy: Relations

A “relation” is the basic structure of graph entropy.

  • Nodes ‘A’ and ‘B’ represent properties of the system and will hold an entropy.
  • Edges represent the relation between two properties and hold a conditional probability P(A|B).

[Figure: a relation, node A linked to node B by an edge labelled P(A|B)]

H3(B) = 1 + loge(H3(A) × (2 - p^p)), with p = P(A|B)
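A single relation can be evaluated directly; a sketch where A is a leaf with the default entropy 1 (the function name is mine):

```python
import math

def relation_h3(h3_a, p):
    """H3(B) = 1 + loge(H3(A) * (2 - p^p)) for one relation A -> B."""
    return 1 + math.log(h3_a * (2 - p**p))

# Leaf node A (entropy 1) related to B with P(A|B) = 0.7
print(round(relation_h3(1.0, 0.7), 2))   # 1.2
```

With p = 1 the edge factor 2 - p^p is 1, so B simply inherits A's entropy.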

6 of 13

Graph Entropy: Relation networks

Any directed acyclic graph whose edges have assigned probabilities can be decomposed into a set of “relations”.

  • Nodes without inputs are considered “leaf nodes” with a default entropy of 1.
  • The “root” node has no outgoing connections and holds the entire graph entropy.

[Figure: an acyclic directed graph decomposed into relations]

7 of 13

Graph Entropy: Nodes

Initialization:

  • Raw entropy = H2 = 1

When a new input S arrives:

  • Update H2 = H2 × S
  • Recalculate H3
  • Output H3

[Figure: a node with inputs S1 … Sn and one output; H2 = ∏(Si), H3 = 1 + loge(H2)]
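The node rule above can be sketched as a small class (class and attribute names are mine, not from the slides):

```python
import math

class Node:
    """Graph-entropy node: accumulates inputs multiplicatively."""

    def __init__(self):
        self.h2 = 1.0                  # raw entropy H2 starts at 1

    def receive(self, s):
        self.h2 *= s                   # H2 = H2 * S
        return self.h3                 # recalculate and output H3

    @property
    def h3(self):
        return 1 + math.log(self.h2)   # H3 = 1 + loge(H2)
```

Because each input only multiplies H2, the node never needs to remember individual inputs, which is what makes the incremental edge outputs on the next slide work.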

8 of 13

Graph Entropy: Edges

Initialization:

  • P = S = ɸ = ɸ’ = 1

When P or S change:

  • ɸ is recalculated.
  • Output = (ɸ / ɸ’)
  • ɸ’ = ɸ

[Figure: an edge from node A to node B; P = P(A|B), S = input from A; ɸ = S × (2 - P^P), Output = ɸ/ɸ’]
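The edge rule can be sketched the same way (names are mine):

```python
class Edge:
    """Graph-entropy edge: emits the multiplicative change in phi."""

    def __init__(self):
        self.p = 1.0          # P = S = phi = phi' = 1 at initialization
        self.s = 1.0
        self.phi = 1.0
        self.phi_prev = 1.0

    def update(self, s=None, p=None):
        if s is not None:
            self.s = s
        if p is not None:
            self.p = p
        self.phi = self.s * (2 - self.p ** self.p)   # phi = S * (2 - P^P)
        out = self.phi / self.phi_prev               # Output = phi / phi'
        self.phi_prev = self.phi                     # phi' = phi
        return out
```

Emitting the ratio ɸ/ɸ’ rather than ɸ itself lets the downstream node stay consistent by pure multiplication: it absorbs each change without undoing the previous value.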

9 of 13

Example: Cars by engine

Cars can have a Gas engine, an Electric engine, or both if they are Hybrids.

Classically we define a partition of the space {G=gas only, E=electric only, H=hybrids} with some ‘flat’ probability distribution {0.7, 0.1, 0.2} as if they were independent and had no internal structure.

[Figure: leaf nodes G, E and H (H = 1 each) connected to the root]

Edge P = 0.7 → ɸ = 1.22 (contribution 1 × 1.22)
Edge P = 0.1 → ɸ = 1.21 (contribution 1 × 1.21)
Edge P = 0.2 → ɸ = 1.28 (contribution 1 × 1.28)

Root: H2 = 1.88, H3 = 1.63
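The slide's numbers can be reproduced in a few lines (variable names are mine):

```python
import math

# Three leaves (entropy 1) feed the root, so each contribution is 2 - P^P.
probs = [0.7, 0.1, 0.2]
phis = [2 - p**p for p in probs]
h2 = math.prod(phis)
h3 = 1 + math.log(h2)
print(round(h2, 2), round(h3, 2))   # 1.88 1.63
```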

10 of 13

Distribution vs Graph

[Figure: flat distribution {0.7, 0.2, 0.1} over {G, H, E}; H3 = 1.6297]

Considering only the distribution the experiment generated is not a complete view of the information.

Using a graph allows us to include all the internal structure and take it into account in the entropy.

[Figure: the same cars regrouped by overlapping properties “has a gas engine” (p = 0.9) and “has an electric engine” (p = 0.3), with conditional probabilities 0.22 and 0.66 between them; entropy H = ?]

11 of 13

Cars by engine: graph

  • G* = Has gas engine: p = 0.7 + 0.2 = 0.9
  • E* = Has electric engine: p = 0.1 + 0.2 = 0.3

Repeating over G* and E*:

  • P(G*|G*) = P(E*|E*) = 1
  • P(E*|G*) = 0.2/0.9 = 0.222
  • P(G*|E*) = 0.2/0.3 = 0.666

Adding structure made the entropy grow from HDist = 1.63 to HGraph = 1.77.

[Figure: the cars graph, with per-node raw entropy R = H2 and entropy H = H3, and per-edge probability P and factor ɸ = 2 - P^P]

  Node    R      H      Edge P   ɸ      Contribution
  G*|G*   1      1      1        1      1 × 1
  E*|G*   1      1      0.22     1.28   1 × 1.28
  G*      1.28   1.25   0.9      1.09   1.25 × 1.09
  G*|E*   1      1      0.66     1.24   1 × 1.24
  E*|E*   1      1      1        1      1 × 1
  E*      1.24   1.21   0.3      1.30   1.21 × 1.30
  Root    2.15   1.77
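The graph computation can be reproduced end to end; a sketch with my own variable names, using the exact conditionals 0.2/0.9 and 0.2/0.3:

```python
import math

def phi(p):
    """Edge factor: 2 - P^P."""
    return 2 - p**p

# Each starred node receives two leaf relations (leaf entropy = 1).
h3_gstar = 1 + math.log(phi(1.0) * phi(0.2 / 0.9))   # P(G*|G*)=1, P(E*|G*)=0.222
h3_estar = 1 + math.log(phi(1.0) * phi(0.2 / 0.3))   # P(E*|E*)=1, P(G*|E*)=0.666

# The root multiplies each branch's H3 by its edge factor.
h2_root = (h3_gstar * phi(0.9)) * (h3_estar * phi(0.3))
h3_root = 1 + math.log(h2_root)
print(round(h3_gstar, 2), round(h3_estar, 2), round(h3_root, 2))  # 1.25 1.21 1.77
```

The result, 1.77, is above the flat-distribution value of 1.63: adding the internal structure increased the entropy.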

12 of 13

Separation axioms (product form)

[Figure: two comparisons, the joint system P×Q against Max(H3(P), H3(Q)), labelled H3(P|Q); and independent P × Q against H3(P) × H3(Q), labelled H3(P&Q)]

Multiplicative form of the 3rd axiom.

Maximum between:

H3(P) = 1 + loge(∏(2 - pi^pi))
H3(Q) = 1 + loge(∏(2 - qi^qi))

For P, Q independent: Xij = P(pi|qj) = pi × qj

H3(P&Q) = 1 + loge(∏(2 - Xij^Xij))

compared against:

(1 + loge(∏(2 - pi^pi))) × (1 + loge(∏(2 - qi^qi)))

H3(P|Q) = 1 + loge(H3(Q) × ∏(2 - pi^pi)) = 1 + loge(H3(Q) × H2(P))
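The quantities on this slide can be evaluated numerically; a sketch (function names are mine) that also checks the identity 1 + loge(H3(Q) × H2(P)) = loge(H3(Q)) + H3(P) used on the next slide:

```python
import math

def h2(dist):
    """Self H2: prod(2 - pi^pi)."""
    prod = 1.0
    for p in dist:
        prod *= 2 - p**p
    return prod

def h3_product_form(dist):
    return 1 + math.log(h2(dist))

def h3_sum_form(dist):
    return 1 + sum(math.log(2 - p**p) for p in dist)

P = [0.7, 0.3]
Q = [0.6, 0.4]
X = [pi * qj for pi in P for qj in Q]   # Xij = pi * qj for independent P, Q

joint_a = h3_product_form(X)            # product form of H3(P&Q)
joint_b = h3_sum_form(X)                # sum form, same value

lhs = 1 + math.log(h3_product_form(Q) * h2(P))
rhs = math.log(h3_product_form(Q)) + h3_product_form(P)
```

`joint_a` and `joint_b` always agree (log of a product vs. sum of logs), and `lhs == rhs` shows the multiplicative and additive H3(P|Q) expressions are the same quantity.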

13 of 13

Separation axioms (sum form)

[Figure: the same two comparisons as the product form, P×Q against Max(H3(P), H3(Q)), labelled H3(P|Q), and P × Q against H3(P) × H3(Q), labelled H3(P&Q)]

Additive form of the 3rd axiom.

Maximum between:

H3(P) = 1 + ∑(loge(2 - pi^pi))
H3(Q) = 1 + ∑(loge(2 - qi^qi))

For P, Q independent: Xij = P(pi|qj) = pi × qj

H3(P&Q) = 1 + ∑(loge(2 - Xij^Xij))

compared against:

(1 + ∑(loge(2 - pi^pi))) × (1 + ∑(loge(2 - qi^qi)))

H3(P|Q) = loge(H3(Q)) + H3(P)