1 of 70

Inferential Engines

Alex Alemi

Google Research

@ Aspen 2023

2 of 70

By Phanatic - Parthenon, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=82609062

3 of 70

Digitized from NPR - Two Centuries of Energy in America in Four Graphs https://www.npr.org/sections/money/2013/04/10/176801719/two-centuries-of-energy-in-america-in-four-graphs

4 of 70

Digitized from NPR - Two Centuries of Energy in America in Four Graphs https://www.npr.org/sections/money/2013/04/10/176801719/two-centuries-of-energy-in-america-in-four-graphs

Wood

Fossil Fuels

5 of 70

By Amcyrus2012 - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=37728056

Among the materials that are dug because they are useful, those known as coals are made of earth, and, once set on fire, they burn like charcoal. They are found in Liguria... and in Elis as one approaches Olympia by the mountain road; and they are used by those who work in metals. - On Stones, Theophrastus (c. 371 - 287 BC)

6 of 70

Digitized from NPR - Two Centuries of Energy in America in Four Graphs https://www.npr.org/sections/money/2013/04/10/176801719/two-centuries-of-energy-in-america-in-four-graphs

1712: Newcomen's atmospheric engine

1774: James Watt

1804: Trevithick

7 of 70

8 of 70

Digitized from NPR - Two Centuries of Energy in America in Four Graphs https://www.npr.org/sections/money/2013/04/10/176801719/two-centuries-of-energy-in-america-in-four-graphs

1712: Newcomen's atmospheric engine

1774: James Watt

1804: Trevithick

1824: Carnot

1892: Diesel

9 of 70

50x

Digitized from NPR - Two Centuries of Energy in America in Four Graphs https://www.npr.org/sections/money/2013/04/10/176801719/two-centuries-of-energy-in-america-in-four-graphs

Heating

Work

~80 W

~4 kW

Waste Heat

10 of 70

By Ricardo Liberato - All Gizah Pyramids, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=2258048

11 of 70

Training Compute-Optimal Large Language Models. Hoffmann et al. [2203.15556]

12 of 70

What are we missing?

13 of 70

Scaling Laws

  • Scaling Laws for Neural Language Models. Kaplan et al. [2001.08361]
  • A Solvable Model of Neural Scaling Laws. Maloney et al. [2210.16859]
  • Learning Curve Theory. Hutter [2102.04074]
  • Explaining Neural Scaling Laws. Bahri et al. [2102.06701]
  • An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws. Jeon & Roy [2212.01365]
  • Training Compute-Optimal Large Language Models. Hoffmann et al. [2203.15556]
  • Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. Yang et al. [2203.03466]

14 of 70

Stochastic Thermodynamics

Jarzynski Equality

Crooks Fluctuation Theorem

Generalized Landauer Bounds

Thermodynamic Uncertainty Relations

Thermodynamic Speed Limits

Information Engines

Generalized Szilard Engines

e.g. Udo Seifert, David Wolpert, Susanne Still, David Sivak, Chris Jarzynski, Gavin Crooks, ...

15 of 70

It's Diffusion

It's VB

It's the Generalized Landauer Bound

It's Non-Eq Free Energy

It's ELBO

It's IB

Modified from https://www.science.org/doi/10.1126/scitranslmed.3002981

16 of 70

17 of 70

KL is Nonnegative
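
A minimal numerical sketch of this property, assuming discrete distributions given as lists of probabilities. The function name `kl_divergence` and the example values are illustrative and not from the talk. It computes KL(p || q) = sum_i p_i log(p_i / q_i) in nats, which is positive when p and q differ and zero when they coincide.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                # p puts mass where q has none: the divergence is infinite.
                return math.inf
            total += pi * math.log(pi / qi)
    return total


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl_pq = kl_divergence(p, q)  # strictly positive because p != q
kl_pp = kl_divergence(p, p)  # zero because the arguments coincide
```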

18 of 70

KL is Monotonic
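
A small sketch of one standard reading of this statement, assuming the monotonicity meant here is that marginalizing out a variable cannot increase the KL divergence: KL[p(x, y) || q(x, y)] >= KL[p(x) || q(x)]. The joint tables below are made-up illustrative values, and the helper names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def flatten(joint):
    """Turn a table indexed [x][y] into a flat list of probabilities."""
    return [value for row in joint for value in row]


def marginal_x(joint):
    """Sum out y, leaving the marginal distribution over x."""
    return [sum(row) for row in joint]


# Hypothetical joint distributions over (x, y), rows indexed by x.
p_joint = [[0.30, 0.20], [0.10, 0.40]]
q_joint = [[0.10, 0.30], [0.40, 0.20]]

kl_joint = kl_divergence(flatten(p_joint), flatten(q_joint))
kl_marginal = kl_divergence(marginal_x(p_joint), marginal_x(q_joint))
```

With these tables the joint KL is larger than the marginal KL, as the inequality requires.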

19 of 70

Variational Autoencoders (VAEs)

20 of 70

Generative Process (Reverse)

21 of 70

Representational Process (Forward)

22 of 70

23 of 70

ELBO

Distortion

Rate

Negative Log Likelihood

Data Entropy

Fixing a Broken ELBO - Alemi et al. [1711.00464]
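
For reference, the labels on this slide can be tied together as follows. This is a sketch in the notation of the cited paper as best recalled, so the symbols are assumptions rather than a transcription of the slide: p^*(x) is the data distribution, e(z|x) the encoder, d(x|z) the decoder, and m(z) the marginal (prior).

```latex
\begin{align}
H &\equiv -\int dx\, p^*(x) \log p^*(x)
  && \text{(data entropy)} \\
D &\equiv -\int dx\, p^*(x) \int dz\, e(z|x) \log d(x|z)
  && \text{(distortion)} \\
R &\equiv \int dx\, p^*(x) \int dz\, e(z|x) \log \frac{e(z|x)}{m(z)}
  && \text{(rate)} \\
-\mathrm{ELBO} &= D + R
  \;\ge\; -\int dx\, p^*(x) \log \int dz\, m(z)\, d(x|z)
  \;\ge\; H
\end{align}
```

The middle term is the negative log likelihood of the generative model, so the chain reads: negative ELBO >= negative log likelihood >= data entropy. Both gaps are KL divergences and are therefore nonnegative.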

24 of 70

Generalized Landauer Bound

Entropy Change = Entropy Flow + Entropy Production

Ensemble and Trajectory Thermodynamics: A Brief Introduction - Van den Broeck & Esposito [1403.1777]
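
A short worked form of this decomposition, as a sketch under stated assumptions: a single heat bath at temperature T, heat Q counted as positive when absorbed by the system, and entropy measured in units that include k_B. The symbols are chosen for illustration and may differ from those on the slide.

```latex
\begin{align}
\Delta S &= \Delta_e S + \Delta_i S, \qquad \Delta_i S \ge 0 \\
\Delta_e S &= \frac{Q}{T} \\
\Rightarrow\quad -Q &\ge -T\, \Delta S
\end{align}
```

For example, erasing one bit lowers the system entropy by k_B \ln 2, so the heat released to the bath satisfies -Q \ge k_B T \ln 2, which is the familiar Landauer bound.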

25 of 70

Where are the Heat Baths?

Where is the Microscopic Reversibility?

26 of 70

Gaussians are Affine Flows
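
A minimal one-dimensional sketch of this statement, assuming it refers to the reparameterization x = mu + sigma * eps with eps drawn from a standard normal. The function names are hypothetical. The density of x obtained from the change-of-variables formula matches the usual Gaussian log density.

```python
import math


def standard_normal_logpdf(eps):
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * eps * eps - 0.5 * math.log(2.0 * math.pi)


def gaussian_logpdf(x, mu, sigma):
    """Textbook log density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def affine_flow_forward(eps, mu, sigma):
    """Push a base sample through the affine map x = mu + sigma * eps."""
    return mu + sigma * eps


def affine_flow_logpdf(x, mu, sigma):
    """Log density of x under the flow, via change of variables."""
    eps = (x - mu) / sigma           # invert the affine map
    log_det = -math.log(sigma)       # log |d eps / d x|
    return standard_normal_logpdf(eps) + log_det


x = affine_flow_forward(0.7, mu=1.5, sigma=2.0)
flow_value = affine_flow_logpdf(x, mu=1.5, sigma=2.0)
direct_value = gaussian_logpdf(x, mu=1.5, sigma=2.0)
```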

27 of 70

VAEs

28 of 70

VAEs are Conditional Bijective Flows

42 of 70

Forward

43 of 70

Forward

3900

200

1100

210

44 of 70

Reverse

45 of 70

Reverse

3900

200

1100

210

46 of 70

Reconstruction | Lossy Compression

47 of 70

Reconstruction | Lossy Compression

210

3900

48 of 70

Lossless Compression

49 of 70

Lossless Compression

750

210

3900

50 of 70

Lossless Compression

750

210

3900

190

51 of 70

Bits-Back

52 of 70

Bits-Back

750

210

190

3900

53 of 70

Bits-Back w/Letters

54 of 70

Bits-Back w/Letters

880

210

190

3900

55 of 70

Bits-Back w/Noise

56 of 70

Bits-Back w/Noise

5000

230

190

1100

Universal Compression is Impossible
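
The counting argument usually given for this claim can be checked directly. The sketch below assumes "compression" means a lossless (injective) map on bit strings. The function names are hypothetical. There are 2^n strings of length n but only 2^n - 1 strictly shorter strings, so no lossless code can shorten every input.

```python
def count_strings_of_length(n):
    """Number of distinct bit strings with exactly n bits."""
    return 2 ** n


def count_strings_shorter_than(n):
    """Number of distinct bit strings with fewer than n bits (including the empty string)."""
    return sum(2 ** k for k in range(n))


n = 8
inputs = count_strings_of_length(n)
shorter_outputs = count_strings_shorter_than(n)

# Pigeonhole: fewer short outputs than inputs, so at least one input
# cannot be mapped to a shorter string without a collision.
cannot_shrink_everything = shorter_outputs < inputs
```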

57 of 70

Finite Size Bath

Warning: Flashing Video!

58 of 70

Finite Heat Bath

59 of 70

Finite Size Bath

t=0

60 of 70

Finite Size Bath

t=3k

61 of 70

Finite Size Bath

t=6k

62 of 70

Finite Size Bath

t=9k

63 of 70

Finite Size Bath

t=12k

64 of 70

Finite Size Bath

t=15k

65 of 70

It's TURs

It's VB

It's the Generalized Landauer Bound

It's Non-Eq Free Energy

It's ELBO

It's IB

66 of 70

FIN

(THE END)

67 of 70

Jarzynski Equality
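
For reference, the standard statement of the equality is sketched below. The notation is an assumption about what the slide shows: W is the work done on the system along one realization of a driving protocol, \beta = 1 / (k_B T) is the inverse temperature of the initial equilibrium state, and \Delta F is the equilibrium free energy difference between the final and initial parameter values.

```latex
\begin{align}
\left\langle e^{-\beta W} \right\rangle &= e^{-\beta \Delta F} \\
\Rightarrow\quad \langle W \rangle &\ge \Delta F
\end{align}
```

The second line follows from Jensen's inequality applied to the convex exponential, and recovers the second-law bound on average work.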

68 of 70

Jarzynski Equality

69 of 70

Thermodynamic Uncertainty Relations
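
One commonly quoted form of such a relation is sketched below. It may not be the exact version shown on the slide. Here J is a time-integrated current in a nonequilibrium steady state and \Delta S_{\mathrm{tot}} is the average total entropy production over the same time interval.

```latex
\begin{equation}
\frac{\mathrm{Var}(J)}{\langle J \rangle^{2}} \;\ge\; \frac{2 k_B}{\Delta S_{\mathrm{tot}}}
\end{equation}
```

Read this way, reducing the relative fluctuations of a current requires paying a minimum cost in dissipation.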

70 of 70

Speed Limits