
Interdisciplinary AI for Fundamental Physics


Mariel Pettee · October 30th, 2023


Unanswered questions


What is the nature of dark matter/energy?

What’s up with neutrino masses?

Why is the universe so matter-dominated?

How does gravity operate in the quantum regime?

Why is the Higgs mass so light?

Why is gravity so weak?



These questions all fall under the umbrella of fundamental physics:

the study of the core structure of the universe using multi-scale, multi-modal data.


How have we tended to solve problems like this?


Lone genius era

Large collaboration era

Interdisciplinary data-driven era


Data-driven?

  • The next generation of large-scale particle physics detectors won’t take data for roughly another decade, but there is plenty of data soon to come from ongoing experiments.
    • (So much that we can hardly cope with it using existing analysis strategies!)
  • Now that the Standard Model is complete, new signals are likely to emerge subtly – perhaps they are already lurking in existing data?
  • AI is flourishing, and it will shape how doing science looks and feels.


Interdisciplinary?

  • AI and data science are naturally interdisciplinary.
  • All of these fundamental questions necessarily connect the expertise of cosmologists, particle physicists, AI researchers, etc.
  • Successful ideas often come from other subdomains:
    • e.g. the Higgs mechanism & superconductivity
    • Physics is increasingly interdisciplinary. [1]
    • Interdisciplinary physics researchers are more successful. [2]

[1] Pan, R., Sinha, S., Kaski, K. et al. The evolution of interdisciplinarity in physics research. Sci Rep 2, 551 (2012). https://doi.org/10.1038/srep00551.

[2] Pluchino A, Burgio G, Rapisarda A, Biondo AE, Pulvirenti A, Ferro A, et al. (2019) Exploring the role of interdisciplinarity in physics: Success, talent and luck. PLoS ONE 14(6): e0218793. https://doi.org/10.1371/journal.pone.0218793.


Higher interdisciplinarity → more papers:

Higher interdisciplinarity → more citations:


Today, “interdisciplinary AI” is practically a redundant phrase.

Likewise, the folks developing these interdisciplinary toolkits in science (the “data physicists” [1]) have honed an expertise that is grounded in physics and mathematical principles, but not necessarily constrained to one subfield.

“Interdisciplinary” therefore refers to both tools and people.

[1] Opinion: The Rise of the Data Physicist. https://www.aps.org/publications/apsnews/202311/backpage.cfm


Interdisciplinarity makes research more efficient by sharing tools across domains.


There are clues to the Milky Way’s history woven into its populations of stars.


(+ clues about where dark matter is locally distributed)


“Stellar streams” are remnants of external groups of stars that are still in the process of being absorbed into the Galaxy.


Here’s the problem… streams are generally very faint and hard to find.

Weakly-Supervised Anomaly Detection in the Milky Way. M. Pettee, S. Thanvantri, B. Nachman, D. Shih, M. R. Buckley, J. H. Collins. arXiv:2305.03761 [astro-ph.GA]


Just 0.05% of the stars in this region of the sky 😬



CWoLa is a weakly-supervised classification technique originally designed for particle physics.


CWoLa = Classification Without Labels

arXiv:1708.02949 [hep-ph] & arXiv:1902.02634 [hep-ph]

E. Metodiev, B. Nachman, J. Thaler; J. Collins, K. Howe, B. Nachman

“CWoLa hunting” is a data-driven, model-agnostic “bump-hunting” technique designed to identify localized anomalous signals in the context of smoothly-falling backgrounds – e.g. new fundamental particles in LHC data.

It exploits the fact that an optimal “weakly-supervised” classifier (trained on noisy labels) is also an optimal fully-supervised classifier.

A model trained to classify different mixtures M1 and M2 of signal S & background B can also be used to distinguish S from B.

Key assumptions: background is indistinguishable in both mixtures, signal is localized along at least one dimension, and remaining features used for training the classifier are uncorrelated with the localized feature.


In short: we train a classifier to distinguish signal & sideband regions, then apply the same classifier to a test set & isolate the highest-ranked (“most anomalous”) events.


Scan along proper motion, as stellar streams are kinematically cold (i.e. localized in proper motion).
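The signal-region/sideband procedure above can be sketched on toy data. Below is a minimal, illustrative CWoLa-style setup – synthetic "stars", a window in a stand-in proper-motion feature, and a simple numpy logistic classifier – not the actual analysis pipeline from the paper:

```python
# Illustrative CWoLa-style bump hunt on toy data (not the paper's pipeline):
# train a classifier on noisy region labels, then rank stars by its score.
import numpy as np

rng = np.random.default_rng(0)

# Toy "stars": one localized feature (proper motion mu) + two auxiliary features.
n_bkg, n_sig = 20000, 600
mu = np.concatenate([rng.uniform(-10, 10, n_bkg), rng.normal(2.0, 0.3, n_sig)])
aux = np.vstack([rng.normal(0.0, 1.0, (n_bkg, 2)), rng.normal(1.5, 0.5, (n_sig, 2))])
is_sig = np.concatenate([np.zeros(n_bkg, bool), np.ones(n_sig, bool)])

# Mixture 1 = "signal region" (a window in mu); Mixture 2 = flanking sidebands.
in_sr = np.abs(mu - 2.0) < 0.5
in_sb = (np.abs(mu - 2.0) >= 0.5) & (np.abs(mu - 2.0) < 1.5)

# Simple logistic regression on the auxiliary features only, trained on the
# noisy region labels (1 = signal region, 0 = sideband).
X = np.hstack([aux[in_sr | in_sb], np.ones((np.sum(in_sr | in_sb), 1))])
y = in_sr[in_sr | in_sb].astype(float)
w = np.zeros(3)
for _ in range(500):  # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)

# Rank signal-region stars by score; the top slice should be signal-enriched,
# because the signal is localized in mu while the background is not.
Xsr = np.hstack([aux[in_sr], np.ones((np.sum(in_sr), 1))])
scores = 1.0 / (1.0 + np.exp(-Xsr @ w))
top = np.argsort(scores)[-100:]
purity = is_sig[in_sr][top].mean()
print(f"purity of top-100 candidates: {purity:.2f}")
```

Note that the classifier never sees true signal/background labels – only the region labels – which is exactly the weak supervision that makes the method model-agnostic.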


A method first intended to detect particle physics anomalies…


… is also able to find astrophysical anomalies (stellar streams)!


CWoLa reliably identifies simulated stellar streams with very high accuracy.


96% of streams have purity > 0%, of which 69% have purity > 50%.

The median purity is 86%.


In Gaia data, we target the known stellar stream GD-1.

(GD-1 is especially long, narrow, and dense)


CWoLa identifies the known stellar stream GD-1 with high purity.

760 of 1,350 CWoLa-identified stars are labeled as belonging to GD-1.

(Purity = 56% & Completeness = 51%)


Using stream-aligned coordinates, it’s clear that CWoLa has also identified key over- and under-densities within the GD-1 stream.


Furthermore, taking the highest-ranked subset of unlabeled GD-1 stars from CWoLa allows us to identify & investigate additional stars that might be strong candidates for membership in GD-1.


Interdisciplinarity helps build our intuitions for creative problem-solving.


Beyond Imitation: Generative and Variational Choreography via Machine Learning

International Conference on Computational Creativity (2019)

arXiv:1907.05297 [cs.LG]

M. Pettee, C. Shimmin, D. Duhaime, I. Vidrin

Choreo-Graph: Learning Latent Graph Representations of the Dancing Body

NeurIPS Workshop on ML in Creativity and Design (2020)

M. Pettee, S. Miret, S. Majumdar, M. Nassar

PirouNet: Creating Dance through Artist-Centric Deep Learning

EAI ArtsIT (Best Paper Award, 2022)

arXiv:2207.12126 [cs.LG]

M. Papillon, M. Pettee, N. Miolane


Graph Neural Network (GNN) pipeline: raw data → fully-connected input graph → GNN encoder → latent graph → GNN decoder → output.

Neural Relational Inference for Interacting Systems

arXiv:1802.04687 [stat.ML]

T. Kipf, E. Fetaya, K. Wang, M. Welling, R. Zemel

Choreo-Graph: Learning Latent Graph Representations of the Dancing Body. NeurIPS Workshop on ML in Creativity and Design.

M. Pettee, S. Miret, S. Majumdar, M. Nassar. (2020)
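The encoder/decoder stages in this pipeline are built from message passing over the body graph. Here is a minimal numpy sketch of one message-passing round on a fully-connected graph – random weight matrices stand in for learned ones, so this is an illustration of the mechanism rather than the actual Choreo-Graph model:

```python
# Minimal message-passing step on a fully-connected "body" graph (numpy sketch;
# weights are random stand-ins, not a trained NRI-style encoder/decoder).
import numpy as np

rng = np.random.default_rng(1)
n_joints, d = 5, 3                      # e.g. 5 joints with 3-D positions
x = rng.normal(size=(n_joints, d))      # node features (one frame of motion)

# Fully-connected adjacency (no self-loops), as in the input graph above.
A = np.ones((n_joints, n_joints)) - np.eye(n_joints)

W_msg = rng.normal(size=(2 * d, d))     # edge (message) weights
W_upd = rng.normal(size=(2 * d, d))     # node-update weights

# Messages: for each ordered pair (i, j), combine sender and receiver features.
pairs = np.concatenate([np.repeat(x, n_joints, 0),
                        np.tile(x, (n_joints, 1))], axis=1)
msgs = np.tanh(pairs @ W_msg).reshape(n_joints, n_joints, d)

# Aggregate incoming messages over the adjacency, then update each node.
agg = (A[..., None] * msgs).sum(axis=0) / (n_joints - 1)
h = np.tanh(np.concatenate([x, agg], axis=1) @ W_upd)
print(h.shape)  # one round of message passing: (5, 3)
```

Changing `A` (e.g. a skeleton adjacency vs. fully-connected) changes which joints exchange information – the connectivity choice the text above refers to.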




The dataset was no longer a time series; instead, each event consisted of a single pion – a discrete “body” constructed of ~50 points corresponding to energy cell deposits in our calorimeter.

Understanding the implications of graph connectivity in my dance dataset helped me understand how to choose the best graph connectivity of the pion model.

Working with different graph connectivities in a visually and physically intuitive context was useful later on when designing a GNN to identify pions at the LHC.


Point Cloud Deep Learning Methods for Pion Reconstruction in the ATLAS Experiment

ATL-PHYS-PUB-2022-040


A particularly valuable lesson from working not only across STEM disciplines, but also with creative disciplines:

errors are not always bad!


Interdisciplinarity encourages us to think more generically & ambitiously.


What will it look like to do physics analysis in 5-10 years?


Since ~2020, we’ve increasingly seen a shift in the AI industry towards foundation models.

These models are pre-trained on large and diverse data, generally using self-supervision.

They are task-agnostic, but can later be fine-tuned for task-specific purposes.


That’s great if all of your data can be neatly combined via a standard input format (e.g. text, 2D images).

But physics data is far more heterogeneous and multi-scale.


Science GPT???


Can we upload our data to ChatGPT?


(Definitely don’t do that.)


(Figure: multiplication accuracy vs. # of digits – for a problem whose correct answer is 14,082,120, the model will only get this right 4% of the time!)

Faith and Fate: Limits of Transformers on Compositionality. arXiv:2305.18654 [cs.CL] (2023)


As it happens, language models really struggle to understand what makes numbers different from other kinds of text.


[1] Do NLP Models Know Numbers? Probing Numeracy in Embeddings. https://aclanthology.org/D19-1534.pdf

[2] NumGPT: Improving Numeracy Ability of Generative Pre-trained Models. arXiv:2109.03137 [cs.CL].

Existing embeddings can’t generalize out-of-distribution…

(Panels: fixed word embeddings vs. contextual word embeddings)


…and they behave erratically.


How can we combine the flexibility of a Large Language Model with a better inductive bias about how numbers operate?


Dedicated numerical encodings face trade-offs between accuracy, range, and vocabulary size.

F. Charton. Linear Algebra with Transformers. arXiv:2112.01898 [cs.LG].


Using scientific notation – e.g. writing 832 as “8 10e2 3 10e1 2 10e0” – is somewhat helpful, but the model still isn’t learning the basic rules of arithmetic.

Investigating the Limitations of Transformers with Simple Arithmetic Tasks. arXiv:2102.13019 [cs.CL].
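The digit-plus-exponent encoding above is simple to write down concretely (the helper name below is mine, not from the paper):

```python
# Sketch of the positional scientific-notation encoding described above,
# e.g. 832 -> "8 10e2 3 10e1 2 10e0".
def encode_number(n: int) -> str:
    digits = str(abs(n))
    k = len(digits) - 1                  # exponent of the leading digit
    tokens = []
    for i, d in enumerate(digits):
        tokens.append(d)                 # the digit itself...
        tokens.append(f"10e{k - i}")     # ...followed by its place value
    if n < 0:
        tokens.insert(0, "-")
    return " ".join(tokens)

print(encode_number(832))  # -> 8 10e2 3 10e1 2 10e0
```

The encoding makes each digit's place value explicit, but the model must still learn carrying and other arithmetic rules from examples.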


Another popular (if somewhat contrived) strategy:

mapping numbers onto a basis of “prototype numerals”.

NumGPT: Improving Numeracy Ability of Generative Pre-trained Models. arXiv:2109.03137 [cs.CL].

Embed the exponent as a vector associated with integers between -8 and +12

Embed the mantissa as a sum of distances from “prototype numerals” distributed uniformly between [-10, 10].
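A simplified sketch of this prototype scheme follows – the Gaussian kernel, its width, and the normalization are illustrative assumptions on my part, not NumGPT’s exact formulation:

```python
# Sketch of a prototype-numeral embedding in the spirit of NumGPT (kernel
# choice and constants here are illustrative assumptions, not the paper's).
import numpy as np

prototypes = np.linspace(-10, 10, 21)   # "prototype numerals" on [-10, 10]

def embed_mantissa(m: float, width: float = 2.0) -> np.ndarray:
    # Soft assignment: a Gaussian kernel over distances to each prototype.
    w = np.exp(-((m - prototypes) ** 2) / (2 * width**2))
    return w / w.sum()

def embed_exponent(e: int) -> np.ndarray:
    # One-hot over integer exponents in [-8, +12], clipped at the ends.
    e = int(np.clip(e, -8, 12))
    v = np.zeros(21)
    v[e + 8] = 1.0
    return v

# 0.511 MeV -> mantissa 5.11, exponent -1 (i.e. 5.11e-1)
vec = np.concatenate([embed_mantissa(5.11), embed_exponent(-1)])
print(vec.shape)  # (42,)
```

Nearby mantissas get nearby embeddings, which is the inductive bias this family of encodings is after.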


“We” = the recently-announced Polymathic AI Collaboration

polymathic-ai.org


We propose a new numerical encoding scheme that uses just a single token and renders a language model end-to-end continuous.


S. Golkar, M. Pettee, M. Eickenberg, A. Bietti, M. Cranmer, G. Krawezik, F. Lanusse, M. McCabe, R. Ohana, L. Parker, B. Régaldo-Saint Blancard, T. Tesileanu, K. Cho, S. Ho.

xVal: A Continuous Number Encoding for Large Language Models. arXiv:2310.02989 [stat.ML].

Blog post: https://polymathic-ai.org/blog/xval/


“The electron has a mass of 0.511 MeV.”

Text-based encoding:

x = [ The electron has a mass of + 511 e-3 MeV ]

xtext = [ 6 9482 350 3 1584 20 82 7295 105 8001 ]

xVal encoding:

x = [ The electron has a mass of [NUM] MeV ]

xtext = [ 6 9482 350 3 1584 20 1 8001 ]


During encoding, the actual numerical value is multiplied element-wise by the [NUM] token embedding vector.


During inference, upon encountering a [NUM] token, the model prompts a dedicated number head to extract its value.
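These two steps – scaling the [NUM] embedding on the way in, and reading a scalar back out with a number head – can be sketched as a toy round-trip. Shapes and names here are illustrative, not the actual xVal implementation:

```python
# Toy sketch of the xVal idea: a single [NUM] token embedding scaled by the
# numerical value on encoding, plus a linear "number head" recovering a scalar.
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
num_embedding = rng.normal(size=d_model)        # embedding of the [NUM] token

def encode(value: float) -> np.ndarray:
    # Element-wise scaling makes the embedding continuous in the value.
    return value * num_embedding

def number_head(h: np.ndarray, w: np.ndarray) -> float:
    # A linear head mapping a hidden state back to a scalar value.
    return float(h @ w)

# With w chosen as the normalized [NUM] embedding, encode/decode round-trips
# (in the real model a transformer sits between these two steps).
w = num_embedding / (num_embedding @ num_embedding)
print(number_head(encode(0.511), w))  # -> 0.511 (up to float error)
```

Because the value enters multiplicatively rather than as a digit string, the mapping from numbers to embeddings is continuous end-to-end – the property the slide above emphasizes.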


This encoding strategy has 3 main benefits:

  • Continuity
    • It embeds key information about how numbers continuously relate to one another, making its predictions more appropriate for many scientific applications.
  • Interpolation
    • It makes better out-of-distribution predictions than other numerical encodings.
  • Efficiency
    • By using just a single token to represent any number, it requires less memory, compute resources, and training time to achieve strong results.


When tested on the task of temperature forecasting from a real-world dataset, xVal achieves the lowest loss with the shortest training time.

Plus, xVal’s predictions aren’t prone to over-predicting numbers that are over-represented in the training set (horizontal stripe artifacts):


When tested on the task of extracting orbital data from simulated planetary motion, xVal shows improved out-of-distribution predictions.


We shouldn’t expect LLMs to be perfect calculators, and we have quite a ways to go until we build a “ScienceGPT”-like foundation model for physics.

But a numerical encoding like xVal could help adapt some of the key structures of LLMs to make them more appropriate for scientific analysis.


Some quick conclusions


I predict (hope?) that the next 5-10 years of fundamental physics analysis will be even more curiosity-driven, data-driven, and foundation model-driven, more social, possibly less pretentious, and more interdisciplinary.


While funding agencies often claim to want interdisciplinary researchers, in practice, existing reward structures in academia still largely prioritize work in existing disciplines.

This can and should change.


Staying at the cutting edge of AI for science requires deep disciplinary knowledge combined with a broad curiosity about how it is (or isn’t) being applied successfully in other disciplines.

(if not, we risk forever adapting AI developments for physics, instead of innovating ourselves)


I believe that most people’s curiosity doesn’t stop at arbitrary disciplinary borders. It takes some bravery to stray from these predefined boundaries, but I hope this talk might encourage a few of you to continue to interrogate just how broad your work can be.


Thanks!


Nearly 100 stellar streams have been identified so far in the Milky Way – the more, the better!


Gaps & wiggles in a stream could be evidence of interactions with local clumps of dark matter.


We take advantage of the following key theorem:

An optimal classifier trained to distinguish Mixture #1 from Mixture #2 is also optimal for distinguishing signal events from background events.


Likelihood ratio for classifying mixtures: $L_{M_1/M_2}(x) = p_{M_1}(x)/p_{M_2}(x)$

Mixture likelihoods as a function of sig/bkg likelihoods: $p_{M_i}(x) = f_i\,p_S(x) + (1 - f_i)\,p_B(x)$, where $f_i$ is the signal fraction of mixture $i$

Rewrite likelihood ratio: $L_{M_1/M_2} = \dfrac{f_1 L_{S/B} + (1 - f_1)}{f_2 L_{S/B} + (1 - f_2)}$

…a monotonically increasing rescaling of $L_{S/B}$, if $f_1 > f_2$!


Our stellar data comes from the Gaia satellite.


We use 21 “patches” of the sky, as defined by D. Shih, M. R. Buckley, L. Necib, and J. Tamanas (2021):

Via Machinae: Searching for Stellar Streams using Unsupervised Machine Learning. arXiv:2104.12789 [astro-ph.GA]

These patches are subsets of Gaia Data Release 2, and contain ~8 million stars in total.

Publicly available at: doi.org/10.5281/zenodo.7897936


Using stream-aligned coordinates, it’s clear that CWoLa has also identified key over- and under-densities within the GD-1 stream.


Price-Whelan and Bonaca (2018)

Pettee et al. (2023)

De Boer, Erkal, and Gieles (2020)


We see clear evidence for the “spur” and “blob”.


Price-Whelan and Bonaca (2018)

Pettee et al. (2023)


We also see evidence of a “cocoon” of stars surrounding the core of the stream.


Malhan et al. (2019)

Pettee et al. (2023)


Variations on Existing Data


Variational Autoencoder (VAE): add a small amount of noise to a sequence’s latent embedding & decode that new point.

(Figure: black = real, blue = generated, shown for varying amounts of variation)


Qualitative labels for conditional generation


Variational Autoencoder (VAE)


(...but even fine-tuning GPT-3 with multiplication examples doesn’t help it generalize.)

(Figure, June 1, 2023: GPT-3 fine-tuned on 2 million multiplication pairs; blue = in-distribution, red = OOD)
