1 of 208

NeurIPS 2023

Trends in AI

Nikolaos Vasiloglou

VP of Research-ML

1

1

2 of 208

A few words

  • This review was based on:
    • Tutorials and Workshops
    • Keynotes and best papers
  • Did not go through all of the ~7.5K papers
  • Also dug into references from the Tutorials and Workshops that point to other conferences
  • Compiled a comprehensive list of links to papers and talks for further reading
  • While papers are free, talks will become free to the public soon (you can always “buy” a registration and view them now)
  • Attending any one of ICLR/ICML/NeurIPS gives you a good view of AI in the current year

2

2

3 of 208

In a nutshell (1)

  • LLMs are everywhere but not for everyone yet
  • Democratization of LLMs
    • Free models (Llama 2)
      • The weights are public but not the training code
    • Open Source Models (LLM360)
      • Training data, code, checkpoints are public
      • Fully reproducible
    • Open Infrastructure (Colossal-AI)
      • All the distribution infrastructure you need to train a huge model
    • Full stack language (Mojo)
      • Unifying the software layers for better LLM coding
    • Composable LLMs
      • Modularizing LLMs by combining smaller ones

3

3

4 of 208

In a nutshell (2)

  • LLMs for Math Reasoning
    • The return of theorem provers and their applications in other domains
    • Theorem Provers might be a great companion for LLMs
  • Are we running out of data for LLMs?
  • Relational Tables and Language Models
    • Predictive operations with LLMs
  • Hopfield Networks are back
    • Creating hope for better theoretical understanding of Attention
  • There was no tutorial/workshop on GNNs, and only ~42 of ~4,000 papers in the main conference. Are transformers winning over GNNs?

4

4

5 of 208

AI futurism

5

5

6 of 208

Other NeurIPS reviews

6

6

7 of 208

Understanding LLMs

7

7

8 of 208

Optimizing attention

8

8

9 of 208

Test of time (Word2vec)

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov* · Ilya Sutskever* · Kai Chen · Greg Corrado · Jeff Dean

9

*Absent from the ceremony

9

10 of 208

What we have learned so far

  • The birth of self-supervision
  • Word2vec -> ELMo -> Transformer (BERT)
  • Context-free embeddings -> context-aware embeddings
  • The return of Asynchronous Training

10

10

11 of 208

An ontology of word embeddings (Before GPT)

11

Something is missing

11

12 of 208

The revolution of context

12

12

13 of 208

The Language Model is the Embedding

  • In word2vec the word vector is the embedding
  • Semantic reasoning with vector arithmetic (sketch below)
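
The classic demonstration of this, as a minimal sketch (my own example, not from the slides). It assumes gensim and its downloadable "glove-wiki-gigaword-50" vectors; any word2vec-style model exposing word vectors works the same way.

```python
# Minimal sketch of semantic reasoning with word-vector arithmetic.
# Assumes the gensim downloader vectors named below; swap in any local word2vec model.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```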

10 years later

Efficiently Tuned Parameters Are Task Embeddings

Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models

TASK2VEC: Task Embedding for Meta-Learning

13

Jump here if you want more!

13

14 of 208

The unreasonable behavior of LLMs

What is going on as we scale from thousands to millions to billions to trillions of parameters?

14

14

15 of 208

Do LLMs have emergent properties?

15

15

16 of 208

Are we running out of oil?

Sustainability analysis of data

16

16

17 of 208

Managing data repetitions and parameter budget

17

17

18 of 208

Training data trend

18

18

19 of 208

Training LLMs on data budget

19

19

20 of 208

LLMs as a World Model

20

20

21 of 208

The rise of simulators

21

21

22 of 208

Train LLMs with Simulation data!

22

22

23 of 208

LLMs not good enough for Common Sense

23

23

24 of 208

Literature

24

24

25 of 208

LLMs not good enough for Social Tasks

25

25

26 of 208

Literature

26

26

27 of 208

Literature

27

27

28 of 208

A deep dive into the physics simulators

Recent Advances

28

28

29 of 208

MuJoCo: A physics engine for model-based control

Multi-Joint dynamics with Contact

  • One order of magnitude is due to faster computation
  • One order of magnitude is due to parallel processing that fully utilizes all available processors
  • One order of magnitude is due to higher accuracy and stability, allowing larger time-steps
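
For orientation, a minimal sketch with the official mujoco Python bindings; the toy MJCF scene is my own illustration, not from the talk.

```python
# Drop a sphere onto a plane and step the contact dynamics (pip install mujoco).
import mujoco

TOY_MJCF = """
<mujoco>
  <worldbody>
    <geom type="plane" size="5 5 0.1"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(TOY_MJCF)  # compile the MJCF model
data = mujoco.MjData(model)                       # simulation state

for _ in range(1000):                             # 1000 steps at the default 2 ms time-step
    mujoco.mj_step(model, data)

print(data.qpos)                                  # generalized coordinates after the rollout
```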

29

29

30 of 208

iGibson 2.0

Object-Centric Simulation for Robot Learning of Everyday Household Tasks

  • Supports object states
    • temperature, wetness level, cleanliness level, and toggled and sliced states
  • Implements a set of predicate logic functions that map simulator states to logic states like Cooked or Soaked (illustrative sketch below)
    • given a logic state, iGibson 2.0 can sample valid physical states that satisfy it
    • can generate potentially infinite instances of tasks with minimal effort
  • Includes a virtual reality (VR) interface to immerse humans in its scenes to collect demonstrations
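
A hypothetical sketch of the predicate-logic idea (not the actual iGibson 2.0 API): a predicate maps continuous, physically simulated state to a binary logic state.

```python
# Hypothetical predicate functions; names and thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class ObjectState:
    temperature: float    # degrees Celsius, tracked by the simulator
    wetness_level: float  # 0.0 (dry) .. 1.0 (soaked)

COOKED_THRESHOLD_C = 70.0
SOAKED_THRESHOLD = 0.8

def is_cooked(state: ObjectState) -> bool:
    """Logic state Cooked: true once the object exceeds its cook temperature."""
    return state.temperature >= COOKED_THRESHOLD_C

def is_soaked(state: ObjectState) -> bool:
    """Logic state Soaked: true once wetness passes a threshold."""
    return state.wetness_level >= SOAKED_THRESHOLD

steak = ObjectState(temperature=82.0, wetness_level=0.1)
print(is_cooked(steak), is_soaked(steak))  # True False
```

Running the mapping in reverse (sampling a physical state that satisfies a given logic state) is what lets iGibson 2.0 generate endless task instances.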

30

30

31 of 208

iGibson 2.0

31

31

32 of 208

Habitat 2.0

Training Home Assistants to Rearrange their Habitat

  • ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments with articulated objects (e.g. cabinets and drawers that can open/close)
  • H2.0: a high-performance physics-enabled 3D simulator, representing a 100× speed-up over prior work
  • Home Assistant Benchmark (HAB): a suite of common tasks for assistive robots (tidy the house, prepare groceries, set the table)

32

32

33 of 208

AI2-THOR

An Interactive 3D Environment for Visual AI

  • The House Of inteRactions (THOR)
  • Near photo-realistic 3D indoor scenes
  • AI agents can navigate the scenes and interact with objects (usage sketch below)
  • Enables research in many different domains
    • Deep reinforcement learning
    • Imitation learning, learning by interaction
    • Planning
    • Visual question answering
    • Learning models of cognition
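
A minimal usage sketch with the ai2thor Python package (my own example; exact action names and metadata keys may vary by version).

```python
# Navigate an interactive indoor scene and read back agent metadata (pip install ai2thor).
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")   # launches a Unity scene
event = controller.step(action="MoveAhead")   # navigation action
event = controller.step(action="RotateRight")

# Each event carries an egocentric frame plus object/agent metadata.
print(event.metadata["agent"]["position"])
controller.stop()
```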

33

33

34 of 208

AI2-THOR

34

34

35 of 208

A big library

35

35

36 of 208

Habitat 2.0

36

36

37 of 208

ThreeDWorld

A Platform for Interactive Multi-Modal Physical Simulation

  • High-fidelity sensory data
  • Physical interactions between mobile agents and objects in rich 3D environments
  • Real-time near-photorealistic image rendering; generative procedures
  • High-fidelity audio rendering
  • Realistic physical interactions for a variety of material types, including cloths, liquids, and deformable objects
  • Customizable “agents” that embody AI agents
  • Support for human interactions with VR devices; TDW’s API enables multiple agents to interact within a simulation

37

37

38 of 208

ThreeDWorld

38

38

39 of 208

SceneScript

39

39

40 of 208

Imitating Shortest Paths in Simulation

Effective Navigation and Manipulation in the Real World

40

40

41 of 208

Learning Rigid Dynamics with Face Interaction Graph Networks

  • Simulating rigid collisions among arbitrary shapes is difficult
    • complex geometry
    • strong nonlinearity of the interactions
  • GNN-based models learn to simulate complex physical dynamics (schematic step below)
    • fluids
    • cloth
    • articulated bodies
  • The “Face Interaction Graph Network” (FIGNet) extends GNN-based methods by modeling interactions between mesh faces, not just nodes
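
To make the "GNN-based simulator" idea concrete, a schematic of one interaction-network message-passing step (a generic sketch, not FIGNet's face-face formulation; the linear maps stand in for learned MLPs).

```python
# Schematic of one message-passing step in a GNN-based learned simulator
# (generic interaction-network style; FIGNet additionally reasons over mesh faces).
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_edges, dim = 5, 8, 16

node_feats = rng.normal(size=(num_nodes, dim))         # e.g. positions, velocities, embeddings
senders = rng.integers(0, num_nodes, size=num_edges)   # edge endpoints
receivers = rng.integers(0, num_nodes, size=num_edges)
edge_feats = rng.normal(size=(num_edges, dim))          # e.g. relative displacements

# Stand-ins for learned MLPs (just fixed random linear maps here).
W_edge = rng.normal(size=(3 * dim, dim)) / np.sqrt(3 * dim)
W_node = rng.normal(size=(2 * dim, dim)) / np.sqrt(2 * dim)

# 1) Edge update: compute a message from each (edge, sender, receiver) triple.
edge_in = np.concatenate([edge_feats, node_feats[senders], node_feats[receivers]], axis=1)
messages = np.tanh(edge_in @ W_edge)

# 2) Node update: sum incoming messages per node, then update the node state.
agg = np.zeros((num_nodes, dim))
np.add.at(agg, receivers, messages)
node_feats = np.tanh(np.concatenate([node_feats, agg], axis=1) @ W_node)

# A decoder would read accelerations from node_feats and integrate them
# to advance the simulated system by one time-step.
```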

41

41

42 of 208

Modeling faces not just nodes

42

42

43 of 208

Graph Networks as Learnable Physics Engines

43

43

44 of 208

Follow Kelsey Allen

44

44

45 of 208

Follow Alvaro Sanchez Gonzalez

45

45

46 of 208

Unisim: Probably the LLM equivalent for simulators

Learning Interactive Real-World Simulators

What is the difference with Sora?

  • Unisim is a unified action-in-video-out generative framework
  • It combines diverse datasets, each rich along a different dimension (e.g., objects, scenes, actions, motions)
  • The action-in-video-out framework is formulated as an observation prediction model conditioned on a finite history and parametrized by a video diffusion model (rollout sketch below)
  • It enables high-level language policies, low-level control policies, and video captioning models to generalize to the real world when trained purely in simulation, thereby bridging the sim-to-real gap
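
A hypothetical rollout loop for the action-in-video-out idea (my own pseudocode, not the Unisim API): each step predicts the next observation from a finite history and an action.

```python
# Hypothetical pseudocode (not the Unisim API): action-in-video-out rollout.
def rollout(video_model, policy, first_obs, horizon=5, history_len=4):
    """video_model(context, action) -> next observation; policy(obs) -> action."""
    history = [first_obs]
    for _ in range(horizon):
        action = policy(history[-1])          # high-level language or low-level control policy
        context = history[-history_len:]      # finite-history conditioning
        history.append(video_model(context, action))  # a video diffusion model stands in here
    return history

# Toy stand-ins so the sketch runs end to end: observations are plain integers.
frames = rollout(video_model=lambda ctx, a: ctx[-1] + a,
                 policy=lambda obs: 1,
                 first_obs=0)
print(frames)  # [0, 1, 2, 3, 4, 5]
```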

46

46

47 of 208

Unisim

47

47

48 of 208

Other resources for Physics Simulation @NeurIPS 2023

48

48

49 of 208

Climate simulations with AI

49

49

50 of 208

A taxonomy of simulation research papers

50

[Taxonomy diagram labels: Molecular/DNA/Proteins, Quantum particles, Fluid dynamics, Rigid body, particles, Astronomical scale, Human scale, Thermal physics, Videos/Images, Other simulations; Newtonian-based, GNN-based, Neural ODE/PDE, Generative-based, Logic rules]

50

51 of 208

Other NeurIPS 2023 resources relevant to physics

51

51

52 of 208

Other NeurIPS 2023 resources relevant to physics

52

52

53 of 208

19 Main track + 11 Benchmark papers (simulation)

  1. AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator
  2. Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis
  3. ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling
  4. SARAMIS: Simulation Assets for Robotic Assisted and Minimally Invasive Surgery
  5. Neural Lighting Simulation for Urban Scenes
  6. Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics
  7. DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
  8. Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability
  9. AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  10. Expressivity-Preserving GNN Simulation
  11. MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

Before 2023

53

53

54 of 208

Nick’s favorite Simulation framework

54

54

55 of 208

Back to Reasoning with simulators

55

55

56 of 208

Probabilistic Programs as the language of simulation

56

56

57 of 208

Detailed info

57

57

58 of 208

Language models as goal/reward

58

58

59 of 208

Guiding simulators with a structured language

59

59

60 of 208

Limitations

60

60

61 of 208

Multi Modal Theory of Mind

61

61

62 of 208

Language Models for social reasoning

62

62

63 of 208

Games as simulators

63

63

64 of 208

64

64

65 of 208

65

65

66 of 208

66

66

67 of 208

67

67

68 of 208

Social Simulator

68

68

69 of 208

Imagining and Verbalizing

69

69

70 of 208

70

70

71 of 208

An example

71

71

72 of 208

Takeaways

72

72

73 of 208

Distributed LLMs

Composing Big models from smaller ones

73

73

74 of 208

The LLM as a giant vector

74

74

75 of 208

The LLM as a giant vector

75

75

76 of 208

Would you ever write your product as a huge C++ file?

  • The Software 2.0 paradigm treats a small model like a single .cpp file
  • We need to be able to “compile” each model independently
  • Then build a big model by combining them
  • We also need to be able to track changes

Let’s see what Software 3.0 might look like

76

76

77 of 208

Software 3.0

  • Data curation (1.0: writing the code, vscode)
  • Small language model training (1.0: compilation, gcc)
  • Libraries of language models (1.0: making a library, gcc -fPIC)
  • Linking language models for a specific task (1.0: linking, GNU ld); see the merging sketch below
  • Tracking changes (1.0: version control, git)
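
What "linking" might look like in practice: a minimal sketch of merging fine-tuned checkpoints via task-vector arithmetic. The file names and the interpolation weight are hypothetical.

```python
# Merge two fine-tuned "expert" LMs into a base LM by adding scaled task vectors.
import torch

base = torch.load("base_lm.pt")          # state_dicts: {param_name: tensor}
expert_a = torch.load("expert_math.pt")  # hypothetical checkpoint fine-tuned on math
expert_b = torch.load("expert_sql.pt")   # hypothetical checkpoint fine-tuned on SQL

alpha = 0.5  # how strongly to apply each task vector

merged = {}
for name, w0 in base.items():
    tv_a = expert_a[name] - w0           # task vector = fine-tuned minus base weights
    tv_b = expert_b[name] - w0
    merged[name] = w0 + alpha * (tv_a + tv_b)

torch.save(merged, "merged_lm.pt")
```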

77

77

78 of 208

Building a bigger Language Model

78

78

79 of 208

Distribution of Heterogeneous LLMs

79

79

80 of 208

Collaborating on LLM training (Git)

80

80

81 of 208

A taxonomy of LLM merging

81

81

82 of 208

Incremental maintenance of LLMs

82

82

83 of 208

Building an LLM from scratch

83

83

84 of 208

Democratizing LLM building

84

84

85 of 208

All the necessary plumbing for building LLMs

Infrastructure is not easy

85

85

86 of 208

Hardware cannot keep up with model growth

86

86

87 of 208

Startups will be able to afford building GPT-3 in 2 years

87

87

88 of 208

Colossal-AI contribution

88

88

89 of 208

It takes a village to build one of them

89


89

90 of 208

Colossal-AI offers a framework for building an LLM

90

90

91 of 208

Looks like there is a lot of demand!

91

91

92 of 208

Congratulations!

You built your first really large LM

Can you tune, maintain, and grow it without redoing everything from scratch?

92

92

93 of 208

Attribution as a dimension for optimizing LLM costs

93

93

94 of 208

Pillar I: Data Attribution

94

94

95 of 208

Pillar II: Model Attribution

95

95

96 of 208

Pillar III: Algorithm Attribution

96

96

97 of 208

DataInf

97

97

98 of 208

Tiny LMs might be the solution

98

  • TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  • Faster training/tuning cycles
  • Easier to do ablation studies on architecture and data contribution
  • You can tune an XGB model because it takes an hour to train
  • If you can train a tiny LM in hours, then you can tune it too (config sketch below)
  • As with XGB feature engineering, the art is in crafting a good dataset
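
A sketch of how small "tiny" can be, using Hugging Face transformers; the sizes are illustrative, not the TinyStories configuration.

```python
# A few-million-parameter GPT-2-style model: small enough to train in hours.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=8_000,  # small tokenizer trained on a narrow corpus
    n_positions=512,
    n_embd=128,
    n_layer=4,
    n_head=4,
)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```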

98

99 of 208

Mathematics/Reasoning and LLMs

99

99

100 of 208

The most exciting topic

  • How far are we from deepmath?
  • Machine Reading of the math web with LLMs
  • Can we follow the same paradigm for common-sense and other reasoning?
  • Revisiting the Lean prover (toy example below)
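
For readers who have not seen Lean, a toy Lean 4 theorem (my own minimal example): the statement is a machine-checkable specification, and the tactic proof after `by` is exactly what LLM-based provers try to generate.

```lean
-- A statement plus a tactic proof that the Lean kernel checks.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```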

100

100

101 of 208

Mathematical Reasoning

101

101

102 of 208

DeepMind’s recent buzz

102

102

103 of 208

What is formal theorem proving?

103

103

104 of 208

Code vs. math symbols

104

104

105 of 208

LLMs are doing OK at the high-school level

105

105

106 of 208

Llemma vs. Minerva

106

106

107 of 208

Informal vs. Formal Mathematical Reasoning

107

107

108 of 208

Checking Mathematical Proofs is Hard for Humans

108

108

109 of 208

Proof Assistants (Interactive Theorem Provers)

109

109

110 of 208

Examples of Proof Assistants

110

110

111 of 208

Generating Proof Steps (Tactics)

111

111

112 of 208

Searching for Proofs

112

112

113 of 208

Best First Search

113

113

114 of 208

Hyper Tree Proof Search

114

114

115 of 208

Is Proof Search Really Necessary?

115

115

116 of 208

Premise Selection

116

116

117 of 208

Magnushammer

117

117

118 of 208

ReProver: Retrieval-Augmented Prover

118

118

119 of 208

Premise Retrieval Improves Theorem Proving

119

119

120 of 208

LeanDojo

120

120

121 of 208

From Informal to Formal Proofs

121

121

122 of 208

The ecosystem

122

122

123 of 208

The process

123

123

124 of 208

An example

124

124

125 of 208

Theorem provers for code verification

With the help of LLMs

125

125

126 of 208

How can we verify code produced by LLMs?

126

126

127 of 208

Theorem Proving for Verified Code Generation

127

127

128 of 208

Formal Software Verification

128

128

129 of 208

Software verification in the wild

129

129

130 of 208

Current tools

130

130

131 of 208

Proof Synthesis SoTA

131

131

132 of 208

Proofster

132

132

133 of 208

The maintenance problem

133

133

134 of 208

How about an LLM?

134

134

135 of 208

The Clover Paradigm

135

135

136 of 208

Theorem Proving and LLMs: Takeaways

136

136

137 of 208

MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

137

137

138 of 208

138

138

139 of 208

Generating Reasoning data with LLMs for finetuning LLMs

139

139

140 of 208

Analogical prompting

140

140

141 of 208

A math example (1)

141

141

142 of 208

A math example (2)

142

142

143 of 208

Benchmarks

143

143

144 of 208

More Benchmarks

144

144

145 of 208

Discussion: abstraction is key in analogical reasoning

145

145

146 of 208

Takeaways

146

146

147 of 208

Literature

147

147

148 of 208

The trend continues

148

148

149 of 208

A programming language for Transformers: RASP-L

149

149

150 of 208

The original Language RASP

150

150

151 of 208

RASP-L extension

151

151

152 of 208

The paradox of Learning to reason from data

152

152

153 of 208

What can BERT learn?

153

153

154 of 208

BERT prefers Statistical to Logical Thinking

154

154

155 of 208

How should we interpret this phenomenon?

155

  • Are we using the wrong paradigm for training our models?
  • Is it impossible for logic to emerge from statistics?
  • What is the architecture that creates logic from statistical observations? (apart from the human brain)
  • Is that the definition of AGI?

155

156 of 208

Injecting Logic into GenAI models

156

156

157 of 208

More examples

157

157

158 of 208

Literature

158

158

159 of 208

Injecting Logic in the transformer

159

159

160 of 208

Symbol processing required for implication rule

160

160

161 of 208

Symbol processing required for implication rule

161

161

162 of 208

Symbol processing required for implication rule

162

162

163 of 208

Transformer Production Framework

163

163

164 of 208

Symbol processing required for implication rule

164

164

165 of 208

What we have learned

165

165

166 of 208

Biases for a new generation of deep-reasoning LMs

166

166

167 of 208

LLMs for Tabular Data

167

167

168 of 208

168

168

169 of 208

Tables are everywhere

169

169

170 of 208

Goals of the TRL workshop

170

170

171 of 208

Last year’s best paper

171

171

172 of 208

Very Interesting Results

172

172

173 of 208

TabPFN 1 year ago

173

173

174 of 208

TabPFN 2.0

174

174

175 of 208

GPT-4 as a Data Science Assistant

175

175

176 of 208

An example

176

176

177 of 208

Benchmarks

177

177

178 of 208

Feature Engineering

178

178

179 of 208

Takeaways from CAAFE

179

179

180 of 208

Can LLMs learn how to do Gradient Descent?

180

180

181 of 208

In-Context Learning

181

181

182 of 208

A surprising Experiment

182

182

183 of 208

Literature

183

183

184 of 208

Prior-Data Fitted Networks (PFNs) Visualized

184

184

185 of 208

Quantitative result (87 numerical datasets without missing values)

185

185

186 of 208

Performance with many objectives

186

186

187 of 208

MotherNet

187

187

188 of 208

Generate a model instead of predictions

188

188

189 of 208

Architecture

189

189

190 of 208

Literature

190

190

191 of 208

Conclusions

191

191

192 of 208

Takeaways on TabPFN

192

192

193 of 208

The LLM version of Kumo

193

193

194 of 208

Another approach to the same problem

194

194

195 of 208

Text to SQL

What we have learned so far

195

195

196 of 208

Lessons learned from Natural Language to SQL

196

196

197 of 208

How Documentation improves GPT’s Text-to-SQL

197

197

198 of 208

Another Text-To-SQL paper

198

198

199 of 208

Benchmarks

199

199

200 of 208

New frontiers in Graph Learning

200

200

201 of 208

Foundational Models for graphs and relational data

201

201

202 of 208

The new Web

202

202

203 of 208

RAG to the rescue

203

203

204 of 208

Application Development with LLMs

204

204

205 of 208

Elementary and advanced

205

205

206 of 208

Wrapping up

206

206

207 of 208

Where do we go next?

  • We celebrated the 10-year anniversary of LLMs
  • We celebrated the one-year anniversary of ChatGPT
  • Yet OpenAI was not present at NeurIPS (why?)
  • The research community clearly sides with open-source LLMs
  • The two main challenges
    • More good data
    • Cheaper models
      • Cheaper hardware?
      • Fewer parameters?
      • More efficient algorithms?
  • Will AI push math to its limits?

207

207

208 of 208

208

208