1 of 208

NeurIPS 2023

Trends in AI

Nikolaos Vasiloglou

VP of Research-ML

1

1

2 of 208

A few words

  • This review was based on:
    • Tutorials and Workshops
    • Keynotes and best papers
  • Did not go through all of the ~7.5K papers
  • Also dug into references from the Tutorials and Workshops that point to other conferences
  • Compiled a comprehensive list of links to papers and talks for further reading
  • While papers are free, talks will become free to the public soon (you can always “buy” a registration and view them now)
  • Attending any one of ICLR/ICML/NeurIPS gives you a good view of AI in the current year

2

2

3 of 208

In a nutshell (1)

  • LLMs are everywhere but not for everyone yet
  • Democratization of LLMs
    • Free models (Llama 2)
      • The weights are public but not the training code
    • Open Source Models (LLM360)
      • Training data, code, checkpoints are public
      • Fully reproducible
    • Open Infrastructure (Colossal-AI)
      • All the distribution infrastructure you need to train a huge model
    • Full stack language (Mojo)
      • Unifying the software layers for better LLM coding
    • Composable LLMs
      • Modularizing LLMs by combining smaller ones

3

3

4 of 208

In a nutshell (2)

  • LLMs for Math Reasoning
    • The return of theorem provers and their applications in other domains
    • Theorem Provers might be a great companion for LLMs
  • Are we running out of data for LLMs?
  • Relational Tables and Language Models
    • Predictive operations with LLMs
  • Hopfield Networks are back
    • Creating hope for better theoretical understanding of Attention
  • There was no tutorial/workshop on GNNs, and only ~42 of ~4,000 papers in the main conference. Are transformers winning over GNNs?

4

4

5 of 208

AI futurism

5

5

6 of 208

Other NeurIPS reviews

6

6

7 of 208

Understanding LLMs

7

7

8 of 208

Optimizing attention

8

8

9 of 208

Test of time (Word2vec)

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov* · Ilya Sutskever* · Kai Chen · Greg Corrado · Jeff Dean

9

*Absent from the ceremony

9

10 of 208

What we have learned so far

  • The birth of self-supervision
  • Word2vec -> ELMo -> Transformer (BERT)
  • Context-free embeddings -> context-aware embeddings
  • The return of Asynchronous Training

10

10

11 of 208

An ontology of word embeddings (Before GPT)

11

Something is missing

11

12 of 208

The revolution of context

12

12

13 of 208

The Language Model is the Embedding

  • In word2vec the word vector is the embedding
  • Semantic reasoning with vector arithmetic (sketch below)
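
The classic demonstration of this, as a minimal sketch (my own example, not from the slides). It assumes gensim and its downloadable "glove-wiki-gigaword-50" vectors; any word2vec-style model exposing word vectors works the same way.

```python
# Minimal sketch of semantic reasoning with word-vector arithmetic.
# Assumes the gensim downloader vectors named below; swap in any local word2vec model.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```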

10 years later

Efficiently Tuned Parameters Are Task Embeddings

Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models

TASK2VEC: Task Embedding for Meta-Learning

13

Jump here if you want more!

13

14 of 208

The unreasonable behavior of LLMs

What is going on as we scale from thousands to millions to billions to trillions of parameters?

14

14

15 of 208

Do LLMs have emergent properties?

15

15

16 of 208

Are we running out of oil?

Sustainability analysis of data

16

16

17 of 208

Managing data repetitions and parameter budget

17

17

18 of 208

Training data trend

18

18

19 of 208

Training LLMs on data budget

19

19

20 of 208

LLMs as a World Model

20

20

21 of 208

The rise of simulators

21

21

22 of 208

Train LLMs with Simulation data!

22

22

23 of 208

LLMs not good enough for Common Sense

23

23

24 of 208

Literature

24

24

25 of 208

LLMs not good enough for Social Tasks

25

25

26 of 208

Literature

26

26

27 of 208

Literature

27

27

28 of 208

A deep dive into the physics simulators

Recent Advances

28

28

29 of 208

MuJoCo: A physics engine for model-based control

Multi-Joint dynamics with Contact

  • One order of magnitude is due to faster computation
  • One order of magnitude is due to parallel processing that fully utilizes all available processors
  • One order of magnitude is due to higher accuracy and stability, allowing larger time-steps
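
For orientation, a minimal sketch with the official mujoco Python bindings; the toy MJCF scene is my own illustration, not from the talk.

```python
# Drop a sphere onto a plane and step the contact dynamics (pip install mujoco).
import mujoco

TOY_MJCF = """
<mujoco>
  <worldbody>
    <geom type="plane" size="5 5 0.1"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(TOY_MJCF)  # compile the MJCF model
data = mujoco.MjData(model)                       # simulation state

for _ in range(1000):                             # 1000 steps at the default 2 ms time-step
    mujoco.mj_step(model, data)

print(data.qpos)                                  # generalized coordinates after the rollout
```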

29

29

30 of 208

iGibson 2.0

Object-Centric Simulation for Robot Learning of Everyday Household Tasks

  • Supports object states
    • temperature, wetness level, cleanliness level, and toggled and sliced states
  • Implements a set of predicate logic functions that map simulator states to logic states like Cooked or Soaked (illustrative sketch below)
    • given a logic state, iGibson 2.0 can sample valid physical states that satisfy it
    • can generate potentially infinite instances of tasks with minimal effort
  • Includes a virtual reality (VR) interface to immerse humans in its scenes to collect demonstrations
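
A hypothetical sketch of the predicate-logic idea (not the actual iGibson 2.0 API): a predicate maps continuous, physically simulated state to a binary logic state.

```python
# Hypothetical predicate functions; names and thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class ObjectState:
    temperature: float    # degrees Celsius, tracked by the simulator
    wetness_level: float  # 0.0 (dry) .. 1.0 (soaked)

COOKED_THRESHOLD_C = 70.0
SOAKED_THRESHOLD = 0.8

def is_cooked(state: ObjectState) -> bool:
    """Logic state Cooked: true once the object exceeds its cook temperature."""
    return state.temperature >= COOKED_THRESHOLD_C

def is_soaked(state: ObjectState) -> bool:
    """Logic state Soaked: true once wetness passes a threshold."""
    return state.wetness_level >= SOAKED_THRESHOLD

steak = ObjectState(temperature=82.0, wetness_level=0.1)
print(is_cooked(steak), is_soaked(steak))  # True False
```

Running the mapping in reverse (sampling a physical state that satisfies a given logic state) is what lets iGibson 2.0 generate endless task instances.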

30

30

31 of 208

iGibson 2.0

31

31

32 of 208

Habitat 2.0

Training Home Assistants to Rearrange their Habitat

  • ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments with articulated objects (e.g. cabinets and drawers that can open/close)
  • H2.0: a high-performance physics-enabled 3D simulator, representing a 100× speed-up over prior work
  • Home Assistant Benchmark (HAB): a suite of common tasks for assistive robots (tidy the house, prepare groceries, set the table)

32

32

33 of 208

AI2-THOR

An Interactive 3D Environment for Visual AI

  • The House Of inteRactions (THOR)
  • Near photo-realistic 3D indoor scenes
  • AI agents can navigate the scenes and interact with objects (usage sketch below)
  • Enables research in many different domains
    • Deep reinforcement learning
    • Imitation learning, learning by interaction
    • Planning
    • Visual question answering
    • Learning models of cognition
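
A minimal usage sketch with the ai2thor Python package (my own example; exact action names and metadata keys may vary by version).

```python
# Navigate an interactive indoor scene and read back agent metadata (pip install ai2thor).
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")   # launches a Unity scene
event = controller.step(action="MoveAhead")   # navigation action
event = controller.step(action="RotateRight")

# Each event carries an egocentric frame plus object/agent metadata.
print(event.metadata["agent"]["position"])
controller.stop()
```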

33

33

34 of 208

AI2-THOR

34

34

35 of 208

A big library

35

35

36 of 208

Habitat 2.0

36

36

37 of 208

ThreeDWorld

A Platform for Interactive Multi-Modal Physical Simulation

  • High-fidelity sensory data
  • Physical interactions between mobile agents and objects in rich 3D environments
  • Real-time near-photorealistic image rendering; generative procedures
  • High-fidelity audio rendering
  • Realistic physical interactions for a variety of material types, including cloths, liquids, and deformable objects
  • Customizable “agents” that embody AI agents
  • Support for human interactions with VR devices; TDW’s API enables multiple agents to interact within a simulation

37

37

38 of 208

ThreeDWorld

38

38

39 of 208

SceneScript

39

39

40 of 208

Imitating Shortest Paths in Simulation

Effective Navigation and Manipulation in the Real World

40

40

41 of 208

Learning Rigid Dynamics with Face Interaction Graph Networks

  • Simulating rigid collisions among arbitrary shapes is difficult
    • complex geometry
    • strong nonlinearity of the interactions
  • GNN-based models learn to simulate complex physical dynamics (schematic step below)
    • fluids
    • cloth
    • articulated bodies
  • The “Face Interaction Graph Network” (FIGNet) extends GNN-based methods by modeling interactions between mesh faces, not just nodes
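
To make the "GNN-based simulator" idea concrete, a schematic of one interaction-network message-passing step (a generic sketch, not FIGNet's face-face formulation; the linear maps stand in for learned MLPs).

```python
# Schematic of one message-passing step in a GNN-based learned simulator
# (generic interaction-network style; FIGNet additionally reasons over mesh faces).
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_edges, dim = 5, 8, 16

node_feats = rng.normal(size=(num_nodes, dim))         # e.g. positions, velocities, embeddings
senders = rng.integers(0, num_nodes, size=num_edges)   # edge endpoints
receivers = rng.integers(0, num_nodes, size=num_edges)
edge_feats = rng.normal(size=(num_edges, dim))          # e.g. relative displacements

# Stand-ins for learned MLPs (just fixed random linear maps here).
W_edge = rng.normal(size=(3 * dim, dim)) / np.sqrt(3 * dim)
W_node = rng.normal(size=(2 * dim, dim)) / np.sqrt(2 * dim)

# 1) Edge update: compute a message from each (edge, sender, receiver) triple.
edge_in = np.concatenate([edge_feats, node_feats[senders], node_feats[receivers]], axis=1)
messages = np.tanh(edge_in @ W_edge)

# 2) Node update: sum incoming messages per node, then update the node state.
agg = np.zeros((num_nodes, dim))
np.add.at(agg, receivers, messages)
node_feats = np.tanh(np.concatenate([node_feats, agg], axis=1) @ W_node)

# A decoder would read accelerations from node_feats and integrate them
# to advance the simulated system by one time-step.
```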

41

41

42 of 208

Modeling faces not just nodes

42

42

43 of 208

Graph Networks as Learnable Physics Engines

43

43

44 of 208

Follow Kelsey Allen

44

44

45 of 208

Follow Alvaro Sanchez Gonzalez

45

45

46 of 208

Unisim: Probably the LLM equivalent for simulators

Learning Interactive Real-World Simulators

What is the difference with Sora?

  • Unisim is a unified action-in-video-out generative framework
  • It combines diverse datasets, each rich along a different dimension (e.g., objects, scenes, actions, motions)
  • The action-in-video-out framework is formulated as an observation prediction model conditioned on a finite history and parametrized by a video diffusion model (rollout sketch below)
  • It enables high-level language policies, low-level control policies, and video captioning models to generalize to the real world when trained purely in simulation, thereby bridging the sim-to-real gap
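
A hypothetical rollout loop for the action-in-video-out idea (my own pseudocode, not the Unisim API): each step predicts the next observation from a finite history and an action.

```python
# Hypothetical pseudocode (not the Unisim API): action-in-video-out rollout.
def rollout(video_model, policy, first_obs, horizon=5, history_len=4):
    """video_model(context, action) -> next observation; policy(obs) -> action."""
    history = [first_obs]
    for _ in range(horizon):
        action = policy(history[-1])          # high-level language or low-level control policy
        context = history[-history_len:]      # finite-history conditioning
        history.append(video_model(context, action))  # a video diffusion model stands in here
    return history

# Toy stand-ins so the sketch runs end to end: observations are plain integers.
frames = rollout(video_model=lambda ctx, a: ctx[-1] + a,
                 policy=lambda obs: 1,
                 first_obs=0)
print(frames)  # [0, 1, 2, 3, 4, 5]
```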

46

46

47 of 208

Unisim

47

47

48 of 208

Other resources for Physics Simulation @NeurIPS 2023

48

48

49 of 208

Climate simulations with AI

49

49

50 of 208

A taxonomy of simulation research papers

50

[Taxonomy diagram labels: Molecular/DNA/Proteins, Quantum particles, Fluid dynamics, Rigid body, particles, Astronomical scale, Human scale, Thermal physics, Videos/Images, Other simulations; Newtonian-based, GNN-based, Neural ODE/PDE, Generative-based, Logic rules]

50

51 of 208

Other NeurIPS 2023 resources relevant to physics

51

51

52 of 208

Other NeurIPS 2023 resources relevant to physics

52

52

53 of 208

19 Main track + 11 Benchmark papers (simulation)

  1. AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator
  2. Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis
  3. ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling
  4. SARAMIS: Simulation Assets for Robotic Assisted and Minimally Invasive Surgery
  5. Neural Lighting Simulation for Urban Scenes
  6. Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics
  7. DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
  8. Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability
  9. AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  10. Expressivity-Preserving GNN Simulation
  11. MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

Before 2023

53

53

54 of 208

Nick’s favorite Simulation framework

54

54

55 of 208

Back to Reasoning with simulators

55

55

56 of 208

Probabilistic Programs as the language of simulation

56

56

57 of 208

Detailed info

57

57

58 of 208

Language models as goal/reward

58

58

59 of 208

Guiding simulators with a structured language

59

59

60 of 208

Limitations

60

60

61 of 208

Multi Modal Theory of Mind

61

61

62 of 208

Language Models for social reasoning

62

62

63 of 208

Games as simulators

63

63

64 of 208

64

64

65 of 208

65

65

66 of 208

66

66

67 of 208

67

67

68 of 208

Social Simulator

68

68

69 of 208

Imagining and Verbalizing

69

69

70 of 208

70

70

71 of 208

An example

71

71

72 of 208

Takeaways

72

72

73 of 208

Distributed LLMs

Composing Big models from smaller ones

73

73

74 of 208

The LLM as a giant vector

74

74

75 of 208

The LLM as a giant vector

75

75

76 of 208

Would you ever write your product as a huge C++ file?

  • The Software 2.0 paradigm treats a small model like a single .cpp file
  • We need to be able to “compile” each model independently
  • Then build a big model by combining them
  • We also need to be able to track changes

Let’s see what Software 3.0 might look like

76

76

77 of 208

Software 3.0

  • Data curation (1.0: writing the code, vscode)
  • Small language model training (1.0: compilation, gcc)
  • Libraries of language models (1.0: making a library, gcc -fPIC)
  • Linking language models for a specific task (1.0: linking, GNU ld); see the merging sketch below
  • Tracking changes (1.0: version control, git)
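
What "linking" might look like in practice: a minimal sketch of merging fine-tuned checkpoints via task-vector arithmetic. The file names and the interpolation weight are hypothetical.

```python
# Merge two fine-tuned "expert" LMs into a base LM by adding scaled task vectors.
import torch

base = torch.load("base_lm.pt")          # state_dicts: {param_name: tensor}
expert_a = torch.load("expert_math.pt")  # hypothetical checkpoint fine-tuned on math
expert_b = torch.load("expert_sql.pt")   # hypothetical checkpoint fine-tuned on SQL

alpha = 0.5  # how strongly to apply each task vector

merged = {}
for name, w0 in base.items():
    tv_a = expert_a[name] - w0           # task vector = fine-tuned minus base weights
    tv_b = expert_b[name] - w0
    merged[name] = w0 + alpha * (tv_a + tv_b)

torch.save(merged, "merged_lm.pt")
```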

77

77

78 of 208

Building a bigger Language Model

78

78

79 of 208

Distribution of Heterogeneous LLMs

79

79

80 of 208

Collaborating on LLM training (Git)

80

80

81 of 208

A taxonomy of LLM merging

81

81

82 of 208

Incremental maintenance of LLMs

82

82

83 of 208

Building an LLM from scratch

83

83

84 of 208

Democratizing LLM building

84

84

85 of 208

All the necessary plumbing for building LLMs

Infrastructure is not easy

85

85

86 of 208

Hardware cannot keep up with model growth

86

86

87 of 208

Startups will be able to afford building GPT-3 in 2 years

87

87

88 of 208

Colossal-AI contribution

88

88

89 of 208

It takes a village to build one of them

89


89

90 of 208

Colossal-AI offers a framework for building an LLM

90

90

91 of 208

Looks like there is a lot of demand!

91

91

92 of 208

Congratulations!

You built your first really large LM

Can you tune, maintain, and grow it without redoing everything from scratch?

92

92

93 of 208

Attribution as a dimension for optimizing LLM costs

93

93

94 of 208

Pillar I: Data Attribution

94

94

95 of 208

Pillar II: Model Attribution

95

95

96 of 208

Pillar III: Algorithm Attribution

96

96

97 of 208

DataInf

97

97

98 of 208

Tiny LMs might be the solution

98

  • TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  • Faster training/tuning cycles
  • Easier to do ablation studies on architecture and data contribution
  • You can tune an XGB model because it takes an hour to train
  • If you can train a tiny LM in hours, then you can tune it too (config sketch below)
  • As with XGB feature engineering, the art is in crafting a good dataset
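
A sketch of how small "tiny" can be, using Hugging Face transformers; the sizes are illustrative, not the TinyStories configuration.

```python
# A few-million-parameter GPT-2-style model: small enough to train in hours.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=8_000,  # small tokenizer trained on a narrow corpus
    n_positions=512,
    n_embd=128,
    n_layer=4,
    n_head=4,
)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```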

98

99 of 208

Mathematics/Reasoning and LLMs

99

99

100 of 208

The most exciting topic

  • How far are we from deepmath?
  • Machine Reading of the math web with LLMs
  • Can we follow the same paradigm for common-sense and other reasoning?
  • Revisiting the Lean prover (toy example below)
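
For readers who have not seen Lean, a toy Lean 4 theorem (my own minimal example): the statement is a machine-checkable specification, and the tactic proof after `by` is exactly what LLM-based provers try to generate.

```lean
-- A statement plus a tactic proof that the Lean kernel checks.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```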

100

100

101 of 208

Mathematical Reasoning

101

101

102 of 208

DeepMind’s recent buzz

102

102

103 of 208

What is formal theorem proving?

103

103

104 of 208

Code vs. math symbols

104

104

105 of 208

LLMs are doing OK at the high-school level

105

105

106 of 208

Llemma vs. Minerva

106

106

107 of 208

Informal vs. Formal Mathematical Reasoning

107

107

108 of 208

Checking Mathematical Proofs is Hard for Humans

108

108

109 of 208

Proof Assistants (Interactive Theorem Provers)

109

109

110 of 208

Examples of Proof Assistants

110

110

111 of 208

Generating Proof Steps (Tactics)

111

111

112 of 208

Searching for Proofs

112

112

113 of 208

Best First Search

113

113

114 of 208

Hyper Tree Proof Search

114

114

115 of 208

Is Proof Search Really Necessary?

115

115

116 of 208

Premise Selection

116

116

117 of 208

Magnushammer

117

117

118 of 208

ReProver: Retrieval-Augmented Prover

118

118

119 of 208

Premise Retrieval Improves Theorem Proving

119

119

120 of 208

LeanDojo

120

120

121 of 208

From Informal to Formal Proofs

121

121

122 of 208

The ecosystem

122

122

123 of 208

The process

123

123

124 of 208

An example

124

124

125 of 208

Theorem provers for code verification

With the help of LLMs

125

125

126 of 208

How can we verify code produced by LLMs?

126

126

127 of 208

Theorem Proving for Verified Code Generation

127

127

128 of 208

Formal Software Verification

128

128

129 of 208

Software verification in the wild

129

129

130 of 208

Current tools

130

130

131 of 208

Proof Synthesis SoTA

131

131

132 of 208

Proofster

132

132

133 of 208

The maintenance problem

133

133

134 of 208

How about an LLM?

134

134

135 of 208

The Clover Paradigm

135

135

136 of 208

Theorem Proving and LLMs: Takeaways

136

136

137 of 208

MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

137

137

138 of 208

138

138

139 of 208

Generating Reasoning data with LLMs for finetuning LLMs

139

139

140 of 208

Analogical prompting

140

140

141 of 208

A math example (1)

141

141

142 of 208

A math example (2)

142

142

143 of 208

Benchmarks

143

143

144 of 208

More Benchmarks

144

144

145 of 208

Discussion: abstraction is key in analogical reasoning

145

145

146 of 208

Takeaways

146

146

147 of 208

Literature

147

147

148 of 208

The trend continues

148

148

149 of 208

A programming language for Transformers: RASP-L

149

149

150 of 208

The original Language RASP

150

150

151 of 208

RASP-L extension

151

151

152 of 208

The paradox of Learning to reason from data

152

152

153 of 208

What can BERT learn?

153

153

154 of 208

BERT prefers Statistical to Logical Thinking

154

154

155 of 208

How should we interpret this phenomenon?

155

  • Are we using the wrong paradigm for training our models?
  • Is it impossible for logic to emerge from statistics?
  • What is the architecture that creates logic from statistical observations? (apart from the human brain)
  • Is that the definition of AGI?

155

156 of 208

Injecting Logic into GenAI models

156

156

157 of 208

More examples

157

157

158 of 208

Literature

158

158

159 of 208

Injecting Logic in the transformer

159

159

160 of 208

Symbol processing required for implication rule

160

160

161 of 208

Symbol processing required for implication rule

161

161

162 of 208

Symbol processing required for implication rule

162

162

163 of 208

Transformer Production Framework

163

163

164 of 208

Symbol processing required for implication rule

164

164

165 of 208

What we have learned

165

165

166 of 208

Biases for a new generation of deep-reasoning LMs

166

166

167 of 208

LLMs for Tabular Data

167

167

168 of 208

168

168

169 of 208

Tables are everywhere

169

169

170 of 208

Goals of the TRL workshop

170

170

171 of 208

Last year’s best paper

171

171

172 of 208

Very Interesting Results

172

172

173 of 208

TabPFN 1 year ago

173

173

174 of 208

TabPFN 2.0

174

174

175 of 208

GPT-4 as a Data Science Assistant

175

175

176 of 208

An example

176

176

177 of 208

Benchmarks

177

177

178 of 208

Feature Engineering

178

178

179 of 208

Takeaways from CAAFE

179

179

180 of 208

Can LLMs learn how to do Gradient Descent?

180

180

181 of 208

In-Context Learning

181

181

182 of 208

A surprising Experiment

182

182

183 of 208

Literature

183

183

184 of 208

Prior-Data Fitted Networks (PFNs) Visualized

184

184

185 of 208

Quantitative result (87 numerical datasets without missing values)

185

185

186 of 208

Performance with many objectives

186

186

187 of 208

MotherNet

187

187

188 of 208

Generate a model instead of predictions

188

188

189 of 208

Architecture

189

189

190 of 208

Literature

190

190

191 of 208

Conclusions

191

191

192 of 208

Takeaways on TabPFN

192

192

193 of 208

The LLM version of Kumo

193

193

194 of 208

Another approach to the same problem

194

194

195 of 208

Text to SQL

What we have learned so far

195

195

196 of 208

Lessons learned from Natural Language to SQL

196

196

197 of 208

How Documentation improves GPT’s Text-to-SQL

197

197

198 of 208

Another Text-To-SQL paper

198

198

199 of 208

Benchmarks

199

199

200 of 208

New frontiers in Graph Learning

200

200

201 of 208

Foundational Models for graphs and relational data

201

201

202 of 208

The new Web

202

202

203 of 208

RAG to the rescue

203

203

204 of 208

Application Development with LLMs

204

204

205 of 208

Elementary and advanced

205

205

206 of 208

Wrapping up

206

206

207 of 208

Where do we go next?

  • We celebrated the 10-year anniversary of LLMs
  • We celebrated the one-year anniversary of ChatGPT
  • Yet OpenAI was not present at NeurIPS (why?)
  • The research community clearly sides with open-source LLMs
  • The two main challenges
    • More good data
    • Cheaper models
      • Cheaper hardware?
      • Fewer parameters?
      • More efficient algorithms?
  • Will AI push math to its limits?

207

207

208 of 208

208

208