A | B | C | D | E | F | G | H | I | |
---|---|---|---|---|---|---|---|---|---|
1 | Timestamp | What will be the next big breakthrough in decision making / robotic learning / reinforcement learning in the next 5 years? (think e.g. AlexNet, ResNets, Transformers, Diffusion) | What is something we do right now in decision making / robotic learning / reinforcement learning that we WILL NOT be doing in 5 years? (think e.g. handcrafted features) | ||||||
2 | 11/11/2022 19:16:15 | RL algorithms for autonomously influencing humans using social media, via auto generated content | sim to real | 14 | 4 | ||||
3 | 11/11/2022 19:16:38 | large scale offline pretraining | sim to real | 9 | |||||
4 | 11/11/2022 19:17:17 | The benchmark everyone will be working on will be on assistive human computer interfaces | Starting from scratch | ||||||
5 | 11/11/2022 19:17:31 | trying to learn on a real robot from scratch in the real world | 10 | ||||||
6 | 11/11/2022 19:17:35 | Simple methods massively scaled up | Trying to do reset-free RL | 4 | |||||
7 | 11/11/2022 19:17:59 | Learning exploration as an accumulating "experience" | Model-based RL (at least on high-level decision making) | 4 | |||||
8 | 11/11/2022 19:18:01 | Dialogue systems | Working with fixed rewards | ||||||
9 | 11/11/2022 19:18:04 | Applications in hard-to-model domains that are not embodied | MuJoCo | 6 | 8 | ||||
10 | 11/11/2022 19:18:07 | Robots learning with self play: think two physical robots trying to be cooperative/adversarial (their choice) in maximizing their skill acquisition speed. We’ll see cooperation, deception, murder, and everything in between! | We will really not be training from scratch or designing reward functions. Everything from natural language: describe desired behavior and condition on it! | 2 | 5 | ||||
11 | 11/11/2022 19:18:17 | Hierarchical decision making | State entropy based exploration | 2 | 5 | ||||
12 | 11/11/2022 19:18:24 | Training large models without backpropagation | Prompt Engineering | 3 | 8 | ||||
13 | 11/11/2022 19:18:47 | Internet scale exploration policies to aid foundation models data acquisition | RL + traditional robotics (we will codevelop hardware specifically for RL) | 2 | |||||
14 | 11/11/2022 19:18:53 | probabilistic programming | training in simulation | 1 | |||||
15 | 11/11/2022 19:18:58 | Use big generative models pretrained on text or images from non-interactive internet sources | 3 | ||||||
16 | 11/11/2022 19:20:05 | model-free RL | |||||||
17 | 11/11/2022 19:20:08 | some way to make RL more like supervised learning | thinking carefully about the composition or details of robotics datasets (i.e., we will have methods that can learn from very messy/heterogeneous data (different controllers, cameras, robots)) | 4 | 2 | ||||
18 | 11/11/2022 19:20:20 | A structural separation of decision making and control | Testing on MuJoCo | ||||||
19 | 11/11/2022 19:20:29 | Standardized human-in-the-loop paradigms | Using rewards as the source of supervision | 5 | 2 | ||||
20 | 11/11/2022 19:21:19 | Effective transfer from human video | Sim2real | 3 | |||||
21 | 11/11/2022 19:21:32 | Gym environments | |||||||
22 | 11/11/2022 19:21:51 | Pre-trained visual representations for RL | |||||||
23 | 11/11/2022 19:21:56 | Distinguish the learning algorithm between online/offline learning | |||||||
24 | 11/11/2022 19:23:41 | RL for design | Submitting to conferences | 6 | 5 | ||||
25 | 11/11/2022 19:24:36 | Agents learning to play like babies | |||||||
26 | 11/11/2022 19:26:18 | AI teacher | |||||||
27 | |||||||||
28 | |||||||||
29 | |||||||||
30 | |||||||||
31 | |||||||||
32 | RL algorithms for autonomously influencing humans using social media, via auto generated content | ||||||||
33 | large scale offline pretraining | ||||||||
34 | The benchmark everyone will be working on will be on assistive human computer interfaces | ||||||||
35 | Simple methods massively scaled up | ||||||||
36 | Learning exploration as an accumulating "experience" | ||||||||
37 | probabilistic programming | ||||||||
38 | Applications in hard-to-model domains that are not embodied | ||||||||