1 of 31

Keen Technologies

Research Directions

2 of 31

Quick Background: Id Software

3 of 31

Quick Background: Armadillo Aerospace

4 of 31

Quick Background: Oculus

5 of 31

Quick Background: Keen Technologies

  • OpenAI
  • Doing the reading
  • Research, not product
    • There are key, fundamental discoveries to be made
  • Richard Sutton and Alberta
  • Six researchers today

  • Richard Sutton
  • Joseph Modayil
  • Khurram Javed
  • John Carmack
  • Gloria Kennickell
  • Lucas Nestler

6 of 31

Where I thought I was going

  • Not LLMs
    • LLMs can know everything without learning anything
    • Learn from experience, not an IID blender
  • Virtual environments and games
    • Bot history
  • Video understanding
  • Infinite video wall

7 of 31

Missteps

  • Too low level
  • Avoided larger experiments too long
  • Sega Master System
  • Video can wait

8 of 31

Settling in with Atari

  • Deep research history
    • Mostly replicable, but assumptions varied
    • RL framework rot over the years
  • Unbiased and diverse
  • Isn’t Atari solved?
    • MEME, Muesli, BBF, etc
    • Scores are there for most games, given enough time
    • Lots of critical questions are still open!
  • Why not Pokemon or Minecraft?
    • Scrolling and 3D will definitely add more challenges
    • Tempts people to look at internal data structures

9 of 31

Reality is not a turn based game

  • The real world keeps going
    • regardless of whether you are ready
  • Invert the typical RL research environment
    • the agent gets called, not the environment
  • Single stream of experience
    • Parallel environments are a crutch
  • Speed
  • Latency
    • Add a little action queue! (loop sketch after this list)
  • How else might reality differ?
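
A minimal sketch of the inverted, real-time loop with a small action queue, as described above. The `env` and `agent` objects and their `step`/`act`/`observe` methods are hypothetical stand-ins, and the 60 Hz pacing and 3-frame latency are illustrative.

```python
import time
from collections import deque

FRAME_HZ = 60.0        # the world ticks whether or not the agent is ready
LATENCY_FRAMES = 3     # illustrative act-to-effect delay
NOOP = 0

def run_realtime(env, agent, seconds=60):
    """Inverted control: the clock drives the environment, and the agent
    is polled each frame.  Actions pass through a short queue so the
    agent always experiences a control latency, as on real hardware."""
    action_queue = deque([NOOP] * LATENCY_FRAMES)
    obs = env.reset()
    next_tick = time.monotonic()
    for _ in range(int(seconds * FRAME_HZ)):
        action_queue.append(agent.act(obs))                    # newest intent in the back
        obs, reward, done = env.step(action_queue.popleft())   # oldest action applied
        agent.observe(obs, reward, done)
        if done:
            obs = env.reset()
        next_tick += 1.0 / FRAME_HZ
        time.sleep(max(0.0, next_tick - time.monotonic()))     # pace to wall clock
```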

10 of 31

Physical Atari!

  • Kind of a stunt, but novel AFAIK
  • Could a robot play a video game?
  • Contact with reality
  • Felt like a product push
  • We will Open Source

11 of 31

Physical Atari: Games

12 of 31

Physical Atari: Compute

  • ASUS ROG Strix SCAR 16
  • 16GB 4090 Laptop GPU
  • Realtime training at frame_skip 4

13 of 31

Physical Atari: Camera

  • Resolution
    • Atari only 160x210
  • Frame rate
  • Uncompressed images
  • USB interface
    • GPUDirect for Video
  • Scanout concerns

14 of 31

Physical Atari: Rectification

  • Learns OK with a fixed camera
    • No ability to transfer to another setup
  • Manual corner picking (homography sketch below)
  • April Tags
    • Can work with a moving camera
    • Lighting challenges
  • General purpose screen detection
    • Future research
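
A minimal sketch of the rectification step, assuming four screen corners are already available (from manual picking or April Tag detection) and that OpenCV is used for the warp; only `cv2.getPerspectiveTransform` and `cv2.warpPerspective` are relied on.

```python
import cv2
import numpy as np

ATARI_W, ATARI_H = 160, 210   # native Atari 2600 frame size

def rectify_screen(camera_frame, corners):
    """Warp the TV screen found in the camera image onto an axis-aligned
    Atari-sized image.  `corners` are pixel coordinates ordered
    top-left, top-right, bottom-right, bottom-left."""
    src = np.asarray(corners, dtype=np.float32)
    dst = np.float32([[0, 0], [ATARI_W - 1, 0],
                      [ATARI_W - 1, ATARI_H - 1], [0, ATARI_H - 1]])
    homography = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(camera_frame, homography, (ATARI_W, ATARI_H))
```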

15 of 31

Physical Atari: Virtual Joystick

  • Digital IO board wired to joystick connector

16 of 31

Physical Atari: Robotroller

  • Three servos
  • Additional latency
  • Spurious action issue
    • Atlantis side firing
    • Actions in observations

17 of 31

Physical Atari: Robustness

  • Servos and joysticks wear out!
    • Reduce max current

18 of 31

Physical Atari: Control Latencies

  • Total latency is substantial
  • Not far off from human
    • Try a reaction tester

19 of 31

Physical Atari: Score and Life Detection

  • Surprisingly, the trickiest part!
    • Hasn’t this been solved since MNIST?
    • Ask an LLM to tell you the score…
    • Game dependent
    • Heuristics
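
One heuristic approach along these lines, sketched with per-game digit templates and OpenCV template matching; the score-box coordinates, template bank, and digit count are all hypothetical and game dependent.

```python
import cv2

def read_score(frame_gray, score_box, digit_templates, n_digits=6, min_conf=0.6):
    """Crop a game-dependent score region, split it into digit cells, and
    match each cell against per-game digit glyphs.  Returns None when the
    best match is weak (screen flashes, blank leading digits), so the
    caller can fall back to the last confidently read value."""
    x, y, w, h = score_box
    region = frame_gray[y:y + h, x:x + w]
    cell_w = w // n_digits
    digits = []
    for i in range(n_digits):
        cell = region[:, i * cell_w:(i + 1) * cell_w]
        best_digit, best_conf = None, -1.0
        for value, template in digit_templates.items():
            conf = cv2.matchTemplate(cell, template, cv2.TM_CCOEFF_NORMED).max()
            if conf > best_conf:
                best_digit, best_conf = value, conf
        if best_conf < min_conf:
            return None
        digits.append(best_digit)
    return int("".join(str(d) for d in digits))
```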

20 of 31

Physical Atari: Custom Dev Box

  • Raspberry Pi running ALE
  • Atari joystick port
  • Emissive tags for rectification and score

21 of 31

Physical Atari: Lessons

  • ConvNets tolerate a wide range of image distortions
    • Not yet clear how well the models transfer across lighting conditions
  • TD learning tolerates system latency
  • Top simulation agents perform poorly with added latency
    • SPR and world models that condition on action taken
    • Need a robust fix for this, not just a matching delay
  • Action-to-action paths can be a problem
  • Decent learning in a few hours of experience
  • Achieving human level performance looks feasible
    • Especially if talking to a remote big GPU

22 of 31

Sequential Multi-Task Learning

  • Sequential is much harder than parallel
  • No TaskID or weight switching
  • No hyperparameter ramp alignment
    • Exploration, optimizer, model reset
  • The global weight impact of online learning
  • Is a giant replay buffer enough? (buffer sketch below)
    • Using TB of storage isn’t crazy
  • Offline RL can bootstrap into a coherent fantasy untested by reality
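
A rough sketch of what a "giant replay buffer" could look like: one sequential stream of experience spanning game switches, with no TaskID stored, and the arrays memory-mapped to disk so capacity well beyond RAM is plausible. Shapes, capacity, and the file layout are illustrative.

```python
import numpy as np

class DiskReplayBuffer:
    """Single stream of (obs, action, reward, done) with no task labels,
    backed by memory-mapped files (created at full size, sparse on most
    filesystems) so hundreds of GB of experience is workable."""
    def __init__(self, path, capacity=50_000_000, obs_shape=(84, 84)):
        self.capacity = capacity
        self.obs = np.memmap(path + ".obs", dtype=np.uint8, mode="w+",
                             shape=(capacity, *obs_shape))
        self.meta = np.memmap(path + ".meta", dtype=np.float32, mode="w+",
                              shape=(capacity, 3))        # action, reward, done
        self.size = 0
        self.cursor = 0

    def add(self, obs, action, reward, done):
        self.obs[self.cursor] = obs
        self.meta[self.cursor] = (action, reward, float(done))
        self.cursor = (self.cursor + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng=np.random):
        idx = rng.randint(0, self.size, size=batch_size)
        a, r, d = self.meta[idx].T
        return self.obs[idx], a.astype(np.int64), r, d.astype(bool)
```

Whether replaying from such a buffer alone is enough, or whether learning from it bootstraps into the untested "coherent fantasy" of offline RL, is exactly the open question on this slide.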

23 of 31

Transfer Learning

  • Don’t play like an idiot at the start
    • After years of subjective time playing games, it should do better
  • OpenAI’s Gotta Learn Fast
    • 1M steps is plenty to learn without any transfer learning
  • GATO: More effective to learn a game from scratch than to fine tune
    • Negative transfer learning
  • Distinct from continuous forgetting

24 of 31

New Benchmark

  • Eight games?
  • Three cycles through all the games?
  • 400k frames per cycle?
    • Handle truncation, or only switch at end of episode?
  • Close to the compute for 26 games of Atari100k
  • No dedicated evaluation phase, sum over last cycle
  • Full action set
  • Sticky actions
  • Add a control latency?
  • Real time?
  • BBF from scratch as baseline performance
  • Standard benchmark harness
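
A sketch of how the loop of such a harness might look, using the tentative numbers from this slide (eight games, three cycles, 400k frames per visit) and scoring only the final cycle. The game list is a placeholder, and `make_env`/`agent` are hypothetical: a factory returning an ALE-style environment already configured with the full action set and sticky actions, and a single learner carried across every visit with no resets or TaskID.

```python
GAMES = ["game_%d" % i for i in range(8)]   # placeholder eight-game set
CYCLES = 3
FRAMES_PER_VISIT = 400_000                  # per game, per cycle

def run_benchmark(make_env, agent):
    last_cycle_scores = {}
    for cycle in range(CYCLES):
        for game in GAMES:
            env = make_env(game)
            obs = env.reset()
            episode_return, returns, frames = 0.0, [], 0
            while frames < FRAMES_PER_VISIT:
                action = agent.act(obs)
                obs, reward, done = env.step(action)
                agent.observe(obs, reward, done)
                episode_return += reward
                frames += 1
                if done:
                    returns.append(episode_return)
                    episode_return = 0.0
                    obs = env.reset()
            # Open question from the slide: truncate here, or let the
            # episode finish before switching games?
            if cycle == CYCLES - 1:
                last_cycle_scores[game] = sum(returns) / max(1, len(returns))
    # No dedicated evaluation phase: report performance during the final cycle.
    return last_cycle_scores
```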

25 of 31

Sparse rewards

  • Individual scores aren’t the driver for humans like they are for RL agents
    • Often don’t even look at score until the end!
  • The “hard exploration games” – Pitfall, Montezuma’s Revenge, etc
    • Make any game into hard exploration by only giving reward at loss of life or game over
  • Dense rewards for human tasks are the exception, not the rule
  • Intrinsic rewards
  • Curiosity (prediction-error sketch below)
  • Meta-curiosity across games
    • How would a human play a library of arcade games?
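
A minimal prediction-error flavor of the curiosity idea (a Random Network Distillation style bonus), sketched in PyTorch; the network sizes, feature dimension, and mixing weight are arbitrary, and this is only one of many intrinsic-reward formulations.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """A fixed random 'target' embedding and a trained 'predictor';
    states the predictor still gets wrong (i.e. novel states) produce
    a larger intrinsic reward."""
    def __init__(self, obs_dim, feat_dim=128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                    nn.Linear(256, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                       nn.Linear(256, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)          # target stays random and frozen

    def forward(self, obs):
        # Per-state novelty: squared error between predictor and target features.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

# Usage sketch: r_total = r_extrinsic + beta * bonus(obs), with the same
# prediction error also minimized as the predictor's training loss.
```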

26 of 31

Exploration

  • Epsilon-greedy problems
  • Soft-Q with temperature (sampling sketch below)
  • Action space factorization
    • Million+ actions on a modern controller
  • Confidence
  • Timescales
    • Frame_skip 4
    • Action gap
  • Options
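
For contrast with epsilon-greedy, a small sketch of soft-Q (Boltzmann) action selection with a temperature knob: action probabilities depend on the action gap, so clearly bad actions are rarely sampled while near-ties stay well explored.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=np.random):
    """With probability epsilon pick uniformly at random, ignoring how bad
    the non-greedy actions actually are."""
    if rng.random() < epsilon:
        return int(rng.randint(len(q_values)))
    return int(np.argmax(q_values))

def soft_q(q_values, temperature, rng=np.random):
    """Sample in proportion to exp(Q / T): lowering the temperature
    approaches greedy, raising it approaches uniform."""
    logits = (np.asarray(q_values) - np.max(q_values)) / temperature  # stable softmax
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))
```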

27 of 31

Recurrence vs Frame Stacks

  • Frame stacks are unfortunately effective for Atari (sketch below)
  • Brains are recurrent neural networks
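
For concreteness, the frame stack referred to above is just a sliding window over the last few frames, concatenated on a channel axis so a feed-forward network can see short-term motion; a recurrent agent would instead carry a learned hidden state forward. The stack size of 4 is the conventional Atari choice.

```python
from collections import deque
import numpy as np

class FrameStack:
    """Sliding window over the last k grayscale frames."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=0)   # shape (k, H, W)
```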

28 of 31

Function Approximation Dominates Performance

  • Just a black box in classic RL
  • Many duties, surprising it works as well as it does!
    • Learning new results for novel inputs
    • Generalizing across similar inputs
    • Averaging stochastic processes
    • Updating non-stationary processes
  • Supervised learning practices don’t always transfer
  • Adam is still very hard to beat
  • Auxiliary losses just tweak the value / policy approximation
  • Are neural nets and backprop even the right thing?

29 of 31

Value Representation

  • DQN clamping
  • Categorical values
  • Quadratic value compression (transform sketch below)
  • MSE was all we needed?
  • Brains and values
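
A sketch of the kind of transform the "quadratic value compression" bullet likely refers to: the signed square-root rescaling of Pohlen et al. (used in R2D2-style agents), whose exact inverse is quadratic in the compressed value. The epsilon of 1e-3 is the commonly used setting.

```python
import numpy as np

EPS = 1e-3

def compress(x):
    """h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    Squashes large returns so TD targets stay in a trainable range."""
    return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + EPS * x

def decompress(y):
    """Exact inverse of compress(); note the square (quadratic) term."""
    return np.sign(y) * (
        ((np.sqrt(1.0 + 4.0 * EPS * (np.abs(y) + 1.0 + EPS)) - 1.0)
         / (2.0 * EPS)) ** 2 - 1.0)
```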

30 of 31

Plasticity vs Generalization

  • Generalization is ignoring details
  • Plasticity involves noticing new details
  • Every new online sample is a held-out validation sample
    • But how do we use it to improve?
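
One concrete reading of the bullets above, sketched for a value learner: measure the error on each incoming transition before training on it, which gives an honest, continuously updated generalization signal; how to use that signal to improve the learner is the open question. The `model.q` and optimizer interfaces are hypothetical.

```python
import torch

def online_step(model, optimizer, transition, gamma=0.99):
    """Prequential evaluation: score the new sample first, then learn from it."""
    obs, action, reward, next_obs, done = transition

    # 1) Held-out measurement: TD error on data the model has never trained on.
    with torch.no_grad():
        target = reward + (0.0 if done else gamma * model.q(next_obs).max())
    q_pred = model.q(obs)[action]
    validation_error = (q_pred - target).abs().item()

    # 2) Only now take a gradient step on that same transition.
    loss = (q_pred - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return validation_error
```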

31 of 31

Conv Nets

  • Poor transfer from ImageNet to RL
  • Kernel subsets
  • Parameter sharing increases performance
  • Factored 1D CNNs
  • Isotropic CNNs (block sketch below)
  • Dilated Isotropic CNNs
  • Isotropic DenseNet CNNs
  • Recurrent Isotropic Semi-Dense CNNs
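
A rough PyTorch sketch of what "isotropic" means in this list: residual 3x3 blocks that keep spatial resolution and channel count constant throughout, with optional dilation to grow the receptive field (the dilated variant). Width, depth, and the dilation schedule are arbitrary.

```python
import torch.nn as nn

class IsotropicBlock(nn.Module):
    """Shape-preserving residual block: same channels in and out,
    no striding or pooling; dilation widens the receptive field
    without changing the tensor shape."""
    def __init__(self, channels, dilation=1):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.act(self.conv(x))

class IsotropicCNN(nn.Module):
    """A stem to a fixed width, then N shape-preserving blocks."""
    def __init__(self, in_channels=4, width=64, depth=8):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(
            *[IsotropicBlock(width, dilation=2 ** (i % 3)) for i in range(depth)])

    def forward(self, x):
        return self.blocks(self.stem(x))
```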