Context and inspiration

We have had a lot of success with convolution-based regression for predicting the robot’s turning speed from a single RGB input image. Unfortunately, we don’t expect that approach to generalize well to lines with corners sharp enough that the line can leave the camera’s field of view, since a single frame then carries no information about where the line went. To solve that problem, we need some sort of recurrent approach, with LSTM (Long Short-Term Memory) networks being the most compelling option.

Ultimately, we chose to build two separate networks: a convolutional network that processes each image into a predicted turn value and a 512-dimensional feature vector, and a recurrent LSTM network that uses a time series of those outputs to refine the final cmd_vel.
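To make the hand-off between the two networks concrete, here is a minimal sketch of how per-frame convolutional outputs could be grouped into fixed-length sequences for the recurrent stage. It is an illustration under assumptions, not our actual pipeline: the helper names `frame_vector` and `make_windows`, the window length of 4, and the choice to concatenate the turn value with the feature vector are all hypothetical. Feeding only the turn values or only the feature vectors would use the same windowing.

```python
from typing import List, Sequence

FEATURE_DIM = 512  # size of the conv network's feature vector (from the text)


def frame_vector(turn: float, features: Sequence[float]) -> List[float]:
    """Join one frame's predicted turn value and its feature vector.

    Concatenation is an assumption made for this sketch.
    """
    return [turn, *features]


def make_windows(frames: Sequence[List[float]], window: int) -> List[List[List[float]]]:
    """Return every run of `window` consecutive frame vectors, oldest first.

    Each run is one input sequence for the recurrent network. If there are
    fewer than `window` frames, the result is an empty list.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [list(frames[i:i + window]) for i in range(len(frames) - window + 1)]


# Placeholder data: six frames with made-up turn values and all-zero features.
frames = [frame_vector(0.01 * t, [0.0] * FEATURE_DIM) for t in range(6)]
windows = make_windows(frames, window=4)  # window length is hypothetical
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each window would then be fed to the LSTM in time order, with the human command for the last frame in the window as the regression target.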

Sample Outputs

An example from validation data

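| ID | Human command | Conv. prediction | LSTM prediction |
|----|---------------|------------------|-----------------|
| 41 | 0.000 | -0.001 | -0.002 |
| 42 | 0.000 | -0.001 | -0.001 |
| 43 | 0.000 | -0.003 | -0.001 |
| 44 | 0.000 | 0.019 | 0.004 |
| 45 | 0.000 | 0.088 | 0.081 |
| 46 | 0.000 | 0.013 | 0.001 |
| 47 | 0.104 | 0.019 | 0.007 |
| 48 | 0.097 | -0.081 | -0.083 |
| 49 | 0.034 | -0.013 | -0.005 |
| 50 | 0.107 | -0.041 | -0.037 |
| 51 | 0.179 | -0.027 | -0.020 |
| 52 | 0.166 | -0.001 | -0.001 |
| 53 | 0.129 | -0.001 | -0.001 |
| 54 | 0.129 | -0.001 | -0.001 |
| 55 | 0.127 | 0.002 | -0.001 |
| 56 | 0.127 | 0.006 | 0.000 |
| 57 | 0.039 | 0.058 | 0.044 |
| 58 | 0.117 | 0.089 | 0.084 |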
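One way to compare the two prediction columns against the human commands is a simple error summary over the rows shown. The sketch below uses mean absolute error, which is a choice made for this illustration and not necessarily the loss reported under Results. The values are copied from the validation rows above.

```python
# (ID, human command, conv. prediction, LSTM prediction) for frames 41-58.
ROWS = [
    (41, 0.000, -0.001, -0.002),
    (42, 0.000, -0.001, -0.001),
    (43, 0.000, -0.003, -0.001),
    (44, 0.000, 0.019, 0.004),
    (45, 0.000, 0.088, 0.081),
    (46, 0.000, 0.013, 0.001),
    (47, 0.104, 0.019, 0.007),
    (48, 0.097, -0.081, -0.083),
    (49, 0.034, -0.013, -0.005),
    (50, 0.107, -0.041, -0.037),
    (51, 0.179, -0.027, -0.020),
    (52, 0.166, -0.001, -0.001),
    (53, 0.129, -0.001, -0.001),
    (54, 0.129, -0.001, -0.001),
    (55, 0.127, 0.002, -0.001),
    (56, 0.127, 0.006, 0.000),
    (57, 0.039, 0.058, 0.044),
    (58, 0.117, 0.089, 0.084),
]


def mean_abs_error(targets, predictions):
    """Average of |target - prediction| over paired values."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


human = [row[1] for row in ROWS]
conv = [row[2] for row in ROWS]
lstm = [row[3] for row in ROWS]

mae_conv = mean_abs_error(human, conv)
mae_lstm = mean_abs_error(human, lstm)
print(f"conv MAE: {mae_conv:.4f}")
print(f"LSTM MAE: {mae_lstm:.4f}")
```

This excerpt covers only 18 frames, so the numbers describe this clip and should not be read as a summary of overall validation performance.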


An example from training data

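| ID | Human command | Conv. prediction | LSTM prediction |
|----|---------------|------------------|-----------------|
| 340 | -0.074 | -0.071 | -0.077 |
| 341 | -0.124 | -0.125 | -0.132 |
| 342 | -0.164 | -0.166 | -0.162 |
| 343 | 0.000 | -0.001 | -0.004 |
| 344 | 0.000 | -0.001 | -0.002 |
| 345 | -0.049 | -0.052 | -0.057 |
| 346 | -0.059 | -0.058 | -0.065 |
| 347 | -0.206 | -0.207 | -0.207 |
| 348 | -0.156 | -0.151 | -0.156 |
| 349 | 0.000 | -0.001 | -0.004 |
| 350 | 0.000 | -0.001 | -0.003 |
| 351 | -0.122 | -0.107 | -0.113 |
| 352 | -0.300 | -0.298 | -0.306 |
| 353 | -0.300 | -0.308 | -0.292 |
| 354 | -0.269 | -0.288 | -0.260 |
| 355 | 0.300 | 0.314 | 0.273 |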


Results

Convolution-only

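| Data source | Training epochs | Training loss | Validation loss |
|-------------|-----------------|---------------|-----------------|
| Office | 2500 | 0.02-0.03 | 0.02-0.03 |
| QEA Blob + Office | 425 | 0.0048 | 0.0039 |
| QEA square | -- | -- | Very bad |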

...

Recurrent postprocessing

| Data source | Data type | Training loss | Validation loss |
|-------------|-----------|---------------|-----------------|
| Office only | Predicted cmd_vels | -- | 0.02-0.03 |
| Office only | x512 feature vectors | -- | 0.02-0.03 |
| QEA square | -- | Very good | Very bad |

...