Context and inspiration

We have had a lot of success with convolution-based regression for predicting the robot’s turning speed from a single RGB input image. Unfortunately, we don’t expect that approach to generalize well to lines with corners sharp enough that the line leaves the camera’s field of view. Once the line is out of frame, a single image no longer contains enough information to choose a turn, so the network needs some memory of earlier frames. That calls for a recurrent approach, with LSTM (Long Short-Term Memory) networks being the most compelling option.

Ultimately, we chose to build two separate networks: a convolutional network that processes each image into a predicted turn value and a 512-dimensional feature vector, and a recurrent LSTM network that uses a time series of those outputs to refine the final cmd_vel.
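
The hand-off between the two networks can be sketched as a rolling buffer of per-frame outputs. This is a minimal sketch, not the project's code: SequenceBuffer, make_step, the window length of 8, and the choice to concatenate the turn value with the feature vector are all assumptions made for illustration. Only the 512-dimensional feature size comes from the text above.

```python
from collections import deque
from typing import Deque, List, Sequence

FEATURE_DIM = 512  # feature-vector size stated in the text
WINDOW = 8         # assumption: how many past frames the LSTM sees


def make_step(turn: float, features: Sequence[float]) -> List[float]:
    """Build one time step from a single frame's conv-net outputs.

    Assumption: the turn value and the feature vector are concatenated.
    """
    if len(features) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(features)}")
    return [float(turn), *map(float, features)]


class SequenceBuffer:
    """Keeps the most recent `window` time steps for the recurrent network."""

    def __init__(self, window: int = WINDOW) -> None:
        self._steps: Deque[List[float]] = deque(maxlen=window)

    def push(self, turn: float, features: Sequence[float]) -> None:
        # Appending to a full deque silently drops the oldest step.
        self._steps.append(make_step(turn, features))

    def ready(self) -> bool:
        """True once the buffer holds a full window of frames."""
        return len(self._steps) == self._steps.maxlen

    def as_sequence(self) -> List[List[float]]:
        """Return the steps oldest-first, shaped (time, 1 + FEATURE_DIM)."""
        return [list(step) for step in self._steps]
```

Under these assumptions, each new camera frame would be pushed after the convolutional pass, and once ready() returns True the sequence can be handed to the recurrent network. Because earlier frames stay in the window, the recurrent network can still see outputs from before the line left the camera's view.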

Sample Outputs

An example from validation data

ID	Image	Human command	Conv. prediction	LSTM prediction
41		0.000	-0.001	-0.002
42		0.000	-0.001	-0.001
43		0.000	-0.003	-0.001
44		0.000	0.019	0.004
45		0.000	0.088	0.081
46		0.000	0.013	0.001
47		0.104	0.019	0.007
48		0.097	-0.081	-0.083
49		0.034	-0.013	-0.005
50		0.107	-0.041	-0.037
51		0.179	-0.027	-0.020
52		0.166	-0.001	-0.001
53		0.129	-0.001	-0.001
54		0.129	-0.001	-0.001
55		0.127	0.002	-0.001
56		0.127	0.006	0.000
57		0.039	0.058	0.044
58		0.117	0.089	0.084

An example from training data

ID	Image	Human command	Conv. prediction	LSTM prediction
340		-0.074	-0.071	-0.077
341		-0.124	-0.125	-0.132
342		-0.164	-0.166	-0.162
343		0.000	-0.001	-0.004
344		0.000	-0.001	-0.002
345		-0.049	-0.052	-0.057
346		-0.059	-0.058	-0.065
347		-0.206	-0.207	-0.207
348		-0.156	-0.151	-0.156
349		0.000	-0.001	-0.004
350		0.000	-0.001	-0.003
351		-0.122	-0.107	-0.113
352		-0.300	-0.298	-0.306
353		-0.300	-0.308	-0.292
354		-0.269	-0.288	-0.260
355		0.300	0.314	0.273
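
The gap between the two samples above can be summarized with a single number per network. The sketch below scores the listed rows with mean absolute error; MAE is an assumption chosen for readability, since the loss actually used in training is not stated here, and the names VALIDATION, TRAINING, and mean_abs_error are illustrative. The values are copied from the two tables.

```python
from typing import List, Tuple

# Each row: (human command, conv. prediction, LSTM prediction)
Row = Tuple[float, float, float]

VALIDATION: List[Row] = [  # IDs 41-58
    (0.000, -0.001, -0.002), (0.000, -0.001, -0.001), (0.000, -0.003, -0.001),
    (0.000, 0.019, 0.004), (0.000, 0.088, 0.081), (0.000, 0.013, 0.001),
    (0.104, 0.019, 0.007), (0.097, -0.081, -0.083), (0.034, -0.013, -0.005),
    (0.107, -0.041, -0.037), (0.179, -0.027, -0.020), (0.166, -0.001, -0.001),
    (0.129, -0.001, -0.001), (0.129, -0.001, -0.001), (0.127, 0.002, -0.001),
    (0.127, 0.006, 0.000), (0.039, 0.058, 0.044), (0.117, 0.089, 0.084),
]

TRAINING: List[Row] = [  # IDs 340-355
    (-0.074, -0.071, -0.077), (-0.124, -0.125, -0.132), (-0.164, -0.166, -0.162),
    (0.000, -0.001, -0.004), (0.000, -0.001, -0.002), (-0.049, -0.052, -0.057),
    (-0.059, -0.058, -0.065), (-0.206, -0.207, -0.207), (-0.156, -0.151, -0.156),
    (0.000, -0.001, -0.004), (0.000, -0.001, -0.003), (-0.122, -0.107, -0.113),
    (-0.300, -0.298, -0.306), (-0.300, -0.308, -0.292), (-0.269, -0.288, -0.260),
    (0.300, 0.314, 0.273),
]

CONV, LSTM = 1, 2  # column indices of the two predictions within a Row


def mean_abs_error(rows: List[Row], column: int) -> float:
    """Average |prediction - human command| over the given rows."""
    return sum(abs(row[column] - row[0]) for row in rows) / len(rows)


def summarize() -> None:
    """Print the MAE of each network on each sample."""
    for name, rows in (("validation", VALIDATION), ("training", TRAINING)):
        print(f"{name}: conv MAE={mean_abs_error(rows, CONV):.4f}, "
              f"LSTM MAE={mean_abs_error(rows, LSTM):.4f}")
```

Calling summarize() prints one line per sample. On these rows, both networks score much lower error on the training excerpt than on the validation excerpt. The excerpts are short, so the numbers describe only the rows shown, not the full datasets.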

Results

Convolution-only

Data source	Training epochs	Training loss	Validation loss
Office	2500	0.02 - 0.03	0.02 - 0.03
QEA Blob + Office	425	0.0048	0.0039
QEA square	--	--	Very bad
...

Recurrent postprocessing

Data source	Data type	Training loss	Validation loss
Office only	Predicted cmd_vels	--	0.02 - 0.03
Office only	512-dimensional feature vectors	--	0.02 - 0.03
QEA square	--	Very good	Very bad
...