We have had a lot of success with convolution-based regression for predicting the robot's turning speed from a single RGB input image. Unfortunately, we don't expect that approach to generalize well to lines with corners sharp enough that the line can leave the camera's field of view. Once the line is out of frame, a single image no longer carries enough information to choose a turn, so the network needs some memory of earlier frames. To solve that problem, we need some sort of recurrent approach, with LSTM (Long Short-Term Memory) networks being the most compelling option.
Ultimately, we chose to build two separate networks: a convolutional network that processed each image into a predicted turn value and a 512-dimensional feature vector, and a recurrent LSTM network that used a time series of those outputs to refine the final cmd_vel.
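To make the two-stage data flow concrete, here is a minimal sketch in plain Python of how per-frame convolutional outputs could be grouped into time-ordered windows and passed through a single-layer LSTM to produce one refined turn value per window. It is an illustration under stated assumptions, not the networks we trained: the window length, the hidden size, the choice to concatenate the feature vector with the predicted turn, and the names `make_windows` and `TinyLSTM` are all hypothetical, and the weights are random rather than learned.

```python
import math
import random

FEATURE_DIM = 512  # size of the conv feature vector, as described above
WINDOW = 8         # hypothetical number of consecutive frames per sequence
HIDDEN = 16        # hypothetical LSTM hidden size


def make_windows(frames, window=WINDOW):
    """Group per-frame conv outputs into overlapping, time-ordered sequences."""
    return [frames[i:i + window] for i in range(len(frames) - window + 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with a linear output head (random, untrained weights)."""

    def __init__(self, input_dim, hidden=HIDDEN, seed=0):
        rng = random.Random(seed)
        cols = input_dim + hidden + 1  # input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.w = {name: [[rng.uniform(-0.05, 0.05) for _ in range(cols)]
                         for _ in range(hidden)] for name in "ifog"}
        self.head = [rng.uniform(-0.05, 0.05) for _ in range(hidden)]
        self.hidden = hidden

    def _gate(self, name, z):
        return [sum(w * v for w, v in zip(row, z)) for row in self.w[name]]

    def predict(self, sequence):
        """Run the sequence through the LSTM and return one refined turn value."""
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state (the network's memory)
        for x in sequence:
            z = list(x) + h + [1.0]
            i = [sigmoid(v) for v in self._gate("i", z)]
            f = [sigmoid(v) for v in self._gate("f", z)]
            o = [sigmoid(v) for v in self._gate("o", z)]
            g = [math.tanh(v) for v in self._gate("g", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return sum(w * v for w, v in zip(self.head, h))


# Stand-in conv outputs: a 512-dim feature vector plus one predicted turn per frame.
rng = random.Random(1)
frames = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] + [rng.uniform(-0.3, 0.3)]
          for _ in range(10)]

windows = make_windows(frames)
model = TinyLSTM(input_dim=FEATURE_DIM + 1)
refined = [model.predict(w) for w in windows]
print(len(windows), "windows ->", len(refined), "refined turn values")
```

Because the cell state carries information forward across frames, a trained network of this shape can in principle keep turning after the line has left the image, which a single-frame regressor cannot do. Feeding only the predicted turn values, or only the feature vectors, amounts to changing what each frame entry contains and the matching `input_dim`.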
ID | Human command | Conv. prediction | LSTM prediction |
41 | 0.000 | -0.001 | -0.002 |
42 | 0.000 | -0.001 | -0.001 |
43 | 0.000 | -0.003 | -0.001 |
44 | 0.000 | 0.019 | 0.004 |
45 | 0.000 | 0.088 | 0.081 |
46 | 0.000 | 0.013 | 0.001 |
47 | 0.104 | 0.019 | 0.007 |
48 | 0.097 | -0.081 | -0.083 |
49 | 0.034 | -0.013 | -0.005 |
50 | 0.107 | -0.041 | -0.037 |
51 | 0.179 | -0.027 | -0.020 |
52 | 0.166 | -0.001 | -0.001 |
53 | 0.129 | -0.001 | -0.001 |
54 | 0.129 | -0.001 | -0.001 |
55 | 0.127 | 0.002 | -0.001 |
56 | 0.127 | 0.006 | 0.000 |
57 | 0.039 | 0.058 | 0.044 |
58 | 0.117 | 0.089 | 0.084 |
ID | Human command | Conv. prediction | LSTM prediction |
340 | -0.074 | -0.071 | -0.077 |
341 | -0.124 | -0.125 | -0.132 |
342 | -0.164 | -0.166 | -0.162 |
343 | 0.000 | -0.001 | -0.004 |
344 | 0.000 | -0.001 | -0.002 |
345 | -0.049 | -0.052 | -0.057 |
346 | -0.059 | -0.058 | -0.065 |
347 | -0.206 | -0.207 | -0.207 |
348 | -0.156 | -0.151 | -0.156 |
349 | 0.000 | -0.001 | -0.004 |
350 | 0.000 | -0.001 | -0.003 |
351 | -0.122 | -0.107 | -0.113 |
352 | -0.300 | -0.298 | -0.306 |
353 | -0.300 | -0.308 | -0.292 |
354 | -0.269 | -0.288 | -0.260 |
355 | 0.300 | 0.314 | 0.273 |
Data source | Training epochs | Training loss | Validation loss |
Office | 2500 | 0.02 - 0.03 | 0.02 - 0.03 |
QEA Blob + Office | 425 | 0.0048 | 0.0039 |
QEA square | -- | -- | Very bad |
... |
Data source | Data type | Training loss | Validation loss |
Office only | Predicted cmd_vels | -- | 0.02 - 0.03 |
Office only | 512-dimensional feature vectors | -- | 0.02 - 0.03 |
QEA square | -- | Very good | Very bad |
... |