Review of HW&SW Deep-Learning projects in RTC-Seville
Alejandro Linares Barranco
Neuromorphic Processor Project (NPP)
2015-2018 Phase 1, 2018-2020 Phase 2
NPP Project Goals*
In NPP, ANNs and SNNs are combined to develop convolutional and recurrent neural-network theory and hardware based on sparse, change-driven computing
* From kickoff meeting May 2015
NPP Team:
Y Bengio, M Courbariaux
J Seo
R Manohar
T Delbruck, SC Liu, G Indiveri
F Corradi
ES Shim, SJ Suh, JH Lee
B Barranco, A Linares
MNIST on Stratix-V DE5 platform with Altera OpenCL SDK
Implementation and Performance study
Alejandro (USE) & Jae-sun Seo (ASU)
MNIST ConvNet architecture under evaluation, similar to the LeNet proposed by LeCun in 1998.
Layer dimensions: 28x28 → 20@24x24 → 20@12x12 → 50@8x8 → 50@4x4 → 500 → 10
MNIST ConvNet architecture from Caffe: http://caffe.berkeleyvision.org/
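For reference, a minimal sketch of this topology in PyTorch, for illustration only; the actual implementation was written with Caffe and the Altera OpenCL SDK, so the class and layer names below are placeholders.

```python
# Illustrative PyTorch sketch of the Caffe LeNet MNIST topology shown above.
# Not the original Caffe/OpenCL code.
import torch
import torch.nn as nn

class CaffeLeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5),   # 1x28x28  -> 20x24x24
            nn.MaxPool2d(2),                   # 20x24x24 -> 20x12x12
            nn.Conv2d(20, 50, kernel_size=5),  # 20x12x12 -> 50x8x8
            nn.MaxPool2d(2),                   # 50x8x8   -> 50x4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(50 * 4 * 4, 500),        # 800 -> 500
            nn.ReLU(),
            nn.Linear(500, 10),                # 500 -> 10 digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Quick shape check with a dummy MNIST-sized input.
print(CaffeLeNet()(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```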
Caffe exports OpenCL for GP-GPUs, which is easy to adapt to FPGAs.
Developed with AOCL (Altera OpenCL SDK) in 2015.
Today the OpenVINO Toolkit works with Caffe & TensorFlow.
Demo with MNIST and Altera DE5 platform
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Roshambo demonstration on Xilinx PSoC
Alejandro Linares-Barranco
Antonio Rios-Navarro
IEEE TNNLS early access: https://doi.org/10.1109/TNNLS.2018.2852335
RoShamBo training images: four classes (Paper, Scissors, Rock, Background), each presented to the network as a 64x64 DVS 2D rectified histogram of 2k events (0.1 Hz – 200 Hz rate), built from 240x180 DVS "frames".
RoShamBo CNN architecture (total 18 MOp, ~9M MAC):
Input: 64x64
Conv 5x5 → 16x60x60
MaxPool 2x2 → 16x30x30
Conv 3x3 → 32x28x28
MaxPool 2x2 → 32x14x14
Conv 3x3 → 64x12x12
MaxPool 2x2 → 64x6x6
Conv 3x3 → 128x4x4
MaxPool 2x2 → 128x2x2
Conv 1x1 + MaxPool 2x2 → 128x1x1
Conventional 4-layer LeNet with ReLU/MaxPool and 1 FC layer before output.
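A minimal PyTorch sketch of the layer dimensions listed above, for reference only; the deployed network ran on the NullHop accelerator, and the final 128 → 4 classifier stage is an assumption based on the four classes (Paper, Scissors, Rock, Background).

```python
# Illustrative PyTorch sketch of the RoShamBo CNN dimensions (not the deployed model).
import torch
import torch.nn as nn

roshambo_cnn = nn.Sequential(
    nn.Conv2d(1, 16, 5), nn.ReLU(),     # 1x64x64  -> 16x60x60
    nn.MaxPool2d(2),                    # 16x60x60 -> 16x30x30
    nn.Conv2d(16, 32, 3), nn.ReLU(),    # 16x30x30 -> 32x28x28
    nn.MaxPool2d(2),                    # 32x28x28 -> 32x14x14
    nn.Conv2d(32, 64, 3), nn.ReLU(),    # 32x14x14 -> 64x12x12
    nn.MaxPool2d(2),                    # 64x12x12 -> 64x6x6
    nn.Conv2d(64, 128, 3), nn.ReLU(),   # 64x6x6   -> 128x4x4
    nn.MaxPool2d(2),                    # 128x4x4  -> 128x2x2
    nn.Conv2d(128, 128, 1), nn.ReLU(),  # 128x2x2  -> 128x2x2 (FC-like 1x1 conv)
    nn.MaxPool2d(2),                    # 128x2x2  -> 128x1x1
    nn.Flatten(),
    nn.Linear(128, 4),                  # assumed output: Paper/Scissors/Rock/Background
)

# Input: one 64x64 histogram accumulated from ~2k DVS events.
print(roshambo_cnn(torch.zeros(1, 1, 64, 64)).shape)  # torch.Size([1, 4])
```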
CNN Hw accelerator: NullHop main features
1. Compressed layers are stored using a sparsity map (sketched after this list)
2. Pixels are loaded only once per 128 output feature maps
3. Pixel & channel ordering scales to arbitrary image size
4. Zero-pixel MACs are completely skipped
5. Kernels for a layer are loaded to MAC SRAM banks
6. Controllers cluster MAC units for 16-128 output maps/pass
7. Output maps are computed in parallel (up to 128)
8. 2x2 max pooling "on the fly" cuts external DRAM writes by 4x
9. Compressed layer is written out ready for being streamed back
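A minimal NumPy sketch of the sparsity-map idea behind features 1, 4 and 9; the real NullHop encoding is a hardware bitstream format, so the functions below are illustrative only.

```python
# Illustrative sparsity-map compression: store a 1-bit mask plus the non-zero
# values of a feature map; only the non-zero pixels would trigger MACs.
import numpy as np

def compress(fmap):
    """Return (sparsity_map, nonzero_values) for a 2D feature map."""
    mask = fmap != 0
    return mask, fmap[mask]                      # values kept in row-major order

def decompress(mask, values):
    """Rebuild the dense feature map from its compressed form."""
    fmap = np.zeros(mask.shape, dtype=values.dtype)
    fmap[mask] = values
    return fmap

# Example: a ReLU output is typically very sparse.
fmap = np.maximum(np.random.randn(6, 6).astype(np.float32), 0)
mask, values = compress(fmap)
print(f"{values.size} of {fmap.size} pixels are non-zero; only those generate MACs")
assert np.array_equal(decompress(mask, values), fmap)
```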
FPGA infrastructure for NullHop testing and demonstration
Xilinx Zynq-7100 PSoC on an Avnet MMP module; motherboard developed by RTC.
Demo latencies of the RoShamBo CNN on NullHop (NIPS 2016).
Deep Learning applied to diagnostic assistance for Prostate Cancer
Tissue classes: healthy tissue (TS), Gleason Grade 3 (G3), Gleason Grade 4 (G4), Gleason Grade 5 (G5)
WSI: Whole-Slide Tissue Image
CNN-based methodology
Dataset collected from real tissue from Valme Hospital (always in progress):
Input: WSI
Pre-processing: resize and RoI extractions (TS, G3, G4 and G5)
Artificial image processing for dataset increment: I. rotations, II. brightness changes, III. focus changes (see the sketch after this block)
TRAIN: X processed images + labels file
TEST: Y processed images (from different sources than those of X) + labels file
CNN (LeNet, AlexNet, ResNet…) takes the processed images as INPUT and produces the tissue class as OUTPUT
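A minimal Pillow sketch of the three augmentation steps (rotations, brightness changes, focus changes); file names, angles and factors are hypothetical, not the values used in the clinical pipeline.

```python
# Illustrative dataset-increment step on one RoI patch (hypothetical file names/values).
from PIL import Image, ImageEnhance, ImageFilter

def augment(patch):
    """Yield augmented copies of one RoI patch (e.g. a 100x100 crop of a WSI)."""
    for angle in (90, 180, 270):                 # I. rotations
        yield patch.rotate(angle)
    for factor in (0.8, 1.2):                    # II. brightness changes
        yield ImageEnhance.Brightness(patch).enhance(factor)
    for radius in (1, 2):                        # III. focus changes (Gaussian blur)
        yield patch.filter(ImageFilter.GaussianBlur(radius))

patch = Image.open("roi_G3_0001.png").resize((100, 100))   # hypothetical RoI file
for i, aug in enumerate(augment(patch)):
    aug.save(f"roi_G3_0001_aug{i}.png")
```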
Preliminary results
Architecture | Image size | # Images | Pre-processing? | Artificial dataset increment? | Accuracy
LeNet | 100x100 | 17,022 | No | No | ~68%
VGG19 | 100x100 | 17,022 | No | No | ~74%
Other ongoing projects: