History of OCR
Agenda
OCR - Optical Character Recognition
OCR History - MNIST Origins
OCR History - Hand engineered forms
OCR History - Early Segmentation
OCR History - Tesseract
OCR History - ABBYY
Research - Neural Networks
Research - Early Data
Modern Pipeline
Research - Improved Detectors
Research - Improved Recognizers
CTC Loss
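A minimal sketch of the collapsing rule that CTC uses to map per-frame labels to an output sequence: merge consecutive repeats, then drop the blank symbol. The function name `ctc_collapse` and the choice of `0` as the blank index are assumptions made for illustration; the slides do not specify them.

```python
def ctc_collapse(frame_labels, blank=0):
    """Collapse a per-frame label path into an output sequence.

    Consecutive repeated labels are merged, then blanks are removed.
    `blank=0` is an assumed convention, not taken from the slides.
    """
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


if __name__ == "__main__":
    # A blank between two identical labels keeps both of them.
    print(ctc_collapse([0, 3, 3, 0, 3, 5, 5, 0]))
```

The blank symbol is what allows a repeated character (for example the double "l" in "hello") to survive the merge step.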
Research - Transformers
Nemotron OCR
Nemotron Pipeline Diagram
Detection
Input Image (H x W x 3)
Feature Pyramid Network
Feature maps
(H/4 x W/4 x feature size)
Pixel Classification
Feature maps
(H/4 x W/4 x feature size)
Pixel class
Box regression
Feature maps
(H/4 x W/4 x feature size)
Pixelwise box regression
For every pixel predicted to be inside a text box, regress top, left, bottom, right, and rotation
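A rough sketch of how one pixel's regression output could be decoded into a rotated box. It assumes the four values are distances from the pixel to the box edges, measured in input-image pixels, that the rotation is in radians, and that the feature map has stride 4 (matching H/4 x W/4). The function name `decode_pixel_box` and these conventions are hypothetical; the slides do not state the exact parameterization.

```python
import math


def decode_pixel_box(px, py, top, left, bottom, right, angle, stride=4):
    """Return the four (x, y) corners of a rotated box in image coordinates.

    px, py: pixel position on the feature map (assumed stride 4).
    top, left, bottom, right: assumed distances from the pixel to each edge.
    angle: assumed rotation in radians about the pixel position.
    """
    # Map the feature-map pixel back to input-image coordinates.
    cx = px * stride
    cy = py * stride
    # Corners relative to the pixel in the box's own (unrotated) frame,
    # ordered top-left, top-right, bottom-right, bottom-left.
    local = [(-left, -top), (right, -top), (right, bottom), (-left, bottom)]
    cos_a = math.cos(angle)
    sin_a = math.sin(angle)
    corners = []
    for dx, dy in local:
        # Rotate the offset by `angle`, then translate to the pixel position.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


if __name__ == "__main__":
    print(decode_pixel_box(10, 5, top=2, left=3, bottom=4, right=5, angle=0.0))
```

Because every text pixel yields its own box under this scheme, a real detector would still need a merging step (such as non-maximum suppression) afterwards; that step is outside this sketch.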
Angle Prediction
Feature maps
(H/4 x W/4 x feature size)
Pixelwise box regression
For every pixel predicted to be inside a text box, regress top, left, bottom, right, and rotation
Recognition Transformer
Global Relational Model
Feature maps
(H/4 x W/4 x feature size)
Recognition Features
Geometry Features
Relational Model
Pairwise k=16 nearest neighbor
Next word in line
Line after
Line confidence
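A minimal sketch of building the pairwise candidates with k=16 nearest neighbors. It assumes each detected word is represented by its box center and that neighbors are ranked by Euclidean distance; the function name `knn_pairs` and both of these choices are illustrative assumptions, not details from the slides. The relational model would then score each candidate pair (for example as "next word in line" or "line after").

```python
import math


def knn_pairs(centers, k=16):
    """Return (i, j) index pairs linking each word to its k nearest neighbors.

    centers: list of (x, y) box centers, one per detected word (assumed input).
    """
    pairs = []
    for i, (xi, yi) in enumerate(centers):
        # Distance from word i to every other word.
        dists = [
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        ]
        dists.sort()
        # Keep at most k neighbors; fewer if the page has fewer words.
        pairs.extend((i, j) for _, j in dists[:k])
    return pairs


if __name__ == "__main__":
    centers = [(0, 0), (1, 0), (5, 0), (6, 0)]
    print(knn_pairs(centers, k=1))
```

Restricting the model to a fixed number of neighbors keeps the number of scored pairs linear in the word count rather than quadratic.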
Global Relational Model
Vision Language Models