THE LORD OF THE POLICIES
DATA SCIENCE JOURNEY
ROADMAP
UNEXPECTED INSURANCE POLICIES
Can you imagine how many different types of documents there are?
UNDERSTANDING THE PROBLEM
THE DATA IS IN THE RIGHT PLACE IN JUST 5 SECONDS!
OUR MODEL EXTRACTS MEANINGFUL DATA
TAKE A PHOTO OF YOUR DOCUMENT
THE JOURNEY OF TWO BURGLARS
01
02
03
THE FELLOWSHIP OF THE POLICIES
Master
Marko Bogoevski
Master
Risto Trajanov
With his keen eyesight for error, sensitive debugging, and excellent codemanship, Master Marko was valuable to the Fellowship in their journey across the realm of the Policies. He was well known for befriending the code warrior Master Risto, despite the long wars between their kin in the past.
Master Risto was a well-respected code warrior in the realm of the Policies during the Debugging Years. He was a member of the Fellowship of the Policies and was the only one of the code warriors to fight alongside the bowman, Master Marko, in the war against the Generator at the end of the Third Month of the Internship. After the defeat of the Generator, he was given lordship of the Documents at Team’s Temple.
STARTING LINE - SHIRE
01
.PDF documents
Later converted to .JPG
JSON target files for every document
.JPG images
PREPROCESS
DATA READY FOR THE MODEL
01
02
03
04
PDF to JPG converter
OCR on the images to extract the text segments and their coordinates
Transform the coordinates, images, and target JSON files into the input format for the model
OCR module
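The last preprocessing step can be illustrated with a minimal sketch. The slides do not show the actual input format, so everything here is an assumption: the segment layout (`text` plus a pixel `box`), the helper names `normalize_box` and `build_model_input`, the `policy_number` field, and the exact-match labeling rule are all hypothetical.

```python
import json


def normalize_box(box, width, height):
    """Scale a pixel box [x0, y0, x1, y1] to the [0, 1] range (assumed convention)."""
    x0, y0, x1, y1 = box
    return [x0 / width, y0 / height, x1 / width, y1 / height]


def build_model_input(segments, width, height, target):
    """Turn OCR segments and a target JSON dict into labeled nodes.

    Hypothetical labeling rule: a segment gets a field name as its label when
    its text equals that field's target value (case-insensitive); every other
    segment is labeled "other".
    """
    value_to_field = {str(v).strip().lower(): k for k, v in target.items()}
    nodes = []
    for seg in segments:
        text = seg["text"].strip()
        nodes.append({
            "text": text,
            "box": normalize_box(seg["box"], width, height),
            "label": value_to_field.get(text.lower(), "other"),
        })
    return nodes


# Made-up OCR output for one page, plus a made-up target file.
segments = [
    {"text": "Policy No.", "box": [100, 50, 300, 90]},
    {"text": "AB-12345", "box": [320, 50, 520, 90]},
]
target = json.loads('{"policy_number": "AB-12345"}')
nodes = build_model_input(segments, width=1000, height=2000, target=target)
```

Normalizing the coordinates keeps pages of different resolutions comparable; a real pipeline would likely need fuzzier matching than exact string equality, since OCR output is noisy.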
OUR SOLUTIONS – ONE MODEL TO RULE THEM ALL
VISUAL CHARACTERISTICS
TEXT SEGMENTS
GLCNN
Graph Learning Layer + Graph Convolution Layer(s) = Graph Learning Convolution Neural Network
The graph learning layer provides an optimal adaptive graph representation for graph convolutional layers.
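To make the two layers concrete, here is a small pure-Python sketch of one commonly described formulation: the graph learning layer scores each pair of nodes from the absolute difference of their features and normalizes each row with a softmax, and the graph convolution layer then computes ReLU(A X W). This is not the team's PyTorch implementation; the function names, the weight vector `a`, the matrix `W`, and the toy features are assumptions chosen by hand for illustration.

```python
import math


def graph_learning_layer(X, a):
    """Learn a soft adjacency matrix from node features.

    A[i][j] = softmax over j of ReLU(a . |x_i - x_j|), so every row sums to 1.
    `a` stands in for a learnable weight vector (fixed by hand here).
    """
    n = len(X)
    A = []
    for i in range(n):
        scores = [
            max(0.0, sum(ak * abs(xi - xj) for ak, xi, xj in zip(a, X[i], X[j])))
            for j in range(n)
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        A.append([e / total for e in exps])
    return A


def graph_conv_layer(A, X, W):
    """One graph convolution: ReLU(A @ X @ W), written with plain lists."""
    n, d_in, d_out = len(X), len(W), len(W[0])
    AX = [[sum(A[i][k] * X[k][c] for k in range(n)) for c in range(d_in)]
          for i in range(n)]
    return [[max(0.0, sum(AX[i][c] * W[c][o] for c in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


# Three text segments with two made-up features each.
X = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0]]
a = [0.5, 0.5]                   # hypothetical graph-learning weights
W = [[1.0, -1.0], [0.5, 0.5]]    # hypothetical convolution weights

A = graph_learning_layer(X, a)
H = graph_conv_layer(A, X, W)
```

Because the adjacency matrix is computed from the features rather than fixed in advance, the graph adapts to each document, which is the point of placing the graph learning layer in front of the convolution layers.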
Input
Output
Results so far…
Postprocessing of output
Problems we encountered
Two potential solutions: generate our own data and relabel segments
Extracts from the generator (+ JSON and OCR-generated files)
Relabeling app
Results on fake (generated) data
THE DESOLATION OF TECHNOLOGIES
PyTorch
THE BATTLE OF IMPRESSIONS