UI element detection from wireframe drawings of websites
Prasang Gupta and Vishakha Bansal
PricewaterhouseCoopers US Advisory
Mumbai, India
Team Members
Prasang Gupta
Associate in US Emerging Technologies
prasang.gupta@pwc.com
Vishakha Bansal
Senior Associate in US Emerging Technologies
vishakha.bansal@pwc.com
Agenda
Problem Statement
Our Solutions and Methodology
Performance and Future Scope
ImageCLEF 2021 involved detecting a set of atomic user interface (UI) elements in hand-drawn images of websites.
A sample output file with the bounding boxes for different classes and the confidence scores.
The dataset contained commonly occurring elements like paragraph and button in much larger proportions than rare elements like table and list …
The dataset contained 3,218 labelled images (21 classes) in the development set and 1,073 unlabelled images in the test set. The development set was split into a 95% train and 5% validation set.
Commonly occurring classes were abundant in the data while rare ones were comparatively infrequent.
Paragraph, button, header and container are among the most commonly occurring classes, whereas table, stepperinput, list and video are among the rarest in the given set of images.
… and the dataset also contained images of various sizes with different contrasts
The images came in various sizes and were later resized to 512 × 512 for training and inference. As the images are snapshots of drawings on paper or whiteboard, they were found to have varying contrast.
Distribution of width and height of images in development set
Images in the development set with different contrasts
We applied data pre-processing for contrast improvement …
We used the Contrast Limited Adaptive Histogram Equalisation (CLAHE) technique, which limits contrast amplification before applying histogram equalisation. This stretches the pixel intensities from a very confined range to a much wider distribution, resulting in clearer images (see the sketch below).
Figure: original image → CLAHE → output image
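A minimal sketch of this step with OpenCV's CLAHE implementation; the clip limit, tile grid size and file names below are illustrative assumptions, not necessarily the values used in our pipeline.

```python
import cv2

# Load the wireframe snapshot and work on the grayscale channel
image = cv2.imread("wireframe.jpg")  # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# CLAHE: per-tile histogram equalisation with the contrast
# amplification clipped, so noise is not over-amplified
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)

cv2.imwrite("wireframe_clahe.jpg", enhanced)
```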
… and converted the images to B&W to optimize training
To ensure uniformity across the training dataset, we converted all images to black and white. The conversion steps, sketched in code below, are:
1. Color to grayscale → grayscale image
2. Grayscale to black & white (adaptive Gaussian algorithm) → thresholded output image
3. Removing noise → final noise-reduced image
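A sketch of the three conversion steps using OpenCV; the threshold block size, constant and blur kernel size are illustrative assumptions.

```python
import cv2

image = cv2.imread("wireframe_clahe.jpg")  # output of the CLAHE step

# 1. Color to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 2. Grayscale to black & white: adaptive Gaussian thresholding
#    compares each pixel against a Gaussian-weighted local mean
bw = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY, blockSize=11, C=2,
)

# 3. Removing noise: a small median blur cleans up the
#    salt-and-pepper speckle that thresholding leaves behind
clean = cv2.medianBlur(bw, 3)

cv2.imwrite("wireframe_bw.jpg", clean)
```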
We selected the YOLOv5 class of models …
Model | mAP@0.5 (val) | Speed on V100 (ms) | Parameters (millions) |
YOLOv5s | 61.9 | 4.3 | 12.7 |
YOLOv5m | 68.7 | 8.4 | 35.9 |
YOLOv5l | 71.1 | 12.3 | 77.2 |
YOLOv5x | 72.0 | 22.4 | 141.8 |
Comparison of different YOLOv5 variants trained on COCO dataset
For modelling, we started by exploring Mask R-CNN, U-Net and YOLOv5.
We discarded U-Net because its very large number of parameters would make it too costly to train. We also discarded Mask R-CNN because we had already explored it and found it ran into issues detecting smaller UI elements.
Hence, for this iteration, we selected the YOLOv5 class of models. We established a baseline using YOLOv5s (small variant, lightest) and got decent mAP and recall scores; a training sketch follows below.
Baseline score (YOLOv5s, small variant, lightest): mAP 0.649, overall recall 0.675
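For reference, a baseline of this kind can be reproduced with the YOLOv5 repository's programmatic training entry point; the dataset YAML name, epoch count and batch size below are assumptions for illustration.

```python
# Assumes the ultralytics/yolov5 repository is cloned and on the path
import train  # yolov5/train.py

train.run(
    data="wireframes.yaml",  # hypothetical dataset config (21 UI classes)
    weights="yolov5s.pt",    # COCO pre-trained small variant
    imgsz=512,               # images resized to 512 x 512
    epochs=100,              # illustrative
    batch_size=16,           # illustrative
)
```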
… and tried them all to find the best-performing configuration
Model | mAP | Recall | mAP improvement over baseline (%) |
YOLOv5s (Baseline) | 0.649 | 0.675 | - |
YOLOv5l (with pre-trained weights) | 0.810 | 0.826 | 24.8 % |
YOLOv5x (with pre-trained weights, LR, early stopping) | 0.820 | 0.840 | 26.3 % |
YOLOv5x (with pre-trained weights, frozen layers, head trained) | 0.701 | 0.731 | 8.0 % |
We started with YOLOv5s (small variant, lightest) and worked our way up to YOLOv5x (xlarge variant, heaviest) to boost performance.
We also tried both training all layers and training just the head.
We found that training just the head is much faster, but it results in significantly reduced performance (see the sketch below).
Unsurprisingly, the YOLOv5x model gave the best results in our tests.
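Head-only training corresponds to freezing the backbone; a sketch using YOLOv5's freeze option (its exact form, an int or a list of layer indices, varies across repository versions):

```python
# Assumes the ultralytics/yolov5 repository is cloned and on the path
import train  # yolov5/train.py

train.run(
    data="wireframes.yaml",  # hypothetical dataset config, as before
    weights="yolov5x.pt",    # COCO pre-trained xlarge variant
    imgsz=512,
    freeze=[10],             # freeze the first 10 modules (the backbone),
)                            # so only the head is trained
```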
Performance and Future Scope

Future Scope:
Experimenting with an ensemble of modelling approaches for two types of wireframes: one with more compactly placed UI elements and another with less compactly placed UI elements
Exploring other object detection models that favour performance over inference speed
Visually inspecting predictions on the test dataset to explain model performance, as the model performed close to perfect (nearly 95% precision and recall) on the validation set
Performance on the validation dataset:

Run | Description | mAP | Overall Recall | F1 |
Run 1 | YOLOv5s (baseline) | 0.649 | 0.675 | 0.855 |
Run 2 | YOLOv5s (pre-trained weights) | 0.649 | 0.675 | 0.881 |
Run 3 | YOLOv5l (pre-trained weights) | 0.810 | 0.826 | 0.954 |
Run 4 | YOLOv5x (pre-trained weights, LR) | 0.820 | 0.840 | 0.952 |
Run 5 | YOLOv5x (pre-trained weights, frozen) | 0.701 | 0.731 | 0.894 |
Run 6 | Run 4 with 0.2 confidence cutoff | 0.824 | 0.844 | - |
Run 7 | Run 4 with 0.15 confidence cutoff | 0.824 | 0.844 | - |
Run 8 | Run 4 with 0.1 confidence cutoff | 0.829 | 0.852 | - |
Run 9 | Run 4 with 0.05 confidence cutoff | 0.832 | 0.858 | - |
Run 10 | Run 4 with 0.01 confidence cutoff | 0.836 | 0.865 | - |
Thank You
Questions?
Appendix
Multi-Pass Inference Technique
STEP 1: Output generated after passing the input image through the model once (1st pass).
STEP 2: Output generated after a 2nd pass through the model; smaller UI elements get detected.
STEP 3: Output generated after appending the results from both passes. Success!
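The slides do not spell out how the second pass differs from the first; below is a plausible sketch, assuming the second pass re-runs the model on a copy of the image with first-pass detections whited out, and that a hypothetical fine-tuned checkpoint best.pt is available.

```python
import torch

# Standard YOLOv5 torch.hub loading; best.pt is a hypothetical checkpoint
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

def multi_pass_detect(image):
    """image: HxWx3 RGB numpy array. Returns boxes from both passes."""
    # STEP 1: detect on the original image
    first = model(image).xyxy[0]  # rows of [x1, y1, x2, y2, conf, cls]

    # Mask out pass-1 detections so remaining elements stand out
    masked = image.copy()
    for x1, y1, x2, y2, conf, cls in first.tolist():
        masked[int(y1):int(y2), int(x1):int(x2)] = 255  # white fill

    # STEP 2: smaller / overlooked UI elements can now be detected
    second = model(masked).xyxy[0]

    # STEP 3: append the results from both passes
    return torch.cat([first, second], dim=0)
```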
We improved the score further by using post-processing
Confidence cutoff (%) | Total predicted labels | % increase in labels | Test mAP |
25 | 101251 | 0.0 | 0.820 |
20 | 109848 | 8.5 | 0.824 |
15 | 119881 | 9.1 | |
10 | 130765 | 9.0 | 0.829 |
5 | 142963 | 9.3 | 0.832 |
1 | 165013 | 15.4 | 0.836 |
We employed two post-processing techniques:
The multi-pass inference technique described above is used to recognise missing elements. However, since recall was already quite good, this did not increase the score much.
Varying the confidence cutoff helped increase the score, as the model was encouraged to predict labels even when its confidence in them was low (see the sketch below).
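With the torch.hub interface, varying the cutoff is a one-line change; a sketch sweeping the cutoffs from the table above (best.pt and the input file are again hypothetical):

```python
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

for cutoff in (0.25, 0.20, 0.15, 0.10, 0.05, 0.01):
    model.conf = cutoff  # confidence threshold applied during NMS
    results = model("wireframe.jpg")
    # Lower cutoffs keep more (less confident) labels in the output
    print(cutoff, len(results.xyxy[0]))
```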