Vision-based Page Rank Estimation with Graph Networks
Timo I. Denk, Samed Güner (TINF16 B1)
May 20, 2019
DHBW Karlsruhe
Problem Statement
Does the appearance of a website correlate with its popularity?
Appearance-Rank Correlation
[Figure: appearance (good look, poor look) against rank (high rank, low rank)]
Dataset
We crawled the world's top 100k websites and generated a dataset of graphs
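[Figure: example graph from the dataset. Each node carries a desktop screenshot, a mobile screenshot, and attributes; edges are hyperlinks. Node attributes shown: 540 ms, 1337 KB, HTTPS, errors, ...; 560 ms, 1326 KB, HTTPS, errors, ...; 489 ms, 1557 KB, HTTPS, errors, ...]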
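One way to picture a single sample of such a dataset is sketched below. This is an illustration only, not the published data format: the class names, the field names, the file paths, the rank value, and the reading of "ms" as load time and "KB" as page size are all assumptions. The remaining attributes that the slide elides are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PageNode:
    """One crawled page of a website (all field names are hypothetical)."""
    desktop_screenshot: str   # path to the desktop screenshot image
    mobile_screenshot: str    # path to the mobile screenshot image
    load_time_ms: int         # assumed meaning of the "ms" attribute
    size_kb: int              # assumed meaning of the "KB" attribute


@dataclass
class WebsiteGraph:
    """One dataset sample: the pages of a website linked by hyperlinks."""
    rank: int                                   # ground-truth rank of the website
    nodes: List[PageNode] = field(default_factory=list)
    # Directed edges (source index, target index) into `nodes`.
    hyperlinks: List[Tuple[int, int]] = field(default_factory=list)


# Attribute values taken from the slide; paths, rank, and edges are made up.
sample = WebsiteGraph(
    rank=1,
    nodes=[
        PageNode("0_desktop.png", "0_mobile.png", 540, 1337),
        PageNode("1_desktop.png", "1_mobile.png", 560, 1326),
        PageNode("2_desktop.png", "2_mobile.png", 489, 1557),
    ],
    hyperlinks=[(0, 1), (0, 2)],
)
```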
Datacrawler
Model Architecture
[Figure: model pipeline. Screenshot graph (desktop screenshots, mobile screenshots, hyperlinks) → Screenshot Feature Extractor → feature vector graph → Graph Network → estimated rank]
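The second half of this pipeline, from feature vector graph to a single score, can be sketched in a few lines. The slides do not describe the internals of the graph network, so everything here is a stand-in: mean aggregation over incoming hyperlinks replaces whatever learned update functions the real model uses, the feature vectors are placeholders for the CNN output, and the function names and `readout_weights` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def message_passing_step(
    features: Dict[int, Vector], hyperlinks: List[Tuple[int, int]]
) -> Dict[int, Vector]:
    """Replace each node's vector by the mean of its own vector and the
    vectors of all nodes that link to it (a stand-in for a learned update)."""
    incoming = {node: [vec] for node, vec in features.items()}
    for src, dst in hyperlinks:
        incoming[dst].append(features[src])
    return {node: mean(vecs) for node, vecs in incoming.items()}


def estimate_score(
    features: Dict[int, Vector],
    hyperlinks: List[Tuple[int, int]],
    readout_weights: Vector,
    steps: int = 1,
) -> float:
    """Run message passing, pool all nodes, and map the result to one score."""
    for _ in range(steps):
        features = message_passing_step(features, hyperlinks)
    pooled = mean(list(features.values()))
    return sum(w * x for w, x in zip(readout_weights, pooled))
```

In this sketch a higher score would stand for a better estimated rank; in a trained model the weights would be learned rather than supplied by hand.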
Training Objective
Ranking problem
Pairwise ranking loss, Burges et al. (2005)
Given two samples, predict which one is ranked higher
→ Parameter update with stochastic gradient descent
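A minimal sketch of a pairwise ranking loss in the style of Burges et al. (2005) is given below. The slides do not state the exact formulation used, so the logistic form on the score difference is an assumption, and the function names are hypothetical. The model assigns each sample a scalar score; the loss is small when the sample that truly ranks higher also receives the higher score.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pairwise_ranking_loss(score_a: float, score_b: float, a_ranks_higher: bool) -> float:
    """Logistic loss log(1 + exp(-diff)), where diff is the score of the
    truly higher-ranked sample minus the score of the other sample."""
    diff = score_a - score_b if a_ranks_higher else score_b - score_a
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def loss_gradient_wrt_score_a(score_a: float, score_b: float, a_ranks_higher: bool) -> float:
    """Derivative of the loss with respect to score_a; the derivative with
    respect to score_b is the negative of this value."""
    sign = 1.0 if a_ranks_higher else -1.0
    diff = sign * (score_a - score_b)
    return -sign * _sigmoid(-diff)
```

A stochastic gradient descent step would backpropagate this gradient through the model that produced the scores and move the parameters a small step against it.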
Visualization of the intermediate representations of the feature extractor ("activation maps")
[Figure: example activation maps, labeled "column layout" and "natural images"]
Human Score
Baseline for performance comparison
[Figure: example pair. Screenshots of the first page and screenshots of the second page; ranks #68,636 (top) and #15,898 (bottom)]
Results
| Method            | Accuracy | Description                            |
| Random guessing   | 50.0%    | Random guessing in pairwise prediction |
| Human score       | 57.8%    | Experiments with nine participants     |
| Feature extractor | 60.7%    | CNN only                               |
| Graph network     | 62.7%    | CNN feature extractor + graph network  |
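The accuracy figures refer to pairwise prediction: for two samples, was the higher-ranked one identified correctly? A minimal sketch of such a metric follows. How pairs were selected for the reported numbers is not stated in the slides, so evaluating all pairs is an assumption, as is the function name.

```python
from itertools import combinations
from typing import Sequence


def pairwise_accuracy(true_ranks: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of sample pairs whose order is predicted correctly.

    true_ranks: ground-truth ranks, where a smaller number is a better rank.
    scores: model outputs, where a larger score means a better estimated rank.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(true_ranks)), 2):
        if true_ranks[i] == true_ranks[j]:
            continue  # no ground-truth order to predict
        total += 1
        i_truly_higher = true_ranks[i] < true_ranks[j]
        # Equal scores are treated as a prediction in favor of sample j.
        i_predicted_higher = scores[i] > scores[j]
        if i_truly_higher == i_predicted_higher:
            correct += 1
    return correct / total
```

Under this metric a scorer that orders pairs at random is expected to reach 50%, which is why random guessing serves as the lower reference point in the table.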
Contributions
Publication of our screenshot dataset
Page rank can be partially inferred from web page screenshots: appearance and rank are correlated (62.7% pairwise accuracy vs. 50.0% for random guessing)
ML model for vision-based page rank estimation that combines a convolutional neural network with a graph network