Measurement Lab
Georgia Bullen georgia@measurementlab.net
Peter Boothe pboothe@measurementlab.net
Introduction
Link to slides: https://bit.ly/mlab-ais2019
What is M-Lab?
M-Lab’s Mission
Measure the internet.
Save the data.
Make it universally accessible and useful.
Note: we don’t measure the internet by ourselves. People measure it using their own computers and phones and our servers; we collect the data and support them in their measurements.
M-Lab Principles
2018 numbers are projected: current total / (7/12), i.e. the year-to-date total scaled up to a full 12 months
High capacity servers placed next to content
M-Lab measures user experience over the full route from user to content
Today — 500+ Servers in 130+ locations
In Africa
Where do tests come from?
CIRA’s IPT, Google Search, Software Integrations (uTorrent), Router Integrations, Fingbox,
Chrome Extension, the M-Lab Website
Running an M-Lab Speed Test
On demand:
Scheduled:
Where does M-Lab have data from?
Let’s check the quick visualization site
How can the M-Lab datasets be useful to you?
Tutorial
Goal
By the end of this session, each of you will have constructed a unique query to gather and display M-Lab data of interest to you.
For example:
https://datastudio.google.com/s/nqN5k9ktVns
which visualizes the results of
https://console.cloud.google.com/bigquery?sq=754187384106:a67f3ad29f474169b1902b55de2b4e0d
Agenda
M-Lab Datasets
Glasnost: Max Planck Institute for Software Systems
MobiPerf: University of Michigan
Network Diagnostic Tool: Internet2
Neubot: Nexa Center for Internet and Society, Politecnico di Torino
NPAD: Pittsburgh Supercomputing Center
Reverse Traceroute: University of Washington
Paris Traceroute: University Pierre et Marie Curie
Project Bismark: Princeton University
ShaperProbe: Georgia Tech College of Computing
WindRider: Northwestern University
Experiments
Datasets in BigQuery
Datasets
Network Diagnostic Tool: Internet2
Paris Traceroute: University Pierre et Marie Curie
New schema coming soon!
Access data schemas in BigQuery.
M-Lab Data in BigQuery - Free!
NDT
Network Diagnostic Tool - NDT
Fun facts and links
NDT
NDT BigQuery Schema
test_id, log_time, parse_time: metadata for every row

connection_spec.*: client metadata
connection_spec.client_geolocation.*: lat/lon, country, region, etc.
connection_spec.data_direction: 1 = download, 0 = upload

connection_spec.client.network.asn: client ASN
connection_spec.server.network.asn: M-Lab server ASN

web100_log_entry.connection_spec.*: server and client IP and ports
web100_log_entry.snap.*: Web100 metrics
web100_log_entry.snap.HCThruOctetsAcked: download byte count
web100_log_entry.snap.HCThruOctetsReceived: upload byte count
web100_log_entry.snap.SndLimTimeRwin: receiver-limited time
web100_log_entry.snap.SndLimTimeCwnd: congestion-limited time
web100_log_entry.snap.SndLimTimeSnd: sender-limited time
web100_log_entry.snap.CongSignals: total congestion events
NDT - Common Metrics https://www.measurementlab.net/data/docs/bq/ndtmetrics/
Download throughput:
8 * (web100_log_entry.snap.HCThruOctetsAcked /
     (web100_log_entry.snap.SndLimTimeRwin +
      web100_log_entry.snap.SndLimTimeCwnd +
      web100_log_entry.snap.SndLimTimeSnd))

Upload throughput:
8 * (web100_log_entry.snap.HCThruOctetsReceived / web100_log_entry.snap.Duration)
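As a rough illustration of the two formulas above, the following Python sketch applies them to hypothetical counter values. It assumes the Web100 time counters (SndLimTime*, Duration) are in microseconds, which is what makes bytes * 8 / time come out in Mbps, as the download_Mbps alias in the sample query suggests. The function and parameter names are invented for this example and are not part of any M-Lab library.

```python
def download_mbps(octets_acked, lim_time_rwin, lim_time_cwnd, lim_time_snd):
    """Download throughput: 8 * bytes acked / total time in the three
    sender-limited states (assumed to be microseconds)."""
    total_time_us = lim_time_rwin + lim_time_cwnd + lim_time_snd
    if total_time_us == 0:
        # Mirrors what SAFE_DIVIDE does in SQL: no result instead of an error.
        return None
    return 8 * octets_acked / total_time_us


def upload_mbps(octets_received, duration):
    """Upload throughput: 8 * bytes received / test duration
    (assumed to be microseconds)."""
    if duration == 0:
        return None
    return 8 * octets_received / duration


# Hypothetical values: 12.5 MB acknowledged over 10 seconds of sending time.
print(download_mbps(12_500_000, 4_000_000, 5_000_000, 1_000_000))
# Hypothetical values: 1.25 MB received over a 10 second test.
print(upload_mbps(1_250_000, 10_000_000))
```

Bits divided by microseconds equals megabits per second, so no further unit conversion is needed under that assumption.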
Geolocation Annotations
Querying in BigQuery
Back to the example we started with...
Now let’s add time...
BigQuery Aggregate, Approximate Aggregate, and Statistical Functions
Mapping & GIS
BigQuery (GIS) - Sample Query
#standardSQL
SELECT
  COUNT(test_id) AS count_tests,
  COUNT(DISTINCT connection_spec.client_ip) AS count_ips,
  -- Approximate median download throughput per state
  APPROX_QUANTILES(
    8 * SAFE_DIVIDE(
      web100_log_entry.snap.HCThruOctetsAcked,
      (web100_log_entry.snap.SndLimTimeRwin +
       web100_log_entry.snap.SndLimTimeCwnd +
       web100_log_entry.snap.SndLimTimeSnd)),
    101)[SAFE_ORDINAL(51)] AS download_Mbps,
  -- Approximate median of the minimum round-trip time per state
  APPROX_QUANTILES(web100_log_entry.snap.MinRTT, 101)[SAFE_ORDINAL(51)] AS min_rtt,
  state_name AS name,
  ANY_VALUE(state_geom) AS WKT
FROM
  `measurement-lab.ndt.downloads`,
  `bigquery-public-data.geo_us_boundaries.us_states`
WHERE
  connection_spec.server_geolocation.country_name = "United States"
  AND partition_date BETWEEN '2019-01-01' AND '2019-05-30'
  -- Keep only tests whose client location falls inside the state polygon
  AND ST_WITHIN(
    ST_GEOGPOINT(connection_spec.client_geolocation.longitude,
                 connection_spec.client_geolocation.latitude),
    state_geom)
GROUP BY name
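The query above picks its "median" with APPROX_QUANTILES(x, 101)[SAFE_ORDINAL(51)]. The Python sketch below is an exact, simplified stand-in for that idiom, meant only to show which element is selected; BigQuery's function is approximate and its internals are not reproduced here. The helper names are hypothetical. APPROX_QUANTILES(x, number) returns number + 1 boundaries (minimum first, maximum last), and SAFE_ORDINAL is one-based and yields NULL when out of range.

```python
def quantile_boundaries(values, number):
    """Exact stand-in for APPROX_QUANTILES(values, number): returns
    number + 1 boundaries, from the minimum to the maximum."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(i * last / number)] for i in range(number + 1)]


def safe_ordinal(array, position):
    """One-based array access that returns None when out of range,
    like [SAFE_ORDINAL(position)] returning NULL in BigQuery."""
    index = position - 1
    return array[index] if 0 <= index < len(array) else None


# Hypothetical sample: the integers 0..1010.
boundaries = quantile_boundaries(range(1011), 101)
print(len(boundaries))               # number of boundaries returned
print(safe_ordinal(boundaries, 51))  # the element the query selects
```

With number = 101, the 51st boundary is the 50/101 quantile, which sits very slightly below the true median; for summary maps like this one the difference is usually negligible.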
BigQuery (GIS) - Sample Query
SELECT
  paris_traceroute_hop.src_geolocation.country_name AS src,
  paris_traceroute_hop.dest_geolocation.country_name AS dest,
  COUNT(*) AS hops,
  TIMESTAMP_TRUNC(log_time, DAY) AS day,
  -- OFFSET is zero-based, so rtt[OFFSET(1)] is the second RTT sample of each hop
  APPROX_QUANTILES(paris_traceroute_hop.rtt[OFFSET(1)], 101)[SAFE_ORDINAL(51)] AS rtt,
  MAX(paris_traceroute_hop.rtt[OFFSET(1)]) AS max_rtt,
  MIN(paris_traceroute_hop.rtt[OFFSET(1)]) AS min_rtt,
  -- Line from one source location to one destination location, for mapping
  ST_MAKELINE(
    ST_GEOGPOINT(ANY_VALUE(paris_traceroute_hop.src_geolocation.longitude),
                 ANY_VALUE(paris_traceroute_hop.src_geolocation.latitude)),
    ST_GEOGPOINT(ANY_VALUE(paris_traceroute_hop.dest_geolocation.longitude),
                 ANY_VALUE(paris_traceroute_hop.dest_geolocation.latitude))) AS WKT
FROM
  `measurement-lab.node.traceroute`
WHERE
  TIMESTAMP_TRUNC(log_time, DAY) > TIMESTAMP("2019-01-01")
  -- Keep hops that start or end in Africa
  AND (paris_traceroute_hop.src_geolocation.continent_code = "AF"
       OR paris_traceroute_hop.dest_geolocation.continent_code = "AF")
GROUP BY src, dest, day
HAVING
  hops > 50
  AND src != ""
  AND dest != ""
ORDER BY hops DESC
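The GROUP BY / HAVING / ORDER BY part of the query above can be sketched in plain Python as follows. This is only an illustration of the grouping logic on hypothetical in-memory records of (src, dest, timestamp); the function name and the sample data are invented, and the RTT and geometry columns are left out.

```python
from collections import Counter
from datetime import datetime


def busy_country_pairs(hops, min_hops):
    """Count hops per (src, dest, day), drop groups with an empty or
    missing country name, keep groups with more than min_hops hops,
    and sort by hop count, largest first."""
    counts = Counter(
        (src, dest, when.date())
        for src, dest, when in hops
        if src and dest  # same effect as HAVING src != "" AND dest != ""
    )
    kept = [(key, n) for key, n in counts.items() if n > min_hops]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical records: (source country, destination country, log time).
sample = (
    [("South Africa", "Kenya", datetime(2019, 1, 2, 10))] * 3
    + [("South Africa", "Kenya", datetime(2019, 1, 3, 9))]
    + [("", "Kenya", datetime(2019, 1, 2, 11))] * 5
)
print(busy_country_pairs(sample, min_hops=2))
```

Filtering empty names before counting gives the same groups as the HAVING clause, because src and dest are grouping keys.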
RIPE Atlas
Using M-Lab Servers with RIPE Atlas
Resources
Thank you!
Georgia Bullen georgia@measurementlab.net
Peter Boothe pboothe@measurementlab.net