Sports Data Analysis and Visualization
IEEE Vis 2022 Tutorial
Romain Vuillemot
Outline
02:00-02:15 Welcome, goals of this workshop
02:15-02:35 Motivation, examples
02:35-02:55 Tools, sports datasets
02:55-03:15 Code demo: event-based visualization I
~~~half-time~~~
03:45-04:00 Code demo: event-based visualization II
04:00-04:30 Code demo: tracking data visualization
04:30-05:00 Wrap-up and Q&A
Welcome 👋
My name is Romain Vuillemot. I am an Assistant Professor in Computer Science at Ecole Centrale de Lyon, France, with more than 10 years of experience working on sports data visualization. I teach a general InfoVis class and run visualization workshops with coaches and sports federations.
2012-2015 (Opta events data)
Charles Perin, Romain Vuillemot, Jean-Daniel Fekete. “SoccerStories: A Kick-off for Soccer Visual Analysis.” IEEE InfoVis 2013.
2016-2018 (SportsVU, Metrica data)
Gabin Rolland, Nathan Riviere, Romain Vuillemot, Wouter Bos. “A Study of Space and Time-Dependence of 3-Point Shot Efficiency in Basketball.” Accepted at the MIT Sloan Sports Analytics Conference.
2019-2024 (Data using Computer Vision)
Goal of this tutorial 🎯
The general goal of this tutorial is to help you get into sports data visualization and do simple things simply.
Simple, yet recent examples of sports visualization
Sports will be covered in general, with a focus on team sports and a spatial analysis approach (i.e., events with locations).
A bit of the design space and the challenges usually encountered when working with sports data. However, we will not cover the full design space of sports visualization.
Understand all the technical steps, such as collecting data, what is needed for the analysis, and the metadata needed to communicate the visualization efficiently.
Build your first sports visualization: we will provide data and a step-by-step process, with design examples. It is a starting point that you can adapt to a different sport or design.
❌ Not covered: design process, sports-specific statistics, historical data/prediction visualizations.
For research in sports data visualizations
“Sport Vis session” Thursday, October 20, 2022 (10:45 AM-12:00 PM)
Charles Perin, Romain Vuillemot, Charles D Stolper, John T Stasko, Jo Wood, et al. State of the Art of Sports Data Visualization. Computer Graphics Forum, Wiley, 2018, 37 (3), pp.1-24. https://hal.archives-ouvertes.fr/hal-01806107
IEEE VIS Anthology Vis by Hendrik Strobelt and Benjamin Hoover https://visav.vizhub.ai/
“Sports” papers
What is sports, visualization and analysis?
Sports is “an activity involving physical skills in which an individual or team competes against another or others for entertainment”
Sports data should reflect such activity along with:
Events
Actions that occur during games (e.g., pass, goal)
Tracking data
Continuous positions of players and balls
Metadata
Description of the context of the game
Charles Perin, Romain Vuillemot, Charles D Stolper, John T Stasko, Jo Wood, et al. State of the Art of Sports Data Visualization. Computer Graphics Forum, Wiley, 2018, 37 (3), pp.1-24. https://hal.archives-ouvertes.fr/hal-01806107
Why get into sports visualization?
Work on an enjoyable topic
Sport is usually an enjoyable experience: you have a favorite player or team; you may practice the sport and have recorded personal analytics.
Start discussions in a more objective way
Clubs and coaches need better analytics on games, for tactical or recruitment purposes.
Address difficult visualization challenges
This is why there are so many research papers in VIS and other CS conferences. Many interesting recent contributions involve Computer Vision.
Visibility of the Vis work to non-experts
Most people are familiar with practicing or watching sports. They may relate to the work more easily.
Career in the sports analytics industry
There is a growing number of openings in the industry (and startups) and in the sports analytics departments of clubs.
Why is it (still) challenging?
Diversity of sports
There are more than 200 sports, each with its own rules, balls, and accessories. They are also unevenly distributed geographically and played at different levels of practice.
Strong spatial domain
Space is needed to provide the full context of the game. Landmarks are also crucial for the visual representation. Video recordings may also provide the full context (if available).
Sharing is difficult
Licenses and copyright restrict sharing (especially for videos). Sports are also a competitive activity, so sharing data, tools, and insights with opponents is unlikely to improve a team’s results.
Data
Sports data are particularly messy and inconsistent. They are difficult to collect at high quality and in a uniform way (i.e., using a similar data model).
Examples of data domains
Charles Perin, Romain Vuillemot, Charles D Stolper, John T Stasko, Jo Wood, et al. State of the Art of Sports Data Visualization. Computer Graphics Forum, Wiley, 2018, 37 (3), pp.1-24. https://hal.archives-ouvertes.fr/hal-01806107
Wu, Yingcai, et al. "iTTVis: Interactive Visualization of Table Tennis Data." IEEE Transactions on Visualization and Computer Graphics 24.1 (2017): 709-718.
Examples of spatial domains
https://observablehq.com/@severo/soccer-pitch
https://mplsoccer.readthedocs.io/en/latest/index.html
⚠️ normalization is needed
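The normalization step can be sketched as a change of coordinate system. This is a minimal illustration, not the method of any particular library: the `PitchSpec` class, the 120 x 80 provider grid, and the 105 x 68 metre target are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchSpec:
    """Coordinate system of a data source (hypothetical helper)."""
    length: float          # extent along x, in source units
    width: float           # extent along y, in source units
    y_down: bool = False   # True if y grows from top to bottom


def normalize(x, y, src):
    """Map source coordinates to [0, 1] x [0, 1], with y growing upward."""
    nx = x / src.length
    ny = y / src.width
    if src.y_down:
        ny = 1.0 - ny
    return nx, ny


def convert(x, y, src, dst):
    """Convert a point from one pitch coordinate system to another."""
    nx, ny = normalize(x, y, src)
    if dst.y_down:
        ny = 1.0 - ny
    return nx * dst.length, ny * dst.width


# Assumed specs: a 120 x 80 provider grid with y pointing down,
# and a 105 x 68 metre pitch with y pointing up.
provider = PitchSpec(length=120, width=80, y_down=True)
metres = PitchSpec(length=105, width=68)

centre = convert(60, 40, provider, metres)   # centre spot stays the centre
corner = convert(0, 0, provider, metres)     # top-left becomes (0, width)
```

Going through a unit square first keeps every source-specific convention (units, axis direction) in one place, so adding another provider only requires another `PitchSpec`.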
Examples of sports visualizations: shot maps
Kirk Goldsberry, “CourtVision: New Visual and Spatial Analytics for the NBA” MIT Sloan Conference 2012
Examples of sports visualizations: pitch control maps
Javier Fernandez, Luke Bornn. Wide Open Spaces: A statistical technique for measuring space creation in professional soccer. MIT Sloan Conference 2018
Examples of sports visualizations: density maps
Federer’s frequency of shots passing through a given point on the court
https://gamesetmap.com/?m=201302
Examples of sports visualizations: tactical maps
Analysis and takeaways from the recent Liverpool vs Manchester City game.
https://rattibha.com/thread/1446509537989562374
Sports visualization design space
Organized primarily from events to movement and then progressively to specific encodings. The focus will be on space-related visualizations.
Positions + Events
Events
Positions
(x, y, t)
(type_event)
(x, y, t, type_event)
(x_video, y_video)
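One possible way to express these tuples as record types is sketched below. The class names and the `attach` helper are hypothetical; only the field names `x`, `y`, `t`, and `type_event` come from the slide.

```python
from typing import NamedTuple


class Position(NamedTuple):
    """A tracked location at a given time: (x, y, t)."""
    x: float
    y: float
    t: float


class Event(NamedTuple):
    """An action with no location: (type_event)."""
    type_event: str


class LocatedEvent(NamedTuple):
    """An action with a location and a time: (x, y, t, type_event)."""
    x: float
    y: float
    t: float
    type_event: str


def attach(position, event):
    """Combine a position and an event into a located event."""
    return LocatedEvent(position.x, position.y, position.t, event.type_event)


# Made-up values, for illustration only.
shot = attach(Position(108.0, 36.5, 1520.0), Event("shot"))
```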
Sports visualization design space
Organized primarily from events to movement and then progressively to specific encodings. The focus will be on space-related visualizations.
All
Binning
Derive
Connect
Animate
Events
Tracking + Events
Tracking data
Tools
Python modules
JavaScript libraries
Tableau Software
R
How to get sports data?
The Match Charting Project: Quick Start Guide
http://www.tennisabstract.com/blog/2015/09/23/the-match-charting-project-quick-start-guide/
Paper box score cards
Anyone can print them and record statistics from games. They usually capture only coarse region information and take time to fill in and digitize.
How to get sports data?
<annotation>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>2006</width>
    <height>1504</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>AAA</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>185</xmin>
      <ymin>600</ymin>
      <xmax>695</xmax>
      <ymax>978</ymax>
    </bndbox>
  </object>
</annotation>
Annotation tools
These let you build your own dataset from observations. Tools such as LabelImg provide primitives to annotate in 2D or 3D and assign labels, and they output XML or JSON formats.
LabelImg
https://github.com/heartexlabs/labelImg
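An annotation file like the XML sample above can be read with the Python standard library. The sketch below assumes the file follows the same tag layout as that sample (`size`, `object`, `name`, `bndbox`); the `read_boxes` helper and the choice of image-relative output are illustrative, not part of the annotation tool.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation sample, kept inline so the example runs alone.
SAMPLE = """<annotation>
  <size><width>2006</width><height>1504</height><depth>3</depth></size>
  <object>
    <name>AAA</name>
    <bndbox><xmin>185</xmin><ymin>600</ymin><xmax>695</xmax><ymax>978</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return one dict per annotated object, with image-relative coordinates."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = int(size.findtext("width"))
    img_h = int(size.findtext("height"))
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append({
            "label": obj.findtext("name"),
            # Box centre and size as fractions of the image dimensions.
            "cx": (xmin + xmax) / 2 / img_w,
            "cy": (ymin + ymax) / 2 / img_h,
            "w": (xmax - xmin) / img_w,
            "h": (ymax - ymin) / img_h,
        })
    return boxes


boxes = read_boxes(SAMPLE)
```

These values are still image coordinates. Placing them on a pitch requires a further mapping from the image to the field.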
How to get sports data?
Automated tracking system
Systems that include a pipeline of transformations from videos to player positions, using Computer Vision and/or Deep Learning. They are very well suited for tracking data (if positions on the field have been calculated along the way).
SportsVu Basketball dataset
Players and ball trajectories for 631 games from the 2015-2016 NBA season
https://github.com/sealneaward/nba-movement-data
LastRow data sample
19 goals scored by Liverpool FC in 2019.
https://github.com/Friends-of-Tracking-Data-FoTD/Last-Row
https://observablehq.com/@julesallegre/liverpool-goal-2019
Video from the game https://www.youtube.com/watch?v=RGFaJDAse1I&t=254s
StatsBomb data samples
Code demo
A step-by-step process using Observable notebooks
https://observablehq.com/d/2d8e4e58280353bf
Part 1: Events data (on the ball): everything that happens, but not the positions of all the players
Shots and goals during the 2019 Women's World Cup third-place play-off between England and Sweden
Part 2: Position data (of players): show trajectory, animation and (simple) space occupation model
Second goal by Mané in Liverpool vs Newcastle on 14 September 2019
Code demo: Events chart
All shots, or all events of a particular type, across a season or during a particular time interval.
Case study: 2019 Women's World Cup third-place play-off between England and Sweden
Questions:
https://www.youtube.com/watch?v=-Kkd7x-8VuI
Code demo: Events chart
All shots, or all events of a particular type, across a season or during a particular time interval.
Step 1: look at the data
Step 2: define the spatial reference
Step 3: plot and filter by event/time
https://github.com/statsbomb/open-data
https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/69301.json
All sorts of events
(kick-off, ...)
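Steps 1 and 3 can be sketched in Python as follows. The records below are made up and only mimic the structure assumed for an events file (a `type.name`, a `minute`, an optional `location` pair, and a `shot.outcome.name` for shots); check the real file before relying on these field names. The `select_events` helper is hypothetical.

```python
import json

# Made-up records that only mimic the assumed structure of an events file.
SAMPLE_EVENTS = json.loads("""
[
  {"type": {"name": "Starting XI"}, "minute": 0, "team": {"name": "Team A"}},
  {"type": {"name": "Pass"}, "minute": 3, "team": {"name": "Team B"},
   "location": [60.0, 40.0]},
  {"type": {"name": "Shot"}, "minute": 11, "team": {"name": "Team B"},
   "location": [108.0, 36.0], "shot": {"outcome": {"name": "Goal"}}},
  {"type": {"name": "Shot"}, "minute": 57, "team": {"name": "Team A"},
   "location": [101.5, 47.0], "shot": {"outcome": {"name": "Saved"}}}
]
""")


def select_events(events, type_name, start_minute=0, end_minute=None):
    """Keep located events of one type within a time interval (in minutes)."""
    selected = []
    for ev in events:
        if ev.get("type", {}).get("name") != type_name:
            continue
        if "location" not in ev:
            continue  # some events (e.g. line-ups) carry no position
        minute = ev.get("minute", 0)
        if minute < start_minute:
            continue
        if end_minute is not None and minute > end_minute:
            continue
        x, y = ev["location"][:2]
        outcome = ev.get("shot", {}).get("outcome", {}).get("name")
        selected.append({
            "x": x,
            "y": y,
            "minute": minute,
            "team": ev.get("team", {}).get("name"),
            "is_goal": outcome == "Goal",
        })
    return selected


all_shots = select_events(SAMPLE_EVENTS, "Shot")
first_half_shots = select_events(SAMPLE_EVENTS, "Shot", end_minute=45)
```

The same function could be applied to the list obtained by parsing a downloaded events file with `json.load`, provided the field names match.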
Code demo: Events chart (with binning)
Binning aggregates data within a particular region to reveal trends (and is also useful because the data may not be highly accurate).
Step 4: define a binning function
Step 5: filter by events
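A binning function can be as simple as a regular grid laid over the pitch. The sketch below is one possible choice, not the one used in the notebook: the 120 x 80 pitch size, the 6 x 4 grid, and the sample points are assumptions.

```python
from collections import Counter


def bin_index(x, y, pitch_length=120.0, pitch_width=80.0, nx=6, ny=4):
    """Return the (column, row) of the grid cell containing the point."""
    col = int(x / pitch_length * nx)
    row = int(y / pitch_width * ny)
    # Clamp so that points on (or slightly past) the boundary stay in the grid.
    col = max(0, min(col, nx - 1))
    row = max(0, min(row, ny - 1))
    return col, row


def bin_counts(points, **grid):
    """Count how many points fall into each grid cell."""
    return Counter(bin_index(x, y, **grid) for x, y in points)


# Made-up shot locations, for illustration only.
points = [(108.0, 36.0), (112.0, 38.0), (110.0, 44.0), (95.0, 30.0), (120.0, 80.0)]
counts = bin_counts(points)
```

Filtering by event type before calling `bin_counts` gives one grid per type. The counts can then drive the color or size of each cell.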
Code demo: Events chart (small multiples)
Another approach to separate data (players, time, teams) is to divide the display into several panels.
Step 6: small multiples according to an attribute
⚠️ You need to preserve the pitch aspect ratio
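Preserving the aspect ratio means deriving each panel's height from its width and the pitch dimensions, rather than from the space left on screen. A minimal layout sketch, assuming a 120 x 80 pitch and a hypothetical `panel_layout` helper:

```python
import math


def panel_layout(n_panels, container_width, columns,
                 pitch_length=120.0, pitch_width=80.0, gap=10.0):
    """Place n_panels pitches on a grid, each with the pitch's aspect ratio."""
    panel_w = (container_width - gap * (columns - 1)) / columns
    # Height follows from the width so that the pitch is never stretched.
    panel_h = panel_w * pitch_width / pitch_length
    rows = math.ceil(n_panels / columns)
    layout = []
    for i in range(n_panels):
        col, row = i % columns, i // columns
        layout.append({
            "x": col * (panel_w + gap),
            "y": row * (panel_h + gap),
            "width": panel_w,
            "height": panel_h,
        })
    total_height = rows * panel_h + gap * (rows - 1)
    return layout, total_height


layout, total_height = panel_layout(n_panels=5, container_width=920, columns=3)
```

Each panel can then use the same scales, which keeps positions comparable across panels.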
Code demo: positional charts
Why did Firmino pass the ball to Mané and not to Salah?
(second goal by Mané in Liverpool vs Newcastle on 14 September 2019)
Code demo: positional charts
From tracking data: player positions on a field at a given time (or frame rate)
Step 1: look at the data
Step 2: define the spatial reference
Step 3: animate
Step 4: include simple occupation model
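For step 4, one simple occupation model, assumed here for illustration and not necessarily the one used in the notebook, assigns each cell of a grid to the team of the nearest player (a discrete Voronoi partition). The frame below is made up, and the 105 x 68 pitch size is an assumption.

```python
import math


def nearest_player(px, py, players):
    """Return the player closest to the point (px, py)."""
    return min(players, key=lambda p: math.hypot(p["x"] - px, p["y"] - py))


def occupation(players, pitch_length=105.0, pitch_width=68.0, nx=20, ny=10):
    """Share of grid cells whose centre is closest to each team."""
    cell_w, cell_h = pitch_length / nx, pitch_width / ny
    cells = {}
    for i in range(nx):
        for j in range(ny):
            cx, cy = (i + 0.5) * cell_w, (j + 0.5) * cell_h
            team = nearest_player(cx, cy, players)["team"]
            cells[team] = cells.get(team, 0) + 1
    total = nx * ny
    return {team: n / total for team, n in cells.items()}


# Made-up frame with one player per team, placed symmetrically.
frame = [
    {"team": "home", "x": 26.25, "y": 34.0},
    {"team": "away", "x": 78.75, "y": 34.0},
]
shares = occupation(frame)
```

This ignores speed and direction, which the more advanced models account for. Recomputing the shares for every frame gives a value that can be animated along with the positions.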
Go further. Advanced occupation models
https://observablehq.com/@julesallegre/zones-data-from-conflict-zone-model-soccer-application
Python version https://colab.research.google.com/github/devinpleuler/analytics-handbook/blob/master/notebooks/pitch_dominance.ipynb
Javier Fernandez, Luke Bornn. Wide Open Spaces: A statistical technique for measuring space creation in professional soccer. MIT Sloan Conference 2018
Go further. Include perspective to situate the visualizations in a picture/video
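Drawing on top of a picture usually comes down to mapping pitch coordinates to image coordinates with a 3 x 3 perspective (homography) matrix. The sketch below only applies such a matrix; the values of `H` are invented for the example, and in practice they would be estimated from landmarks visible in the image.

```python
def apply_homography(H, x, y):
    """Project a pitch point (x, y) to image coordinates using a 3x3 matrix."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


# Made-up matrix: the bottom row introduces the perspective effect.
H = [
    [8.0, 0.0, 100.0],
    [0.0, 4.0, 300.0],
    [0.0, -0.005, 1.0],
]

origin = apply_homography(H, 0.0, 0.0)

# Apparent length of the two touchlines of an assumed 105 x 68 pitch.
far_width = apply_homography(H, 105.0, 0.0)[0] - origin[0]
near_width = (apply_homography(H, 105.0, 68.0)[0]
              - apply_homography(H, 0.0, 68.0)[0])
```

With this matrix the touchline at y = 68 appears longer than the one at y = 0, as expected for the side closer to the camera.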
Outline
02:00-02:15 Welcome and goal of this workshop
02:15-02:35 Motivation, examples
02:35-02:55 Tools, sports datasets
02:55-03:15 Code demo: event-based visualization
~~~half-time~~~
03:45-04:15 Code demo: chain-based and tracking data visualization
04:15-04:30 Code demo: draw with image perspective
04:30-05:00 Wrap-up and Q&A
Future data display opportunities.
Future data acquisition opportunities.
Sports venues and journals.
workshops/conferences
computer vision/machine learning
journals
Journal of Quantitative Analysis in Sports
Thank you!