1 of 59

Martial Arts Meets Machine Learning: Recognizing Judo Throws with MMAction2

June 2nd, 2023

2 of 59

Habeeb Shopeju

Research Engineer, Machine Learning
Thomson Reuters Labs
Interested in building Information Retrieval and Machine Learning Systems
Lagos, Nigeria

3 of 59

A Judo Primer

Teddy Riner during a throw.

4 of 59

A Judo Primer

The Gentle Way
Mini-Glossary
Throw Examples
Stair-step Approach to a Throw
More Throw Examples

5 of 59

The Gentle Way

Created in 1882 by Jigoro Kano
Olympic sport since 1964

Influenced other martial arts
Sambo, Brazilian Jiu Jitsu

Maximum efficient use of energy
Mutual prosperity for self and others

6 of 59

Mini-Glossary

Hansoku-make: Defeat by grave infringement or accumulation of light penalties.

Ippon: Victory scored by throwing the opponent onto their back with impetus and control, holding them down for 20 continuous seconds, or forcing a submission.

Judogi: The Judo uniform.

Judoka: A person who does Judo.

Waza: Judo techniques, e.g., ashi waza (foot technique), te waza (hand technique), koshi waza (hip technique).

Waza-ari: A score lower than ippon, awarded for a throw that lacks one of the ippon criteria (e.g., full impetus or control) or for holding the opponent down for at least 10 but less than 20 seconds.

7 of 59

Mini-Glossary Cont’d

Randori: A free sparring drill.

Shido: A light penalty for a violation.

Uchikomi: Repeated practice of a throw up to the point of execution, without completing the throw.

Uke: A player receiving the opponent’s attack.

Ukemi: The break fall techniques used to soften the landing impact after being thrown.

Tori: A player executing the attack.

8 of 59

Throw Examples

Uchi Mata

O Soto Gari

9 of 59

Stair-step Approach to a Throw

4. Execution (Kake)

3. Setting up (Tsukuri)

2. Off-Balancing (Kuzushi)

1. Grip (Kumi Kata)

10 of 59

More Throw Examples

Tomoe nage

Seoi nage

11 of 59

Action Recognition

12 of 59

Action Recognition

Object Detection
Object Detection for Action Recognition
Action Recognition
Models

13 of 59

Object Detection

Computer vision task of detecting objects in images or videos and categorizing them into classes.

Popular models:

YOLO
Faster R-CNN
RetinaNet

14 of 59

Object Detection for Action Recognition

15 of 59

Object Detection for Action Recognition

A: The door is being opened.

B: The door is being closed.

16 of 59

Object Detection for Action Recognition

A: The door is being opened.

B: The door is being closed.
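A toy way to see why detection alone cannot separate A from B: a detector looking at a single frame can report "door, half open", but only the order of the frames says whether the door is opening or closing. The sketch below assumes a hypothetical detector that reports how far open the door is in each frame (0.0 closed, 1.0 fully open) and labels the clip from the change between the first and last frames; `classify_door_action` and its `min_change` threshold are illustrative, not part of any library.

```python
from typing import Sequence


def classify_door_action(open_fraction_per_frame: Sequence[float],
                         min_change: float = 0.05) -> str:
    """Label a clip from per-frame door states given in temporal order.

    open_fraction_per_frame: hypothetical detector output per frame,
    0.0 = fully closed, 1.0 = fully open.
    """
    if len(open_fraction_per_frame) < 2:
        # A single frame has no temporal direction, so A and B look identical.
        return 'unknown'
    change = open_fraction_per_frame[-1] - open_fraction_per_frame[0]
    if change > min_change:
        return 'opening'
    if change < -min_change:
        return 'closing'
    return 'static'


# The same three frames, played in opposite orders, give opposite actions.
print(classify_door_action([0.1, 0.4, 0.7]))
print(classify_door_action([0.7, 0.4, 0.1]))
```

Per-frame detections are identical in both calls; only their order differs, which is exactly the temporal signal an action recognition model has to capture.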

17 of 59

Action Recognition

Computer vision task of recognizing human actions in images or videos and categorizing them into classes.
Challenges:

Temporal dynamics
Variety in spatial appearance
Context dependency

Popular models: C3D, X3D, TSN, TimeSformer

18 of 59

Models: C3D

Learning Spatiotemporal Features with 3D Convolutional Networks (ICCV’2015)

Key Ideas

3D Convolution and Pooling
Spatio-Temporal Feature Learning
C3D + Pooling + Fully Connected Layer + Linear Classifier
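The core operation in C3D is a 3D convolution whose kernel spans several consecutive frames as well as a spatial patch, so a single filter can respond to motion. The naive NumPy sketch below (single channel, stride 1, no padding) illustrates this with a hand-made temporal-difference kernel: it outputs zero on a static clip and a constant on a clip whose brightness rises steadily. Real C3D layers learn many multi-channel 3x3x3 kernels; `conv3d_valid` and the kernel values are illustrative assumptions.

```python
import numpy as np


def conv3d_valid(clip: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive single-channel 3D cross-correlation (stride 1, no padding).

    clip:   (T, H, W) stack of grayscale frames
    kernel: (kt, kh, kw), e.g. 3x3x3 as in C3D
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The window covers kt consecutive frames, so the response
                # mixes spatial and temporal information.
                window = clip[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(window * kernel)
    return out


# Temporal-difference kernel: frame t+2 minus frame t over a 3x3 patch.
kernel = np.zeros((3, 3, 3))
kernel[0] = -1.0
kernel[2] = 1.0

static = np.ones((8, 32, 32))                                      # nothing moves
ramp = np.arange(8, dtype=float)[:, None, None] * np.ones((8, 32, 32))  # brightens each frame

print(conv3d_valid(static, kernel).shape)   # 8-3+1 frames, 32-3+1 rows and columns
```

On `ramp`, every window sees a brightness difference of 2 at each of 9 pixels, so every output is 18; on `static`, every output is 0. A 2D convolution applied frame by frame could not produce this distinction.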

19 of 59

Models: TSN

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (ECCV’2016)

Key Ideas

Segment-based sampling
Temporal Aggregation
Capture long-range temporal structure
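TSN's segment-based sampling splits a video into K equal-length segments, takes one snippet from each, and then aggregates the per-segment class scores into one video-level prediction (the "consensus"). The sketch below samples a single frame per segment, a random one during training and the middle one at test time (one common deterministic choice), and uses average consensus. The function names are hypothetical; this is not MMAction2's sampler code.

```python
import random
from typing import List, Optional, Sequence


def tsn_sample_indices(num_frames: int, num_segments: int, train: bool = True,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Return one frame index per segment (requires num_frames >= num_segments)."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        # Integer arithmetic keeps segment k = [start, end) exact and non-empty.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if train:
            indices.append(rng.randrange(start, end))   # random frame: temporal jitter
        else:
            indices.append(start + (end - start) // 2)  # middle frame at test time
    return indices


def average_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-segment class scores into one video-level score vector."""
    num_segments = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [sum(scores[c] for scores in segment_scores) / num_segments
            for c in range(num_classes)]


print(tsn_sample_indices(300, 3, train=False))
```

Because every segment contributes a frame, three samples are enough to cover the beginning, middle, and end of a 300-frame video, which is how TSN captures long-range structure at a low cost.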

20 of 59

Models: TimeSformer

Is Space-Time Attention All You Need for Video Understanding? (ICML’2021)

Key Ideas

Transformer architecture
Patch encoding
Divided attention
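Divided attention replaces one attention over all T x N space-time patch tokens with two cheaper steps: temporal attention (each patch position attends across frames), then spatial attention (patches within a frame attend to each other). The NumPy sketch below shows the two steps and counts query-key pairs. It is a simplification: a single head, identity query/key/value projections, no classification token, and no MLP block. The function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention over axis -2 of x: (..., L, D)."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x                # (..., L, D)


def divided_space_time_attention(tokens: np.ndarray) -> np.ndarray:
    """tokens: (T, N, D) = frames x patches per frame x embedding dim."""
    # 1) Temporal attention: each patch position attends across the T frames.
    x = np.swapaxes(tokens, 0, 1)   # (N, T, D)
    x = x + self_attention(x)       # residual connection
    x = np.swapaxes(x, 0, 1)        # back to (T, N, D)
    # 2) Spatial attention: patches in the same frame attend to each other.
    return x + self_attention(x)    # batched over the T frames


def attention_pairs(T: int, N: int) -> tuple:
    """Query-key pairs for joint vs. divided space-time attention."""
    joint = (T * N) ** 2          # every token attends to every token
    divided = T * N * (T + N)     # T keys in time plus N keys in space per token
    return joint, divided


# e.g. 8 frames of 224x224 split into 16x16 patches -> 14 * 14 = 196 patches per frame
print(attention_pairs(8, 196))
```

For 8 frames of 196 patches, joint attention needs 2,458,624 query-key pairs while divided attention needs 319,872, which is why TimeSformer can attend over both space and time at a practical cost.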

21 of 59

MMAction2

22 of 59

MMAction2

OpenMMLab Ecosystem
Config Fundamentals
Data Pipeline
Data Loader
Model
Other APIs
Training & Inference

23 of 59

OpenMMLab Ecosystem

24 of 59

OpenMMLab Ecosystem

25 of 59

Config Fundamentals: Key-value pairs

# random_config.py
dataset_type = 'VideoDataset'
data_root = 'data/kinetics400/videos_train'
file_client_args = dict(io_backend='disk')

train_pipeline = [
    dict(type='DecordInit', **file_client_args),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='PackActionInputs')
]

train_dataloader = dict(
    batch_size=32, num_workers=8,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(type=dataset_type, data_prefix=dict(video=data_root), pipeline=train_pipeline)
)

26 of 59

Config Fundamentals: Dot notation

>>> from mmengine.config import Config
>>> cfg = Config.fromfile('random_config.py')
>>> cfg.dataset_type
'VideoDataset'
>>> cfg.train_dataloader.dataset.data_prefix
{'video': 'data/kinetics400/videos_train'}
>>> cfg.train_pipeline[2].scale
(224, 224)
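Dot access works because mmengine wraps the nested dicts of a loaded config in `ConfigDict`. The stdlib sketch below imitates that behavior with a hypothetical `AttrDict` class so the lookups on this slide can be traced by hand; it is a simplified stand-in, not mmengine's implementation.

```python
from typing import Any


class AttrDict(dict):
    """dict whose keys are also readable as attributes, applied recursively.

    Hypothetical, simplified stand-in for mmengine's ConfigDict.
    """

    def __init__(self, data: dict):
        # Convert nested dicts (and dicts inside lists) once, at construction.
        super().__init__({key: _convert(value) for key, value in data.items()})

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def _convert(value: Any) -> Any:
    if isinstance(value, dict):
        return AttrDict(value)
    if isinstance(value, list):
        return [_convert(item) for item in value]
    return value


cfg = AttrDict(dict(
    dataset_type='VideoDataset',
    train_pipeline=[
        dict(type='DecordInit'),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
    ],
    train_dataloader=dict(
        dataset=dict(data_prefix=dict(video='data/kinetics400/videos_train'))),
))

print(cfg.train_pipeline[2].scale)
```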

27 of 59

Config Fundamentals: Modular and inheritance design

# small_random_config.py
_base_ = ["random_config.py"]
dataset_type = "PoseDataset"
train_pipeline = _base_.train_pipeline
train_pipeline[2]["scale"] = (448, 448)

# Python IDLE
>>> cfg = Config.fromfile('small_random_config.py')
>>> cfg.dataset_type
'PoseDataset'
>>> cfg.train_pipeline[2]
{'type': 'Resize', 'scale': (448, 448), 'keep_ratio': False}
>>> cfg.file_client_args
{'io_backend': 'disk'}
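The behavior on this slide (`dataset_type` overridden, `file_client_args` inherited unchanged from the base file) follows from merging the child config into its base. Below is a simplified sketch of such a merge, assuming dict values are merged key by key and every other value is replaced outright; mmengine's actual merge logic handles more cases, and `merge_config` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config: base keys are inherited, child keys override.

    Nested dicts are merged recursively; lists and scalars are replaced.
    Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    'dataset_type': 'VideoDataset',
    'file_client_args': {'io_backend': 'disk'},
    'model': {'backbone': {'depth': 50, 'norm_eval': False}},
}
child = {
    'dataset_type': 'PoseDataset',
    'model': {'backbone': {'depth': 101}},  # only this nested key changes
}

merged = merge_config(base, child)
print(merged['model']['backbone'])
```

Merging nested dicts key by key is what lets a child file change one field, such as a backbone depth, without restating the rest of the model definition.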

28 of 59

Data Pipeline: Config

train_pipeline = [
    dict(type='DecordInit', **file_client_args),
    dict(type='UniformSample', clip_len=8, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='PytorchVideoWrapper', op='RandAugment', magnitude=7, num_layers=4),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
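Each `dict(type=...)` in the pipeline names a transform class, and the remaining keys become its constructor arguments; at run time every transform reads and extends a shared `results` dict. The stdlib sketch below shows that general idea with a hypothetical registry and toy transforms (`ToySampleFrames`, `ToyResize`, `ToyFlip`). They are stand-ins written for illustration, not MMAction2's transforms, and they touch no real video data.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping the 'type' string in a config dict to a class.
TRANSFORMS: Dict[str, type] = {}


def register(cls: type) -> type:
    TRANSFORMS[cls.__name__] = cls
    return cls


@register
class ToySampleFrames:
    """Pick `clip_len` evenly spaced frame indices (stand-in for a sampler)."""

    def __init__(self, clip_len: int):
        self.clip_len = clip_len

    def __call__(self, results: Dict[str, Any]) -> Dict[str, Any]:
        step = results['total_frames'] / self.clip_len
        results['frame_inds'] = [int(i * step) for i in range(self.clip_len)]
        return results


@register
class ToyResize:
    """Record the target spatial size (no pixels are resized in this sketch)."""

    def __init__(self, scale):
        self.scale = tuple(scale)

    def __call__(self, results: Dict[str, Any]) -> Dict[str, Any]:
        results['img_shape'] = self.scale
        return results


@register
class ToyFlip:
    """Mark the clip as horizontally flipped with probability `flip_ratio`."""

    def __init__(self, flip_ratio: float = 0.5):
        self.flip_ratio = flip_ratio

    def __call__(self, results: Dict[str, Any]) -> Dict[str, Any]:
        results['flip'] = random.random() < self.flip_ratio
        return results


def build_pipeline(cfgs: List[Dict[str, Any]]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    transforms = []
    for cfg in cfgs:
        cfg = dict(cfg)                    # copy so the original config is untouched
        cls = TRANSFORMS[cfg.pop('type')]  # look up the class by its registered name
        transforms.append(cls(**cfg))      # remaining keys become constructor args

    def run(results: Dict[str, Any]) -> Dict[str, Any]:
        for transform in transforms:       # order in the config = order of execution
            results = transform(results)
        return results

    return run


pipeline_cfg = [
    dict(type='ToySampleFrames', clip_len=8),
    dict(type='ToyResize', scale=(224, 224)),
    dict(type='ToyFlip', flip_ratio=0.5),
]
pipeline = build_pipeline(pipeline_cfg)
print(pipeline({'total_frames': 64}))
```

Because the config only stores names and arguments, swapping an augmentation means editing one dict rather than touching training code.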

29 of 59

Data Pipeline: APIs

Video: DecordInit, DecordDecode, OpenCVInit, OpenCVDecode

Sampling: UniformSample, DenseSampleFrames, SampleFrames

Augmentation: CenterCrop, Flip, PytorchVideoWrapper, ImgAug

Others: FormatShape, PackActionInputs, PreNormalize2D, Transpose

30 of 59

Dataloader: Config

train_dataloader = dict(
    batch_size=32,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(video=data_root),
        pipeline=train_pipeline))
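`batch_size` and `sampler=dict(type='DefaultSampler', shuffle=True)` together decide which clips end up in each batch. The stdlib sketch below shows the basic idea for a single process: shuffle the dataset indices once per epoch, then slice them into batches. It is not mmengine's `DefaultSampler`, which also has to split indices across processes in distributed training; `epoch_batches` and its per-epoch seeding scheme are assumptions for illustration.

```python
import random
from typing import Iterator, List


def epoch_batches(dataset_size: int, batch_size: int, shuffle: bool,
                  seed: int, epoch: int) -> Iterator[List[int]]:
    """Yield lists of dataset indices for one epoch (single process)."""
    indices = list(range(dataset_size))
    if shuffle:
        # seed + epoch: a different order every epoch, reproducible across runs.
        random.Random(seed + epoch).shuffle(indices)
    for start in range(0, dataset_size, batch_size):
        yield indices[start:start + batch_size]  # the last batch may be smaller


batches = list(epoch_batches(dataset_size=40, batch_size=32, shuffle=True, seed=0, epoch=0))
print([len(b) for b in batches])
```

With 40 clips and `batch_size=32`, an epoch yields one full batch and one batch of 8; every clip is still seen exactly once per epoch.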

31 of 59

Model: Config

model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='https://download.pytorch.org/models/resnet50-11ad3fa6.pth',
        depth=50,
        norm_eval=False),
    cls_head=dict(
        type='TSNHead',
        num_classes=400,
        in_channels=2048,
        ...),
    ...)
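The `in_channels=2048` and `num_classes=400` values tie the backbone to the head: ResNet-50's last stage outputs 2048 channels, and Kinetics-400 has 400 classes. The NumPy sketch below traces those shapes through a TSN-style head: spatial average pooling, averaging over segments, then a linear layer. It roughly mirrors what a TSN head does but omits dropout and other options; `tsn_style_head` and the random weights are illustrative only.

```python
import numpy as np


def tsn_style_head(features: np.ndarray, num_segments: int,
                   fc_weight: np.ndarray, fc_bias: np.ndarray) -> np.ndarray:
    """features: (batch * num_segments, C, H, W) backbone output.

    fc_weight: (num_classes, C), fc_bias: (num_classes,).
    Returns class scores of shape (batch, num_classes).
    """
    pooled = features.mean(axis=(2, 3))                                # (B*S, C)
    per_segment = pooled.reshape(-1, num_segments, pooled.shape[1])    # (B, S, C)
    consensus = per_segment.mean(axis=1)                               # (B, C)
    return consensus @ fc_weight.T + fc_bias                           # (B, num_classes)


rng = np.random.default_rng(0)
batch, num_segments, in_channels, num_classes = 2, 3, 2048, 400
features = rng.standard_normal((batch * num_segments, in_channels, 7, 7))
fc_weight = rng.standard_normal((num_classes, in_channels)) * 0.01
fc_bias = np.zeros(num_classes)

scores = tsn_style_head(features, num_segments, fc_weight, fc_bias)
print(scores.shape)
```

If `in_channels` did not match the backbone's output channels, the matrix product would fail; changing `num_classes` only resizes the final linear layer, which is why fine-tuning on a new dataset usually replaces just that layer.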

32 of 59

Model: APIs

Backbones: ResNet, UniFormer, X3D, VisionTransformer

Data Preprocessors: ActionDataPreprocessor, MultiModalDataPreprocessor

Heads: BaseHead, TSNHead, UniFormerHead

Losses: BCELossWithLogits, CrossEntropyLoss

33 of 59

Other APIs

Evaluation Metrics: AccMetric, ConfusionMatrix

Hooks: CheckpointHook, VisualizationHook, LoggerHook

Optimizers: Support for all PyTorch optimizers

Schedulers: Support for most PyTorch schedulers

34 of 59

Training & Inference

# Terminal
$ python mmaction2/tools/train.py random_config.py

# Python IDLE
>>> from mmaction.apis import init_recognizer, inference_recognizer
>>> config_file = 'random_config.py'
>>> checkpoint_file = 'best_checkpoint.pth'
>>> video_file = 'video.mp4'
>>> model = init_recognizer(config_file, checkpoint_file, device='cuda:0')  # or device='cpu'
>>> pred_result = inference_recognizer(model, video_file)
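Once `inference_recognizer` returns, the class scores still have to be mapped to label names. The sketch below assumes the scores have already been extracted from `pred_result` into a plain list (the field holding them has changed between MMAction2 releases, so check the installed version); `top_k_labels` is a hypothetical helper, and the throw labels and scores are placeholders, not the Judo-Tiny class list or real model output.

```python
from typing import List, Sequence, Tuple


def top_k_labels(scores: Sequence[float], labels: Sequence[str],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (label, score) pairs with the highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores, for illustration only.
labels = ['uchi_mata', 'o_soto_gari', 'seoi_nage', 'tomoe_nage']
scores = [0.1, 0.6, 0.2, 0.1]
print(top_k_labels(scores, labels, k=2))
```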

35 of 59

Training a Model on Kinetics-Tiny

36 of 59

Details

Dataset: Kinetics400-Tiny

Curated by the MMAction2 team
2 Classes
40 Clips

Notebook

Colab URL: https://bit.ly/ktiny-pydata

37 of 59

Installation and Configuration Files

38 of 59

Getting the Data

39 of 59

Config File

40 of 59

Config File

41 of 59

Training a model

42 of 59

Explaining Dimension Mismatch

43 of 59

Model Inference

44 of 59

Predicting Throw

45 of 59

Training a Model on Judo-Tiny: DIY

46 of 59

Details

Dataset: Judo-Tiny

Self-curated
7 Classes
500+ Clips

Notebook

Colab URL: https://bit.ly/judo-mmaction2

47 of 59

Installation

48 of 59

Downloading Configuration and checkpoint files

49 of 59

Getting the data

50 of 59

Getting the data

51 of 59

Getting the data

52 of 59

Config File

53 of 59

Config File

54 of 59

Config File

55 of 59

Training a model

56 of 59

ImgAug Documentation

57 of 59

Loading the Model

58 of 59

Predicting throw

59 of 59

Thank You