1 of 59

Martial Arts Meets Machine Learning: Recognizing Judo Throws with MMAction2

June 2nd, 2023

2 of 59

Habeeb Shopeju

  • Research Engineer, Machine Learning
  • Thomson Reuters Labs
  • Interested in building Information Retrieval and Machine Learning Systems
  • Lagos, Nigeria

3 of 59

A Judo Primer

Teddy Riner during a throw.

4 of 59

A Judo Primer

  • The Gentle Way
  • Mini-Glossary
  • Throw Examples
  • Stair-step Approach to a Throw
  • More Throw Examples

5 of 59

The Gentle Way

  • Created in 1882 by Jigoro Kano
  • Olympic sport since 1964

  • Influenced other martial arts
  • Sambo, Brazilian Jiu Jitsu

  • Maximum efficient use of energy
  • Mutual prosperity for self and others

6 of 59

Mini-Glossary

  • Hansoku-make: Defeat by grave infringement or accumulation of light penalties.

  • Ippon: Victory scored by throwing the opponent onto their back with impetus and control, holding them down for 20 continuous seconds, or forcing a submission.

  • Judogi: The Judo uniform.

  • Judoka: A person who does Judo.

  • Waza: Judo techniques, e.g., ashi waza (foot technique), te waza (hand technique), koshi waza (hip technique).

  • Waza-ari: A score lower than ippon, awarded for throwing the opponent onto their back with less than full impetus or control, or for holding them down for at least 10 but less than 20 seconds.

7 of 59

Mini-Glossary Cont’d

  • Randori: A free sparring drill.

  • Shido: A light penalty for a violation.

  • Uchikomi: Repeated practice of a throw up to the point of execution, without completing the throw.

  • Uke: A player receiving the opponent’s attack.

  • Ukemi: The break fall techniques used to soften the landing impact after being thrown.

  • Tori: A player executing the attack.

8 of 59

Throw Examples

Uchi Mata

O Soto Gari

9 of 59

Stair-step Approach to a Throw

4. Execution (Kake)

3. Setting up (Tsukuri)

2. Off-Balancing (Kuzushi)

1. Grip (Kumi Kata)

10 of 59

More Throw Examples

Tomoe nage

Seoi nage

11 of 59

Action Recognition

12 of 59

Action Recognition

  • Object Detection
  • Object Detection for Action Recognition
  • Action Recognition
  • Models

13 of 59

Object Detection

  • Computer vision task of detecting objects in images or videos and categorizing them into classes.

  • Popular models:
    • YOLO
    • Faster R-CNN
    • RetinaNet

14 of 59

Object Detection for Action Recognition

15 of 59

Object Detection for Action Recognition

A: The door is being opened.

B: The door is being closed.

16 of 59

Object Detection for Action Recognition

A: The door is being opened.

B: The door is being closed.

17 of 59

Action Recognition

  • Computer vision task of recognizing human actions in images or videos and categorizing them into classes.
  • Challenges:
    • Temporal dynamics
    • Variety in spatial appearance
    • Context dependency

  • Popular models: C3D, X3D, TSN, TimeSformer
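The door example and the "temporal dynamics" challenge can be illustrated with a toy sketch. A clip and its reversed copy contain exactly the same frames, so an order-invariant summary (here, the mean over frames) cannot tell them apart, while even a crude order-aware feature can. The door angles below are made-up values, not real data.

```python
# Toy illustration: each "frame" is just the door angle in degrees (made-up data).
opening = [0, 15, 30, 45, 60, 75, 90]
closing = list(reversed(opening))

def mean_pool(frames):
    """Order-invariant summary: average over all frames."""
    return sum(frames) / len(frames)

def net_change(frames):
    """Order-aware summary: last frame minus first frame."""
    return frames[-1] - frames[0]

same_summary = mean_pool(opening) == mean_pool(closing)
print(same_summary)                              # True
print(net_change(opening), net_change(closing))  # 90 -90
```

This is why action recognition models need some mechanism that looks across frames rather than classifying each frame independently.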

18 of 59

Models: C3D

Learning Spatiotemporal Features with 3D Convolutional Networks (ICCV’2015)

Key Ideas

  • 3D Convolution and Pooling
  • Spatio-Temporal Feature Learning
  • C3D + Pooling + Fully Connected Layer + Linear Classifier
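To make "3D convolution and pooling" concrete, the sketch below tracks only the tensor shape as a clip passes through one 3D convolution and one 3D pooling layer, using the standard output-size formula applied along time, height, and width. The clip size and layer hyperparameters are assumed for illustration and are not taken from the slides.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def layer_3d(shape, kernel, stride, padding):
    """Apply out_size independently along (T, H, W)."""
    return tuple(
        out_size(s, k, st, p)
        for s, k, st, p in zip(shape, kernel, stride, padding)
    )

clip = (16, 112, 112)  # (frames, height, width), assumed for illustration
after_conv = layer_3d(clip, kernel=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
after_pool = layer_3d(after_conv, kernel=(2, 2, 2), stride=(2, 2, 2), padding=(0, 0, 0))
print(after_conv)  # (16, 112, 112)
print(after_pool)  # (8, 56, 56)
```

Unlike a 2D layer, the kernel also spans the time axis, so pooling shrinks the number of frames as well as the spatial resolution.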

19 of 59

Models: TSN

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (ECCV’2016)

Key Ideas

  • Segment based sampling
  • Temporal Aggregation
  • Capture long-range temporal structure
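A minimal sketch of the two ideas above, under simplifying assumptions: the video is split into equal-length segments, one frame index is drawn at random from each, and the per-snippet class scores are averaged into a video-level score. The function names are hypothetical and the scores are made up; real implementations also handle short videos, clips longer than one frame, and test-time sampling.

```python
import random

def sample_segment_indices(num_frames, num_segments, rng):
    """Pick one random frame index from each of num_segments equal-length segments.

    Assumes num_frames >= num_segments; trailing frames that do not fill
    a whole segment are ignored.
    """
    seg_len = num_frames // num_segments
    return [k * seg_len + rng.randrange(seg_len) for k in range(num_segments)]

def average_consensus(per_snippet_scores):
    """Average class scores across snippets to get a video-level score."""
    n = len(per_snippet_scores)
    return [sum(col) / n for col in zip(*per_snippet_scores)]

rng = random.Random(0)
indices = sample_segment_indices(num_frames=90, num_segments=3, rng=rng)

# Made-up scores for 3 snippets over 2 classes.
scores = average_consensus([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])
print(indices)
print(scores)
```

Because every segment contributes a snippet, the prediction covers the whole video at a cost that does not grow with its length.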

20 of 59

Models: TimeSformer

Is Space-Time Attention All You Need for Video Understanding? (ICML’2021)

Key Ideas

  • Transformer architecture
  • Patch encoding
  • Divided attention
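The arithmetic behind patch encoding and divided attention can be sketched as follows. Each frame is cut into non-overlapping square patches; with joint space-time attention every token is compared with every patch in every frame, while divided attention runs a temporal pass (same patch position across frames) followed by a spatial pass (all patches in the same frame). The frame size, patch size, and frame count are assumed for illustration.

```python
def num_patches(height, width, patch_size):
    """Number of non-overlapping square patches in one frame."""
    return (height // patch_size) * (width // patch_size)

def comparisons_per_token(n_patches, n_frames):
    """Tokens each query is compared with under joint vs. divided attention."""
    joint = n_patches * n_frames + 1              # every patch in every frame + class token
    divided = (n_frames + 1) + (n_patches + 1)    # temporal pass, then spatial pass
    return joint, divided

n = num_patches(224, 224, 16)
joint, divided = comparisons_per_token(n, n_frames=8)
print(n, joint, divided)  # 196 1569 206
```

The gap widens quickly as clips get longer or resolution increases, which is the motivation for dividing the attention.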

21 of 59

MMAction2

22 of 59

MMAction2

  • OpenMMLab Ecosystem
  • Config Fundamentals
  • Data Pipeline
  • Data Loader
  • Model
  • Other APIs
  • Training & Inference

23 of 59

OpenMMLab Ecosystem

24 of 59

OpenMMLab Ecosystem

25 of 59

Config Fundamentals: Key-value pairs

# random_config.py
dataset_type = 'VideoDataset'
data_root = 'data/kinetics400/videos_train'
file_client_args = dict(io_backend='disk')

train_pipeline = [
    dict(type='DecordInit', **file_client_args),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='PackActionInputs')
]

train_dataloader = dict(
    batch_size=32,
    num_workers=8,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_prefix=dict(video=data_root),
        pipeline=train_pipeline
    )
)

26 of 59

Config Fundamentals: Dot notation

>>> from mmengine.config import Config
>>> cfg = Config.fromfile('random_config.py')
>>> cfg.dataset_type
'VideoDataset'
>>> cfg.train_dataloader.dataset.data_prefix
{'video': 'data/kinetics400/videos_train'}
>>> cfg.train_pipeline[2].scale
(224, 224)

27 of 59

Config Fundamentals: Modular and inheritance design

# small_random_config.py
_base_ = ["random_config.py"]
dataset_type = "PoseDataset"
train_pipeline = _base_.train_pipeline
train_pipeline[2]["scale"] = (448, 448)

# Python IDLE
>>> cfg = Config.fromfile('small_random_config.py')
>>> cfg.dataset_type
'PoseDataset'
>>> cfg.train_pipeline[2]
{'type': 'Resize', 'scale': (448, 448), 'keep_ratio': False}
>>> cfg.file_client_args
{'io_backend': 'disk'}
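The inheritance behaviour shown above (child values win, everything else comes from the base) can be approximated with a recursive dictionary merge. This is a simplified sketch of the idea in plain Python, not MMEngine's actual implementation, which handles more cases. The `batch_size=8` override is a hypothetical value added for illustration.

```python
import copy

def merge_config(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

base = {
    "dataset_type": "VideoDataset",
    "file_client_args": {"io_backend": "disk"},
    "train_dataloader": {"batch_size": 32, "num_workers": 8},
}
# Hypothetical child config: only the fields that differ from the base.
child = {"dataset_type": "PoseDataset", "train_dataloader": {"batch_size": 8}}

cfg = merge_config(base, child)
print(cfg["dataset_type"])      # PoseDataset
print(cfg["train_dataloader"])  # {'batch_size': 8, 'num_workers': 8}
print(cfg["file_client_args"])  # {'io_backend': 'disk'}
```

Nested dictionaries are merged key by key, so a child config can change one field of a dataloader or model without restating the rest.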

28 of 59

Data Pipeline: Config

train_pipeline = [
    dict(type='DecordInit', **file_client_args),
    dict(type='UniformSample', clip_len=8, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='PytorchVideoWrapper', op='RandAugment', magnitude=7, num_layers=4),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
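Conceptually, a pipeline like the one above is a list of callables, each built from a `dict(type=..., **kwargs)` entry and applied in order to a shared results dictionary. The sketch below mimics that pattern in plain Python. `EvenSample`, `Decode`, and `REGISTRY` are hypothetical stand-ins, not MMAction2 classes, and the "video" is just a list of strings.

```python
class EvenSample:
    """Toy sampler: choose clip_len evenly spaced frame indices."""
    def __init__(self, clip_len):
        self.clip_len = clip_len

    def __call__(self, results):
        step = results["total_frames"] / self.clip_len
        results["frame_inds"] = [int(i * step) for i in range(self.clip_len)]
        return results

class Decode:
    """Toy decoder: look up the sampled frames in an in-memory list."""
    def __call__(self, results):
        results["imgs"] = [results["frames"][i] for i in results["frame_inds"]]
        return results

# Hypothetical registry mapping config type names to classes.
REGISTRY = {"EvenSample": EvenSample, "Decode": Decode}

def build_pipeline(cfgs):
    """Turn a list of dict(type=..., **kwargs) entries into callables."""
    transforms = []
    for cfg in cfgs:
        kwargs = dict(cfg)
        cls = REGISTRY[kwargs.pop("type")]
        transforms.append(cls(**kwargs))
    return transforms

def run_pipeline(transforms, results):
    """Apply each transform in order, passing the results dict along."""
    for transform in transforms:
        results = transform(results)
    return results

pipeline = build_pipeline([dict(type="EvenSample", clip_len=4), dict(type="Decode")])
video = {"frames": [f"frame{i}" for i in range(16)], "total_frames": 16}
out = run_pipeline(pipeline, video)
print(out["frame_inds"])  # [0, 4, 8, 12]
print(out["imgs"])        # ['frame0', 'frame4', 'frame8', 'frame12']
```

Order matters: the decoder relies on the indices written by the sampler, just as decoding in the config above follows the sampling step.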

29 of 59

Data Pipeline: APIs

  • Video: DecordInit, DecordDecode, OpenCVInit, OpenCVDecode

  • Sampling: UniformSample, DenseSampleFrames, SampleFrames

  • Augmentation: CenterCrop, Flip, PytorchVideoWrapper, ImgAug

  • Others: FormatShape, PackActionInputs, PreNormalize2D, Transpose

30 of 59

Dataloader: Config

train_dataloader = dict(
    batch_size=32,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(video=data_root),
        pipeline=train_pipeline
    )
)
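The `ann_file` entry points at an annotation file. A minimal sketch of producing and reading one is given below, assuming the plain-text layout commonly used with video datasets in MMAction2: one clip per line, a path relative to `data_prefix` followed by an integer class index. The helper names, clip names, and labels are hypothetical.

```python
def to_annotation_lines(samples):
    """Format (relative_path, label_index) pairs as 'path label' lines."""
    return [f"{path} {label}" for path, label in samples]

def parse_annotation_lines(lines):
    """Inverse of the above; the label is the last space-separated field."""
    parsed = []
    for line in lines:
        path, label = line.strip().rsplit(" ", 1)
        parsed.append((path, int(label)))
    return parsed

# Hypothetical clip names and class indices, for illustration only.
samples = [("uchi_mata_001.mp4", 0), ("o_soto_gari_001.mp4", 1)]
lines = to_annotation_lines(samples)
print(lines[0])  # uchi_mata_001.mp4 0
print(parse_annotation_lines(lines) == samples)  # True
```

Writing `"\n".join(lines)` to a text file gives something that can be referenced from `ann_file_train`, provided the paths resolve under `data_root`.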

31 of 59

Model: Config

model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='https://download.pytorch.org/models/resnet50-11ad3fa6.pth',
        depth=50,
        norm_eval=False),
    cls_head=dict(
        type='TSNHead',
        num_classes=400,
        in_channels=2048,
        ...),
    ...)

32 of 59

Model: APIs

  • Backbones: ResNet, UniFormer, X3D, VisionTransformer

  • Data Preprocessors: ActionDataPreprocessor, MultiModalDataPreprocessor

  • Heads: BaseHead, TSNHead, UniFormerHead

  • Losses: BCELossWithLogits, CrossEntropyLoss

33 of 59

Other APIs

  • Evaluation Metrics: AccMetric, ConfusionMatrix

  • Hooks: CheckpointHook, VisualizationHook, LoggerHook

  • Optimizers: Support for all PyTorch optimizers

  • Schedulers: Support for most PyTorch schedulers

34 of 59

Training & Inference

# Terminal
$ python mmaction2/tools/train.py random_config.py

# Python IDLE
>>> from mmaction.apis import init_recognizer, inference_recognizer
>>> config_file = 'random_config.py'
>>> checkpoint_file = 'best_checkpoint.pth'
>>> video_file = "video.mp4"
>>> model = init_recognizer(config_file, checkpoint_file, device='cuda:0')  # or device='cpu'
>>> pred_result = inference_recognizer(model, video_file)
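Once per-class scores have been extracted from the prediction, turning them into readable labels is a small ranking step. The sketch below assumes the scores are already available as a plain Python list aligned with a list of class names; how to pull them out of `pred_result` depends on the MMAction2 version and is not shown. The class names and scores are hypothetical, not real model output.

```python
def top_k(scores, labels, k=3):
    """Return the k (label, score) pairs with the highest scores."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical class names and scores, not real model output.
labels = ["uchi_mata", "o_soto_gari", "seoi_nage", "tomoe_nage"]
scores = [0.10, 0.65, 0.20, 0.05]
print(top_k(scores, labels, k=2))  # [('o_soto_gari', 0.65), ('seoi_nage', 0.2)]
```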

35 of 59

Training a Model on Kinetics-Tiny

36 of 59

Details

Dataset: Kinetics400-Tiny

  • Curated by MMAction team
  • 2 Classes
  • 40 Clips

Notebook

  • Colab URL: https://bit.ly/ktiny-pydata

37 of 59

Installation and Configuration Files

38 of 59

Getting the Data

39 of 59

Config File

40 of 59

Config File

41 of 59

Training a model

42 of 59

Explaining Dimension Mismatch

43 of 59

Model Inference

44 of 59

Predicting Throw

45 of 59

Training a Model on Judo-Tiny; DIY

46 of 59

Details

Dataset: Judo-Tiny

  • Self-curated
  • 7 Classes
  • 500+ Clips

Notebook

  • Colab URL: https://bit.ly/judo-mmaction2

47 of 59

Installation

48 of 59

Downloading Configuration and checkpoint files

49 of 59

Getting the data

50 of 59

Getting the data

51 of 59

Getting the data

52 of 59

Config File

53 of 59

Config File

54 of 59

Config File

55 of 59

Training a model

56 of 59

ImgAug Documentation

57 of 59

Loading the Model

58 of 59

Predicting throw

59 of 59

Thank You