1 of 141

MLCommons®

Community Meeting 1Q23

April 20, 2023

2 of 141

This community meeting is being recorded and will be shared

2

3 of 141

Schedule

3

9:00 AM

Breakfast

9:30 AM

VMware Welcome: Sujata Banerjee

9:45 AM

MLC Welcome: Peter Mattson

10:00 AM

MLC Update: David Kanter

10:30 AM

Break

10:50 AM

Working Group Update

12:20 PM

Lunch

1:20 PM

Power WG Showcase

1:35 PM

DataPerf WG Showcase

1:50 PM

Group discussions (in person only)

I Benchmark value to enterprise customers: getting involved / moderator: Debojyoti Dutta

II Datasets for model quality benchmarking - e.g. which is the best LLM? / moderator: Kurt Bollacker

III MLCommons research: how do we deliver value for researchers? / moderator: Vijay Janapa Reddi

3:20 PM

Cake Break

3:45 PM

Social hour

4:45 PM

End

4 of 141

Welcome

4

5 of 141

5

What is the MLCommons Association?

6 of 141

6

ML/AI has huge potential to benefit everyone

  • Information access
  • Health
  • Safety
  • Human productivity

7 of 141

MLCommons is building the ML ecosystem

7

Mission

AI / ML Ecosystem

Pillars

Benchmarks

Best practices

Research

Data

AI / ML Ecosystem

Wright brothers: public domain; Planes: Marek Ślusarczyk

Community

8 of 141

Working groups

8

9 of 141

9

Benchmarks

10 of 141

ML needs benchmarks for everything

10

ML component                  | Metrics             | MLCommons WG
Hardware                      | Speed/efficiency    | MLPerf WGs
Software (compiler + runtime) | Speed/efficiency    | MLPerf WGs
Model                         | Accuracy/efficiency | AlgoPerf WG
Training algorithm            | Accuracy/efficiency | AlgoPerf WG
Data                          | Accuracy/efficiency | DataPerf WG
Solution                      | Accuracy/safety     | MedPerf WG, Automotive task force

11 of 141

11

Data

12 of 141

Data is the new code.

Data defines best possible functionality.

The model is a lossy compiler.

12

13 of 141

Modern ML is built on public datasets

13

Public datasets are the language of ML research …

Even for the largest ML-focused companies…

14 of 141

But ML is evolving

14

15 of 141

How do we develop better datasets?

15

[Diagram: developing better datasets requires community, infrastructure (tools, metrics), funding ($ € ¥ …), venues + incentives (NeurIPS, ICML?, …), and people with a shared vision, producing datasets for AGI train and test, industry / tool R&D, and the public good.]

16 of 141

16

Challenges

17 of 141

ML/AI is taking off

17

“AI” search interest over time

18 of 141

We are driving 200mph…while building the road

18

Photos: unsplash

19 of 141

Concretely

Rapid changes

  • ML deployed in verticals
  • LLMs
  • Quality benchmarks
  • Datasets
  • Industrial use at academic pace

Org challenges

  • Member/community growth
  • Staffing/processes maturity
  • Membership model

19

20 of 141

20

Getting involved

21 of 141

We need more smart people!

21

22 of 141

22

Values

23 of 141

Values (https://mlcommons.org/en/philosophy/)

  • Grow ML markets and make the world a better place
  • Act through collaborative engineering
  • Get everyone involved
  • Make fast but consensus-supported decisions
  • Build a community that people want to be part of

23

Photos: unsplash

24 of 141

MLCommons Update

24

25 of 141

MLCommons is Growing our Staff

  • Director of Marketing
  • Product Manager
  • Systems administrator
  • Lead for MedPerf
  • Tech writers
  • New software engineering firm and mobile engineering
  • Tech lead for autonomous driving

Welcome aboard - excited for your contributions!

25

26 of 141

Q1 Accomplishments

26

27 of 141

MLPerf™ Inference v3.0 Results Overview

  • Results: MLPerf Inference v3.0 Results (Embargoed until 4/5/23 @ 10am Pacific)
    • Over 6,700 performance results
    • >2,400 power measurement results

  • Performance: Alibaba, ASUSTeK, Azure, cTuning, Deci, Dell, GIGABYTE, H3C, HPE, Inspur, Intel, Krai, Lenovo, Moffett, Nettrix, Neuchips, Neural Magic, NVIDIA, Qualcomm, Quanta Cloud Technology, rebellions, SiMa, Supermicro, VMware, xFusion

  • Power: Alibaba, cTuning, Dell, HPE, KRAI, Lenovo, NEUCHIPS, NVIDIA, Qualcomm, SiMa

  • Inference over Network: HPE, NVIDIA, Qualcomm

  • New submitters in bold

27

28 of 141

MLPerf Inference Trends

  • Lots of new hardware systems
  • Increasing performance in the datacenter
    • Over 30% in some benchmarks since MLPerf Inference v2.1

  • Increasing emphasis on power efficiency
    • 50% increase in number of submitters measuring power efficiency

  • More interest in Inference over the network (3X increase)

  • Open data center - nearly 3X more submissions
    • Wide variety of techniques: distillation, sparsification, new models

  • New MLPerf Mobile app available for Android and iOS, contact for access

28

29 of 141

Press Coverage

  • 60+ Stories
  • Good mix of coverage across Tech Press, Broad media (Forbes)
  • Local media pickup of press release
  • Full spreadsheet of articles here

“One of the best ways the AI/ML industry has today for measuring performance is with the MLPerf set of testing benchmarks, which have been developed by the multi-stakeholder MLCommons organization.”

Venture Beat

“This round featured even greater participation across the community with a record-breaking 25 submitting organizations, over 6,700 performance results, and more than 2,400 performance and power efficiency measurements.”

Yahoo! Finance

Peter Rutten, VP infrastructure systems, IDC, said: “[MLPerf 3.0] is especially helpful because of the huge differences between all the systems in terms of performance and power consumption [and] the software that each system deploys to optimize the performance. Having the ability to compare all these systems in an objective way that is supported by most of the AI industry is allowing us to see how vendors compare.”

Enterprise AI

30 of 141

1Q23 MLCommons Hero Awards

30

Pablo Gonzalez Mesa:

Heroically landing MLPerf Inference despite many challenges and being awesome

Lilith Bat-Leah: Amazing volunteer spirit, building the DataPerf webpage, outreach, ICML workshop, and tireless organization

Oana Balmau:

Superb leadership, dedication, and enthusiasm for MLPerf Storage

31 of 141

Kelly Berschauer (Marketing)

ROLE: Director of Marketing

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Marketing Director with 20+ years of experience across Microsoft, Meta, and Truveta, a healthcare startup
  • Spent 6 years doing marketing for Microsoft Research and created the Facebook Research brand
  • Have lived and worked in Seattle, London, and Silicon Valley
  • In my spare time you can find me traveling, kayaking on Lake Union, or gardening on Whidbey Island
  • Trivia?: In the early ’80s I occasionally hung out with a now-deceased grunge icon in Grays Harbor County, WA. Who was it?

31

32 of 141

Nathan Wasson (IT)

ROLE: MLCommons Systems Administrator, Auditor, & Video Editor

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Writer and technical services provider; started producing and editing written, audio, and video content at The Tech Report but has moved on since its demise
  • Published in The Tech Report, HotHardware, and RETURN
  • Has troubleshot technical issues from clients’ houses to the offices of the U.S. Congress
  • Internet privacy and security enthusiast
  • Takes weekend respites from the digital world to hit cars with wrenches

32

33 of 141

David Tafur (Product Management)

ROLE: Product Manager

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • 6+ years in Product Management & Business Strategy across corporations and startups (Banking, Cosmetics, B2B)
  • Product Consultant for US & South American companies
  • Global citizen: Born in Peru, lived in USA, Australia, Costa Rica, and Brazil
  • Enthusiast of traveling, surfing, swimming, and ukulele

33

34 of 141

Sally Doherty (Board of Directors)

ROLE: Board Member & Finance Committee Chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Chief Marketing Officer at AI compute company Graphcore, responsible for GTM, performance marketing, product marketing, communications, and brand
  • 30+ years of experience in technology marketing, including roles over the years at Nvidia, Sony Computer Entertainment, and start-ups like the cellular communications firm Icera
  • When I’m not at work or walking my Irish Setter, you’ll find me cooking, learning about gardening, eating out, or at the theatre

34

35 of 141

Weiming Zhao (Board of Directors)

ROLE: Board Member

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • As an architect at Alibaba, my primary responsibility is to oversee the build of AI infrastructure. This includes evaluating and enabling cutting-edge AI hardware and optimizing software to ensure maximum performance and efficiency.
  • 10+ years of experience in system software, including virtualization and compiler optimization.
  • Hobbies include reading, playing badminton, and jogging

35

36 of 141

Kurt Bollacker (Datasets)

ROLE: Datasets WG Chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Digital Research Director at the Long Now Foundation
  • Areas of Research: machine learning, search engines, graph databases, digital archiving, and electro-cardiographic simulation
  • Public Datasets I’ve created or helped start: CiteseerX, Internet Archive, Rosetta Project, Freebase (Google KG), Sleep and Dream Database
  • I bake cakes most every Friday evening. I try to never repeat a recipe.

36

37 of 141

Andreas Prodromou (HPC)

ROLE: HPC WG Co-chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Senior Deep Learning Scientist at NVIDIA
  • Works in NVIDIA’s DL engineering team, focusing on DL accelerator architectures.
  • Highlights:
    • Involved with HPC WG as a participant/contributor for two years.
    • In-depth familiarity with a wide range of DL models, accelerators, and frameworks. Hands-on experience deploying state-of-the-art AI models.
  • Committed to perpetual self-growth, via a semi-random exploration of new interests. Ask for more info if interested.

37

38 of 141

Juri Papay (Science)

ROLE: Science WG Co-chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Senior Data Scientist at STFC-RAL
  • Since my student years I have been interested in measuring the performance of parallel computers.
  • Worked on over twenty EU and UK funded projects, covering a wide range of topics such as benchmarking of HPC, security modelling and semantic research.
  • At STFC-RAL I work on my favourite topic of benchmarking machine learning applications and investigating the performance of large scale GPU systems.

38

39 of 141

Ritika Borkar (Training)

ROLE: Training WG Co-chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Senior Deep Learning Architect at NVIDIA
  • Work in Compute Architecture Team with focus on HW & SW optimizations for High Performance AI Computing on GPUs and datacenter systems
  • Also serve as Board Member at MLCommons
  • Been involved with MLPerf Training for more than 3 years
  • Can’t get enough of the Pacific Northwest. Hit me up if you are ever in the Portland area and need recommendations for good food or hikes!

39

40 of 141

Max Bartolo (Dynabench)

ROLE: Dynabench WG Co-chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Tech Lead for Command models at Cohere and Adjunct Teaching Fellow at University College London (UCL)
  • Works on improving the robustness and overall capabilities of conversational instruction-following large language models
  • Previously spent time at Satalia, Bloomsbury AI, Meta AI and DeepMind
  • One of the original Dynabench contributors and creator of the ShARC and AdversarialQA datasets
  • Enjoys football, martial arts, tennis, hiking, diving & more
  • Trivia?: I appear (for a few seconds) in Game of Thrones. Which episode?

40

41 of 141

Wei Zhao (Mobile)

ROLE: Mobile WG Co-chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Director of Technical Marketing at Zeku
  • Works on technology planning, business development and ecosystem partnerships
  • Ph.D. in ECE from the University of Maryland, College Park
  • 10+ years of industry experience in mobile and AI
  • Held system engineering and product management roles at corporations and startups
  • Enjoy soccer, movies and traveling in spare time

41

42 of 141

Mostafa El-Khamy (Mobile)

ROLE: Mobile WG Co-chair

BACKGROUND AND A LITTLE BIT ABOUT ME:

  • Sr. Principal Engineer at Samsung Device Solutions Research America
  • Leads R&D for AI multimedia systems and AI Benchmarking @Samsung
  • Ph.D. in EE from the California Institute of Technology (Caltech), MS and BSc in EE from Alexandria University, MBA from Edinburgh Business School.
  • Previously worked at Qualcomm CRD and was a faculty member at Alexandria University and Egypt-Japan University of Science and Technology
  • Erdős number is 2 (Paul Erdős -> Robert McEliece -> M. El-Khamy)
  • Enjoys being on the water (fishing/sailing/diving)

42

43 of 141

43

It would not be possible without our members

Founding Members

Academics from educational institutions including:

Harvard University

Polytechnique Montreal

Peng Cheng Laboratory

Stanford University

University of California, Berkeley

University of Toronto

University of Tübingen

University of Virginia

University of York, United Kingdom

Yonsei University

York University, Canada

Members

44 of 141

Break

44

45 of 141

Schedule

45

9:00 AM

Breakfast

9:30 AM

VMware Welcome: Sujata Banerjee

9:45 AM

MLC Welcome: Peter Mattson

10:00 AM

MLC Update: David Kanter

10:30 AM

Break

10:50 AM

Working Group Update

12:20 PM

Lunch

1:20 PM

Power WG Showcase

1:35 PM

DataPerf WG Showcase

1:50 PM

Group discussions (in person only)

I Benchmark value to enterprise customers: getting involved / moderator: Debojyoti Dutta

II Datasets for model quality benchmarking - e.g. which is the best LLM? / moderator: Kurt Bollacker

III MLCommons research: how do we deliver value for researchers? / moderator: Vijay Janapa Reddi

3:20 PM

Cake Break

3:45 PM

Social hour

4:45 PM

End

46 of 141

Working Group Updates

46

47 of 141

Working Group Updates

  • Goal is a brief overview of each working group
  • 5 minutes per working group (WG), slides follow a fairly fixed format
  • Not a lot of time for audio questions
    • Follow up with WG chairs via email, chat, or in-person

  • MLCommons WG Roadmaps and WG OKRs offer a snapshot
    • Feedback on them welcome, both format and contents

47

48 of 141

Mobile

48

EXAMPLE

49 of 141

Mobile Group

WG Purpose:

  • Develop a performance-accuracy benchmark suite for consumer mobile devices (phones & laptops) with different AI stacks

Goal:

  • Allow the general public to examine the AI performance of their devices through the MLPerf Mobile benchmark app

49

EXAMPLE

50 of 141

Updates from Last Quarter (change title!)

  • REMEMBER YOU HAVE 5 MINUTES TOTAL, BE BRIEF
  • Goal is to share with the community what is going on, and get people interested, or able to help

  • David will talk about major things like benchmark releases in the first part of the presentation (most likely)

  • Give us a few bullet points about what you accomplished in the last quarter

  • Keep to

50

EXAMPLE

51 of 141

What’s Next (1Q and 4Q)? (Change title!)

  • REMEMBER YOU HAVE 5 MINUTES TOTAL, BE BRIEF
  • Tell us what is happening in the next 3-12 months

  • You can use the timeline in the next slide if it helps

  • What are your top 2-3 challenges?
    • E.g., need owner for a reference model?

  • What are your top community asks?
    • What can MLC do to help your WG?

51

EXAMPLE

52 of 141

What’s ahead

52

Feb 2022 v2.0

New features and support expansion

  • New segmentation model

Aug 2022 v2.1

Cross Platform Enablement

  • Porting Android implementation to flutter
  • Core ML support

Mar 2023 v3.0

New features and Cross platform support

  • New SR model
  • Windows Flutter support

Aug 2023 v3.1

Increase adoption

  • Internal launch of the score collection website
  • Have the MLPerf app use a default runtime on all mobile devices

EXAMPLE

53 of 141

Mobile

53

54 of 141

Mobile Group

WG Purpose:

  • Develop a performance-accuracy benchmark suite for consumer mobile devices (phones & laptops) with different AI stacks

Goal:

  • Allow the general public to examine the AI performance of their devices through the MLPerf Mobile benchmark app

54

55 of 141

Updates since last Community Event

  • MLPerf Mobile v3.0 submission
    • New Super Resolution Model from Seoul National University
      • Add diversity (newer and larger model)
      • Finding a usable dataset was challenging
    • Submission for Android & Windows platforms

  • MLPerf v3.0 app now available for download
    • Including the new SR model
    • Adding support for MediaTek Dimensity 9200 and Qualcomm Snapdragon 8 Gen 2 & 7+ Gen 2

55

56 of 141

Upcoming Features

  • CI/CD pipeline
    • Helps to expedite the app development process

  • New models for v3.1/v4.0
    • Replacing older models such as MobileNetEdgeTPU

  • Default runtime
    • Ensure the legacy/low-tier platforms not supported by the vendors will still generate benchmark scores - important for press

  • Data collection
    • Collect user-generated scores
      • Great reference data & website traffic generator

56

57 of 141

What’s ahead

57

Aug 2022 v2.1

Cross Platform Enablement

  • Porting Android implementation to Flutter
  • Core ML support

Mar 2023 v3.0

New Features and Cross Platform Support

  • New SR model
  • Expand Windows coverage

Aug 2023 v3.1

Working Towards Default Runtime and Data Collection

  • Model updates
  • First Windows benchmark app
  • Internal launch of the score collection website

Mar 2024 v4.0

Increase Benchmark Coverage and Adoption

  • Official launch of the score collection website
  • Default runtime available for all mobile devices

58 of 141

Autonomous Driving Benchmark

58

59 of 141

Autonomous Driving Benchmark Group

Purpose:

  • Develop a benchmark for a representative automotive task for both training and inference.

Goal:

  • Add a training/inference multimodal 3D object detection benchmark.

59

60 of 141

Updates

  • Dataset, accuracy metrics, and high-level model are settled:
    • Waymo Open Dataset
    • Average Precision Heading metric
    • PointPainting model
  • Working implementation of PointPainting with samples of the dataset.
    • Need help with compute resources for the full dataset.
  • Longer term:
    • Find a co-chair; we are still looking for those interested.
    • Settle details of the model.
    • Train on the full dataset.
    • Determine the inference scenario.
    • Targeting MLPerf Training 4.0 and Inference 4.1.

60

61 of 141

Automotive Benchmarking Task Force

61

62 of 141

Automotive Benchmarking Task Force

Background:

  • MLCommons and AVCC (Autonomous Vehicle Compute Consortium) are collaborating on creating an ML benchmark suite for automotive
    • MLCommons knows ML
    • AVCC knows automotive

Purpose:

  • Solve the current non-alignment in ML compute performance measurement in the automotive supply chain

Goal:

  • Define and develop an automotive industry standard ML benchmark suite to be used in RFIs/RFQs

62

63 of 141

Updates

  • Successful kickoff at the end of February
    • 30+ attendees
  • Joint MLC and AVCC MoU close to being finalized
    • Will be publicly announced
  • Proposed timeline

  • Gathering requirements which will define the specification
    • The “MVP Demo” will implement a subset of the specification

63

64 of 141

Algorithms

64

65 of 141

Algorithms Working Group

WG Purpose:

  • Create a set of rigorous and relevant benchmarks to measure neural network training speedups due to algorithmic improvements.

Specific Goals:

  • AlgoPerf Training Algorithm Benchmark
  • [Future] AlgoPerf Model Benchmark
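
To make the “speedup from algorithmic improvements” idea concrete, below is a minimal, hypothetical sketch of time-to-target measurement: run a submission’s training algorithm on a fixed workload and score the wall-clock time needed to reach a preset validation target. The function names and target value are illustrative; the actual AlgoPerf rules define workloads, tuning, and timing in far more detail.

```python
import time

def time_to_target(train_step, evaluate, target_metric, max_steps=100_000):
    """Illustrative only: run a training algorithm until a validation target is hit.

    train_step -- callable performing one optimizer update (submission-defined)
    evaluate   -- callable returning the current validation metric
    Returns wall-clock seconds to reach target_metric, or None if never reached.
    """
    start = time.monotonic()
    for step in range(1, max_steps + 1):
        train_step()
        if step % 100 == 0 and evaluate() >= target_metric:
            return time.monotonic() - start
    return None
```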

65

66 of 141

Updates from Last Quarter

  • Putting the finishing touches on our codebase.
    • Bugfixing
    • Implementing workload variants
    • Implementing baseline submissions
  • Wrote a draft for the paper introducing the rules of the AlgoPerf Training Algorithms Benchmark.

66

67 of 141

What’s Next?

Short Term:

  • Publish paper introducing the rules for the AlgoPerf Training Algorithms Benchmark.
  • Publish a Call for Submission for the Benchmark.
    • Blog posts
    • Social media posts
    • Provide support for potential submitters

Long Term:

  • Publish results of the AlgoPerf Training Algorithms Benchmark.
  • Plan next iteration of this benchmark.
  • Build the AlgoPerf Model Benchmark.

67

68 of 141

Best Practices

68

69 of 141

Best Practices Working Group

Purpose:

  • Improve portability and reproducibility of ML projects, workloads and benchmarks. The initial starting point is the MLCube™ project that provides specifications and reference implementations to achieve this.

Goal:

  • Develop specifications for packaging ML projects, workloads and benchmarks as OCI (Open Container Initiative) containers.
  • Develop MLCube ecosystem of tools (reference runners for diverse environments, project templates for bootstrapping new MLCubes for various languages, example MLCubes).
  • Package MLPerf benchmarks, support MLCommons competitions, promote this technology in industry and academia.

69

70 of 141

Updates from Last Quarter

  • New version of MLCube: 0.0.9.
  • Training reference models MLCubed (RetinaNet, BERT).
  • Documentation updates
  • Optimizing test environments and project dependencies
    • New environment (OS and Python version) for test workflows.
    • Removing redundant dependencies.
    • Splitting dependencies into production, test, and development dependencies.
    • Upgrading project dependencies to newer versions.
  • Support for `~` and environment variables (e.g., `HOME`) in task parameters (see the sketch below).
  • New CLI arguments for the Docker and Singularity runners (--network, --security, --gpus, --memory, --cpu).
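
As a rough illustration of the new `~`/environment-variable support mentioned above, the expansion behaves like ordinary path normalization; this is a hypothetical sketch in Python, not MLCube’s own code.

```python
import os

def expand_task_parameter(value: str) -> str:
    """Expand '~' and environment variables (e.g. $HOME, ${HOME}) in a task
    parameter value. Behavioral sketch only, not the MLCube implementation."""
    return os.path.expanduser(os.path.expandvars(value))

# Both forms resolve to the same workspace directory for the current user.
print(expand_task_parameter("~/mlcube/workspace"))
print(expand_task_parameter("$HOME/mlcube/workspace"))
```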

70

71 of 141

What’s Next?

Promote MLCube

  • Finalize a paper that surveys approaches enabling portability and reproducibility of ML projects, and that introduces and positions MLCube.
  • Develop MLCube tutorials, meet with select partners in industry and academia.

Support MLPerf benchmarks and MLCommons competitions:

  • Package multiple MLPerf reference training benchmarks.
  • Support MedPerf and DataPerf competitions.

New features:

  • Self-contained MLCube containers.
  • Improved user experience.
  • Python API.

71

72 of 141

Medical

72

https://mlcommons.org/en/groups/research-medical/

73 of 141

Medical Working Group

WG Purpose:

  • Develop and support Medical AI benchmarks in global real-world clinical settings

Goals:

  • Develop MedPerf to support access to federated datasets for secure and privacy-preserving workload execution
  • Develop GaNDLF to support zero/low-code ML workloads
  • Research and develop benchmarks with clinical impact (e.g., bias, health equity, etc.)

73

74 of 141

Updates

  • MedPerf paper accepted for publication at Nature Machine Intelligence
    • 65 co-authors
    • 20+ companies, 20+ academic institutions and 10 hospitals across 13 countries
    • Big thanks to Dana Farber, Intel, Nutanix (Debo Dutta), UPenn and, of course, MLCommons community
  • GaNDLF paper accepted for publication at Nature Communications Engineering
    • 16+ research groups both industry and academia
  • Developed a MedPerf <-> Synapse interface: a) orchestration and b) a private Docker registry and compute (BraTS/FeTS challenge)
  • Better user experience (new commands):
    • To view server assets (#369)
    • To create MedPerf-compatible MLCube templates (#396)
    • To prepare MLCubes for submission to the server (#413)
  • New features:
    • Support offline execution (#400)
    • Flexible file hosting requirements: Support private file hosting on the Synapse platform (#378)
  • Enhanced documentation: New hands-on tutorials (#370, #385)

74

75 of 141

What’s next?

  • Support BraTS/FeTS 2023 (70+ hospitals)
  • Provide an interface to Federated Training frameworks for training models
  • Data Layer of MedPerf:
    • XNAT integration support for better data ingestion
    • More flexible and controlled Data preparation MLCube tasks
  • GaNDLF improvements:
    • Model differential privacy training
    • Support for multiple medical data types (beyond radiology and pathology data)
    • Support for generative models (GANs, diffusion models)
  • Challenges
    • Identify sustainability model
    • Research Grants (NIH, NSF)
        • Subcontract or partner in consortia (hard to identify them, need strategy)
    • Foundations (specific disease)
    • Support Healthcare Stakeholders (e.g., clinical validation of AI )

75

76 of 141

Tiny

76

77 of 141

Tiny Overview

Tiny Working Group

What we are

  • A benchmark suite for ultra-low-power ML systems (TinyML)
  • On-device real-time batch-of-one inference.
  • Measure energy/inference and latency on 4 different models

77

Typical Systems

  • MCUs, some accelerators
  • 10s-100s MHz
  • ≲ MB Flash, SRAM
  • ~mW power
  • Lightweight models (<1M parameters)

Current Benchmarks

Task                 | Model          | Parameters
Keyword Spotting     | DS-CNN         | 52k
Visual Wake Words    | MobileNet v1   | 325k
Anomaly Detection    | FC Autoencoder | 270k
Image Classification | ResNet8        | 96k

78 of 141

Updates

Tiny Working Group

  • Latest Round
    • Published November 9
    • Submitting Organizations: 8
      • Including 3 new submitters
    • Systems Submitted: 17
      • 11 w/ Energy
    • Good variety of hardware represented: Arm, RISC-V, FPGA, custom accelerators, and combinations

78

79 of 141

What’s Next

Tiny Working Group

  • New benchmarks in the works:
    • Streaming audio benchmark: LSTM-based denoiser
      • Sustained inference on a continuous time-series
      • Exercise rapid duty-cycling for energy efficiency
      • Add RNN to the benchmark suite
    • Others under consideration

  • Next Submission Round v1.1
    • May 19 submission / June 21 publication

  • Join us! Mondays at 12:05 ET during winter/spring 2023

(Will revert to normal time of 12:05 ET in June)

79

80 of 141

Datasets

80

Kurt Bollacker

2023 April 20

81 of 141

Datasets Working Group

WG Purpose:

  • Create new datasets to fuel innovation in machine learning

Specific Goals:

  • Create impactful datasets without licensing encumbrances.
  • Host datasets with affordable access for everyone
  • Create tooling to scale the creation of new datasets and improvement of existing ones

81

82 of 141

Recent Release

Speech Wikimedia (March 2023)

    • Compilation of 1,500 hours of multilingual audio files (with transcriptions) extracted from Wikimedia Commons (CC and PD). A larger, unsupervised dataset is in process.

Last Quarter

People’s Speech Update

    • 30K hours of aligned, diverse speech. V1.1 brings faster downloads, higher quality, and better docs (a tutorial!)

Dollar Street Dataset

    • Household item image dataset that contains novel geographic and income information from underrepresented parts of the world.

82

83 of 141

Challenges in Dataset Creation

83

  • Stable versioning vs freshness (datasets go stale and become less relevant)
  • Metadata management is hard and inconsistent (tags? new item labels?)
  • Scaling is a problem (slow/expensive to move large data or have many collaborators)
  • Licensing and Governance are hard
  • Communities need to grow around and nurture datasets

A new project: A Dataset Service for collaboration

84 of 141

Dataset Service: What will it look like?

84

First focus on the infrastructure to build a “Git for Data” service that supports:

  • Scalability to large datasets and many distributed collaborators.
  • Support for structured metadata at dataset, item, and subset granularities
  • Fast versioning and forking of datasets and subsets.
  • Easy distributed, collaborative contribution.
  • Fast discovery and sharing of data (sub)sets.
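
One way to picture the versioning and forking goals above: content-address each file and derive a commit id from the manifest of hashes, so identical data deduplicates and forks are cheap. This is a hypothetical sketch of the concept, not the design of the actual Dataset Service.

```python
import hashlib
import json
import os

def file_digest(path):
    """Content hash of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def commit_dataset(data_dir, metadata=None):
    """Return a content-addressed 'commit' for a dataset directory.

    The commit id is the hash of a manifest mapping file names to content
    hashes, so an unchanged dataset always yields the same id, and a fork
    that shares files can reuse those hashes instead of copying data.
    Conceptual sketch only.
    """
    manifest = {
        name: file_digest(os.path.join(data_dir, name))
        for name in sorted(os.listdir(data_dir))
    }
    payload = json.dumps({"manifest": manifest, "metadata": metadata or {}},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest(), manifest
```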

85 of 141

Join The Datasets Working Group!

https://mlcommons.org/en/groups/datasets/

Google group link: Datasets Google Group

  • Join the group to:
    • 1. Be invited to the weekly meetings (Thursdays 11 AM-12 PM Pacific Time)
    • 2. Receive emails from the email list

  • Interested in helping? Contact the WG chair, Kurt Bollacker

85

86 of 141

Inference

86

87 of 141

Inference Working Group

WG Purpose:

Develop an Inference performance benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios.

Goal:

  • Choose representative workloads for benchmarking and identify scenarios for realistic evaluation.
  • More submissions from a wide range of industry participants

Join Inference WG https://groups.google.com/u/4/a/mlcommons.org/g/inference

Details on Inference benchmarks: https://github.com/mlcommons/inference

87

88 of 141

Updates from Last Quarter

  • V3.0 Inference submission in March 2023
    • Submission on March 3rd and results publication on April 5th
    • 25 submitters (Alibaba, ASUSTeK, Azure, cTuning, Deci, Dell, GIGABYTE, H3C, HPE, Inspur, Intel, Krai, Lenovo, Moffett, Nettrix, Neuchips, Neural Magic, NVIDIA, Qualcomm, Quanta Cloud Technology, rebellions, SiMa, Supermicro, VMware, xFusion)
    • > 6,700 performance results (1.26x increase from the last submission)
    • > 2,400 power results
  • New Inference benchmark task forces
    • DLRM v2 task force
    • LLM task force
  • Open-source CK-MLPerf automation platform to provide a unified CLI and GUI to run MLPerf inference benchmarks on any hardware, visualize and reproduce results, and organize public optimization competitions

88

89 of 141

What’s Next

  • V3.1 submission timelines
    • New model freeze: Apr 28; Code freeze: June 2
    • Submission: Aug 4; Results publication: Aug 30
  • V3.1 new benchmarks
    • DLRM v2
    • LLM (175B, 6B)
  • Inference benchmarks in the pipeline for 2024
    • Stable Diffusion, GNN, Autonomous driving benchmark
  • Top 2-3 challenges
    • Discussion on the benchmark carrying capacity
    • What benchmarks to retire in order to add new benchmarks
  • What are your top community asks?
    • Anyone interested in being Inference WG co-chair, please contact David K or Inference chairs
    • Benchmark ownership
    • Active participation in WG and make submissions

89

90 of 141

Automation and Reproducibility Task Force: Collective Knowledge Playground

90

91 of 141

access.cKnowledge.org

A free, open-source, technology-agnostic and on-prem automation platform for collaborative and reproducible MLPerf inference benchmarking, optimization and comparison across any software, hardware, models and data sets from any vendor: https://github.com/mlcommons/ck/tree/master/platform

Simple GUI to analyze, compare and reproduce MLPerf v3.0, 2.1 and 2.0 results with any derived metric such as Performance/Watt or Performance/$: https://github.com/mlcommons/cm_inference_results
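
For context, a derived metric such as Performance/Watt is simply a submission’s reported throughput divided by its measured average system power. The sketch below shows the arithmetic on made-up numbers; the CK playground computes it from the published result files.

```python
# Hypothetical result rows: (system, samples_per_second, average_system_power_watts)
results = [
    ("system-a", 12000.0, 350.0),
    ("system-b", 20000.0, 700.0),
]

for system, throughput, power_w in results:
    perf_per_watt = throughput / power_w  # samples per second per watt
    print(f"{system}: {perf_per_watt:.1f} samples/s/W")
```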

92 of 141

We thank Neural Magic (Michael Goin), Pablo Gonzalez Mesa, students (Himanshu Dutta, Aditya Kumar Shaw, Sachin Mudaliyar, Thomas Zhu) and other great contributors for helping us validate the MLCommons CK technology (including CM aka CK2, the new version of our portable workflow framework) to unify, automate, and reproduce MLPerf inference submissions:

  • 80% of all results and 98% of power results
  • Diverse CPUs, GPUs and DSPs with PyTorch, ONNX, QAIC, TF/TFLite, TVM and TensorRT
  • Hardware from Nvidia (including 4090 workstation and Jetson AGX Orin edge device), Qualcomm, AMD, Intel and Apple
  • Deep Sparse optimization from Neural Magic and models from the Hugging Face Zoo
  • Cloud submissions on AWS and GCP
  • 1st end-to-end student submissions including on Apple Metal

cKnowledge.org/mlperf-inf-v3.0-forbes

cKnowledge.org/mlperf-inf-v3.0-report

Our 1st MLPerf inf v3.0 community submission

93 of 141

cKnowledge.org/challenges

Contact Grigori and Arjun (automation and reproducibility task force co-chairs) and/or join our Discord server to learn about how to participate in the upcoming 1st reproducible optimization tournament for MLPerf inference v3.1 and suggest your own challenges: discord.gg/JjWNWXKxwT

We will continue working with all MLCommons members and researchers to adapt MLCommons CK/CM to their needs, reduce their benchmarking and optimization costs, and improve MLPerf/MLCommons value:

    • Integrate their software and inference engines into portable CK-MLPerf workflows
    • Improve CK platform to automate their MLPerf experiments and optimization
    • Automatically generate containers for MLPerf benchmarks with CK/CM workflows and unified CLI

Based on your feedback, we plan to enhance the CK playground to generate Pareto-efficient end-to-end AI and ML-based applications using MLPerf results, CK technology and modular CK/CM containers - a prototype is available and will be integrated with the CK playground by Q3 2023!

93

Next: join the 1st public optimization tournament for MLPerf inference v3.1!

94 of 141

Training

94

95 of 141

Training Group

WG Purpose:

  • Define, develop and conduct MLPerf Training benchmarks

Goal:

  • Benchmark training performance of key ML workloads on a variety of platforms and thereby enable HW and SW innovation that speeds up ML

95

96 of 141

Updates from Last Quarter

  • Reference benchmark code complete for 2 new benchmarks:
    • LLM: GPT-3 175B model on the C4 dataset (available in 2 frameworks - PyTorch/Megatron-LM, PAXML)
    • DLRM: DCNv2 with the synthetically generated multi-hot Criteo dataset

  • MLCube integration underway for 2 benchmarks (BERT, Retinanet)

  • Registration survey for participation in Training v3.0 out!

  • 4 task force kick-offs to develop new benchmarks/methodologies:
    • Txt2Image: Stable Diffusion on LAION aesthetics dataset
    • GNN: R-GAT on IGB dataset
    • Automotive Training: PointPillars on Waymo Dataset
    • Power Methodology

96

97 of 141

What’s Next?

  • Training v3.0 submission deadline is May 19, 2023
  • Training v3.0 results publication on June 28, 2023

  • Training v3.1 will work towards landing:
    • Stable Diffusion benchmark
    • Power

  • Finalize benchmark roadmap for 2024:
    • GNN, Automotive are candidates for addition
    • Potentially drop some old benchmarks - discussion on-going

  • Continue reference clean-up & MLCube integration

  • To join Training, use the Google Group link: training@mlcommons.org

97

98 of 141

HPC

98

Chairs

  • Murali Emani, Argonne National Lab <memani@anl.gov>
  • Steven Farrell, Lawrence Berkeley National Lab <sfarrell@lbl.gov> → OUTGOING
  • Andreas Prodromou, NVIDIA <aprodromou@nvidia.com> → NEW

Get involved

  • Join the HPC group: https://mlcommons.org/en/groups/training-hpc/
  • Meetings: Mondays, weekly alternating between 8-9AM PT and 3-4PM PT.
  • Reach out to the chairs

99 of 141

HPC WG overview

Purpose:

  • ML performance benchmarking on supercomputer systems
  • We publish the MLPerf HPC benchmark suite
    • SciML applications relevant for HPC systems
    • Modeled after MLPerf Training with a few adjustments
    • Measure time-to-train and throughput (models/min)
  • We participate in BoFs, tutorial submissions, etc.

Goals:

  • Add more benchmarks, keep things fresh and relevant
  • Add more metrics relevant to HPC+science (e.g. power)
  • Increase interest and participation

99

Top500 supercomputers November 2022

100 of 141

Updates

  • Working towards v3.0 submissions due in October, with results announced at SC23 in November
  • HPC rules proposals to improve popularity and increase participation
  • Github link (https://github.com/mlcommons/training_policies/issues/513)
  • Overview of proposals
    1. Exclude data movement in timing measurement, focus on compute performance
    2. Allow throughput extrapolation to large system size
    3. Rename "weak scaling" to "throughput" and "strong scaling" to "Time To Train" (TTT)
  • Outcomes: arrived at group consensus to accept proposals 1 and 3 and reject proposal 2
  • Adding new protein folding (OpenFold) benchmark
  • CosmoFlow PyTorch reference implementation
  • Working with Power task force

100

101 of 141

Up next

  • Outreach
    • ISC BoF session
    • Reaching out to various HPC facilities and vendors to increase participation
  • Tutorial
    • Planning for a tutorial session to cover topics of DL optimization at scale on supercomputers using the HPC benchmarks
    • Presentations from facilities+vendors
    • Hands-on session on a leadership supercomputer such as Perlmutter (NERSC)
  • Adding Power measurement, targeting upcoming v3.0
  • Finalize OpenFold benchmark
  • MLPerf HPC v3.0
    • Benchmark freeze June 12
    • Submission deadline Oct 6

101

102 of 141

Storage Working Group

102

103 of 141

Purpose and Goals

WG Purpose:

  • Develop a benchmark suite that applies the same workload to a storage system as running MLPerf Training would, for different AI stacks and task types, so AI/ML teams can accurately size the storage required to support their overall AI/ML goals

Subgoals:

  • Simulate the load without “accelerator” hardware or real data (see the sketch below)
    • No GPUs required, no real-world data, no actual training
  • Simulate the load from different task types
    • Unet-3D, DLRM, and NLP, for PyTorch and TensorFlow
  • Correlated w/Training: accelerator & storage perf together
  • Enable AI teams to compare storage vendors
  • Enable storage vendors to innovate & optimize for AI teams
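
A minimal sketch of the idea referenced above: read training batches from storage at the rate a busy accelerator would consume them, with a sleep standing in for the accelerator’s compute time. The directory layout, batch size, and emulated step time are made-up parameters; the real benchmark models the MLPerf Training workloads far more faithfully.

```python
import os
import time

def emulate_training_io(data_dir, batch_files=4, step_time_s=0.05, steps=100):
    """Impose a training-like read load on storage without any accelerator.

    Each step reads a batch of files (the storage load), then sleeps for the
    emulated accelerator compute time. Hypothetical parameters throughout.
    """
    files = sorted(os.path.join(data_dir, name) for name in os.listdir(data_dir))
    start = time.monotonic()
    for step in range(steps):
        offset = (step * batch_files) % len(files)
        for path in files[offset:offset + batch_files]:
            with open(path, "rb") as fh:
                fh.read()              # storage read; contents are discarded
        time.sleep(step_time_s)        # stand-in for accelerator compute
    elapsed = time.monotonic() - start
    print(f"sustained {steps * batch_files / elapsed:.1f} files/sec")
```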

103

PMLDB

DAWNBench

104 of 141

Beta Released, GA is Next!

Released two Betas and incorporated feedback:

  • This is all new so we’re using Beta releases to validate the benchmark and processes before we go GA
  • Beta 1 was released to the WG on February 6th
    • Accurately modelling reading of the dataset
  • Beta 2 was released to the WG and friendly partners April 7th
    • Added writing of periodic model checkpoints

General availability and a formal submission window opening:

  • We expect GA release of the benchmark in 3 to 4 weeks
    • The usual: open window, WG review, publish results
  • Expecting many vendors to participate at launch
    • Intel, NetApp, Samsung, Nutanix, Weka, Micron, NVIDIA
    • Still lining up more vendors

104

105 of 141

Short Term Next Steps

Accept and process submissions:

  • We have lots of work to smooth out the submission process
  • Lots of lessons will be learned from the submissions
    • Benchmark fixes and rules tightening

Add support for multi-host training to benchmark:

  • Crawl, walk, run – now crawling, will walk after the 1st submissions
  • Next “feature” to be added is simulating distributed training (see the sketch below)
    • For a single dataset, imposing a coordinated storage load across multiple hosts (distributing batches and using a single MPI barrier between iterations for weight exchange)
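
A rough sketch of what the coordinated multi-host load could look like with mpi4py: every rank reads its shard of the global batch, then all ranks wait at a single barrier that stands in for the weight exchange between iterations. Hypothetical code under stated assumptions, not the benchmark itself.

```python
# Launch with an MPI runner, e.g.: mpirun -np 4 python multihost_io_sketch.py
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

for iteration in range(100):
    # Each host would read only its share of the global batch here
    # (a hypothetical read_my_shard(iteration, rank, size) call).
    time.sleep(0.05)   # stand-in for per-host storage read + compute time
    comm.Barrier()     # single barrier emulating the weight exchange

if rank == 0:
    print(f"completed 100 coordinated iterations across {size} hosts")
```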

105

106 of 141

Long Term + Issues and Asks

Long Term:

  • O(solutions) == O(practitioners)
    • Need all Training workloads on PyTorch AND TensorFlow?

Issues and Asks:

  • Data cleaning is 50% of Watts consumed, impact on storage?
    • Need consensus on some form of analytical framework to represent cleaning; it’s too variable today to build a benchmark around, yet it is at least half the workload and an entirely different access pattern

106

[Diagram: data cleaning & pre-processing feeding into training]

107 of 141

Benchmark Infra

107

108 of 141

Benchmark Infra Group

WG Purpose:

  • Develop infrastructure and tooling to support benchmark submissions and facilitate compliance

Goals:

  • Develop and operate MLPerf submission service, and automate submission process
    • Now supporting Training, Inference, HPC; more in the works
  • Maintain benchmark compliance tools
    • logging lib, compliance checker, result summarizer, etc.

108

109 of 141

Updates

  • Submission service: completed initial migration to AppEngine
    • Moved to a new infrastructure
    • Improved scalability, availability and security
  • Inference 3.0 submission support
    • Fully supported with the new infrastructure
  • Training 3.0 logging, compliance, and submission support
    • Work in progress

109

110 of 141

What’s Next

  • Auth system for benchmark submission service
    • Update submitter authentication/authorization process
    • Helps improve security and submitter experience
  • Automate other parts of the submission process.
    • e.g. helper tools for review committee
  • DevOps improvements for availability and scalability
    • Internal tooling for deployment/management of submission infrastructure
    • Documentation, CI/CD, releasing process, etc.

Help us to help you

  • WGs: can we help with your submission needs?
  • Contribute to the benchmark infra work

110

111 of 141

Science

111

Working Group Chairs: Geoffrey Fox, Juri Papay, Jeyan Thiyagalingam

Co-founder Tony Hey steps down, with the WG’s thanks!

112 of 141

Science Working Group

WG Purpose:

  • Enhance AI for Science and Engineering research covering domains such as: energy, environmental, earthquake and earth sciences, material sciences, life sciences, fusion, particle physics and astronomy with training and inference applications

112

Goals:

  • Support scientific discovery as the primary metric
  • Provide exemplars across a range of scientific domains
  • Encourage use of FAIR metadata and reproducible results
  • Enable educational use of our resources by students with rich documentation and experience records

113 of 141

Updates (past quarter)

  • Completed new GitHub and web resources for the Open Division only, with rolling submissions and science discovery as the primary metric for four initial benchmarks
  • Designed a blog-like interface for informal submissions with a variety of contributions
  • Several new benchmarks in two classes discussed
    • Science simulation surrogates, starting with a Virtual Tissue Digital Twin and Computational Fluid Dynamics (OSMIBench)
    • Particle Physics (FastML) and collaboration with large NSF HDR projects
  • Paper at ISC published
  • Started three white papers (see next page)

113

Benchmark           | Domain     | Task           | Institution | Model
CloudMask           | Climate    | Segmentation   | RAL         | CNN
STEMDL              | Materials  | Classification | ORNL        | CNN
CANDLE-UNO          | Medicine   | Classification | ANL         | MLP
TEvolOp Forecasting | Earthquake | Regression     | Virginia    | LSTM, Transformer

114 of 141

What’s next?

  • Complete submission Interface
  • Continue study of new benchmarks
  • Complete three white papers
    • AI Readiness of MLCommons Science (focus on FAIR issues and reproducibility)
    • Using Benchmarking Data to Inform Decisions Related to Machine Learning Resource Efficiency
    • Benchmark Carpentry for Science and Engineering
  • Documents open for new authors!

114

115 of 141

Research

115

116 of 141

MLCommons Research Overview

116

117 of 141

Updates

  • Develop benchmarks (e.g. Medical/Data/Tiny/Storage)
  • Disseminate knowledge (e.g., Training/Inference/Tiny/Mobile/People’s Speech/MSWC/Dollar Street/Storage)
  • Organize recurring workshops (e.g. MLBench)
  • Run tutorials at conferences (e.g. ASPLOS)
  • Create journals (e.g. DMLR)
  • Engage the community (e.g. SC Competition using MLPerf)
  • Give awards (e.g. Rising Stars)
  • Create internship opportunities
  • Raise external funding (e.g. NSF)

117

118 of 141

What’s Next? Exciting Goals for 2023.

  1. Help existing research groups succeed
  2. Kick off the rising stars program and make it successful
  3. Launch new research threads to foster new ideas

118

119 of 141

Rising Stars

  • It’s official!
  • Check out the website: https://mlcommons.org/en/rising-stars-2023/
  • Applications are coming in; the deadline is tomorrow
  • Domestic and international representation

119

120 of 141

Organizers

  • Udit Gupta (Incoming Assistant Professor at Cornell Tech)
  • Abdulrahman Mahmoud (Postdoctoral Fellow at Harvard University)
  • Lillian Pentecost (Assistant Professor of Computer Science at Amherst College)

120

121 of 141

Rising Stars: Objectives

Provide support, career development, and job search skills for emerging researchers at the intersection of machine learning and systems.

Over the last ~6 years, SysML/MLSys has grown into a vibrant research community with strong academic and industry collaborations.

Connect researchers across different career stages and institutions.

Build community across MLSys.

121

122 of 141

How to get involved?

  • Contact us if you want to support the rising stars program
    • Hosting
      • We need your support for hosting the event!
    • Funding
      • Sponsorship for either 2023 or 2024 rising stars program
    • Internships
      • Leverage the rising stars program as a source of future talent
  • Have new ideas that you think are worth exploring

  • Contact research@mlcommons.org

122

123 of 141

Lunch Break

Welcome back at

1:20 PM Pacific time

123

124 of 141

Schedule

124

9:00 AM

Breakfast

9:30 AM

VMware Welcome: Sujata Banerjee

9:45 AM

MLC Welcome: Peter Mattson

10:00 AM

MLC Update: David Kanter

10:30 AM

Break

10:50 AM

Working Group Update

12:20 PM

Lunch

1:20 PM

Power WG Showcase

1:35 PM

DataPerf WG Showcase

1:50 PM

Group discussions (in person only)

I Benchmark value to enterprise customers: getting involved / moderator: Debojyoti Dutta

II Datasets for model quality benchmarking - e.g. which is the best LLM? / moderator: Kurt Bollacker

III MLCommons research: how do we deliver value for researchers? / moderator: Vijay Janapa Reddi

3:20 PM

Cake Break

3:45 PM

Social hour

4:45 PM

End

125 of 141

Power showcase

125

126 of 141

Scaling of Machine Learning Models and Cost of Compute

  • Scaling of compute (FLOPS) in models is outpacing Moore’s law
  • Energy scaling in technology nodes is (close to) stagnant
  • Power consumption of ML models is as important a metric as performance

126

Source: Riselab, UC Berkeley

127 of 141

Power Working Group - Objective and Goals

  • MLPerf Power: A Best Practices Working Group
    • Objective (O): Make energy efficiency of ML benchmarks a first-class metric (like performance)
    • How (KR): Provide a methodology to measure the energy consumed by submissions across benchmark categories

  • What:
    • Measured metric: system power (and energy consumed) - see the sketch below
    • Submissions enabled for Inference (Datacenter, Edge) and TinyML
    • Key OKR for 2023: Enable Power in MLPerf HPC and MLPerf Training in the October submissions

  • Key goals (next several months):
    • Improve the current measurement methodology for existing Inference Power (v3.1)
    • Enable adoption in distributed systems - HPC, Training
    • Increase/encourage Inference submissions with Power
      • We had 1.2K+ power-related submissions in 2021 and 4.6K+ in 2022
      • Yet, < 50% of total submissions
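
A toy illustration of the “system power and energy consumed” metric referenced above: sample wall power during the run, average the samples for power, and multiply by the run time for energy. The power-reading function is a made-up stand-in for a real power analyzer; the actual MLPerf Power methodology (built around SPEC PTDaemon instrumentation) specifies the measurement rules in detail.

```python
import random
import time

def read_power_watts():
    """Stand-in for a real power analyzer reading (hypothetical values)."""
    return 300.0 + random.uniform(-10.0, 10.0)

samples = []
run_seconds = 10.0             # made-up benchmark duration
sample_interval = 0.5
start = time.monotonic()
while time.monotonic() - start < run_seconds:
    samples.append(read_power_watts())   # sample system power during the run
    time.sleep(sample_interval)

duration = time.monotonic() - start
avg_power_w = sum(samples) / len(samples)
energy_joules = avg_power_w * duration   # energy = average power x run time
print(f"average power: {avg_power_w:.1f} W, energy consumed: {energy_joules:.0f} J")
```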

127

Demonstrate that we can move the needle of energy efficiency over time

128 of 141

Inference Power submissions

  • Submitters: SiMa, NeuChips, Dell, Fujitsu, H3C, Krai, cTuning, NVIDIA, Qualcomm, HPE, Inspur, Gigabyte, Lenovo and many others.

  • Latest Results (v3.0):
    • 2809 submissions on 42 unique systems
    • 13% more power results compared to v2.1

  • Overall, ~7,000 power submissions across 100+ systems in 2 years

  • Good trajectory, but need more adoption

128

129 of 141

Power measurement for Distributed Systems

129

The Task Force for MLPerf Power in HPC and Training has been meeting since February 28th

Objective: Deliver a measurement and/or estimation methodology to help evaluate the energy efficiency of systems running MLPerf Training and MLPerf HPC benchmarks for the October submissions

Link: MLPerf Power Measurement HPC/Training

Progress

Defined the system scope for which power needs to be measured or estimated:

    • Node measurement: agreement on methodology
    • Interconnect estimation: currently being worked on
    • Storage estimation: agreement to drop from measurement
    • Cooling: to be discussed

Meets every Wednesday at 8:30 AM. Please write to power@mlcommons.org to participate.

Date to lock methodology: June 30, 2023

130 of 141

MLPerf Power WG meetings - Call for Action

  • Attendees must join “power” alias

  • Call for all MLPerf community members to actively participate to expand to different verticals and take part in feature testing/development efforts

  • Currently the MLPerf Power WG meets weekly
    • Tuesdays, 3 PM Pacific Time
      • Moved to accommodate attendees from Asia

  • Please reach out if you have any questions

For additional information: https://mlcommons.org/en/groups/best-practices-power/

130

131 of 141

DataPerf showcase

131

www.dataperf.org

132 of 141

DataPerf Working Group

WG Purpose

  • Create a leaderboard for data and data-centric algorithms

Specific Goals

  • Build community: create a canonical place to build data-centric challenges
  • Build standards: establish DataPerf as an independent standards entity providing a badge of quality for datasets

132


133 of 141

Data is the new bottleneck

[Diagram: the shift from an ML-centric paradigm to a data-centric paradigm - DataPerf drives this paradigm shift]

Source:

Kiela, Douwe, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen et al. "Dynabench: Rethinking benchmarking in NLP." arXiv preprint arXiv:2104.14337 (2021).

134 of 141

What is the Data Bottleneck?

Data Quality Bottleneck

  • Poor distribution
  • Bias
  • False information

Data Quantity Bottleneck

  • Data stocks grow at a much slower pace than dataset sizes
  • Language data will be exhausted by 2030-2040
  • High-quality language data will be exhausted by 2026
  • Vision data will be exhausted by 2030-2060

Source:

Villalobos, Pablo, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, and Anson Ho. "Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning." arXiv preprint arXiv:2211.04325 (2022).

135 of 141

Recent Release: DataPerf v0.5

135

Data-Centric Tasks

  • Data Selection
  • Data Cleaning
  • Data Creation (Adversarial)
  • Data Valuation

136 of 141

Recent Release: DataPerf v0.5

136

Domains

  • Vision (image classification)
  • Speech (keyword identification)
  • NLP (sentiment analysis)
  • Multimodal (text-to-image)

137 of 141

Recent Release: DataPerf v0.5

137

[Diagram: DataPerf v0.5 challenges map the domains (Vision - image classification, Speech - keyword identification, NLP - sentiment analysis, Multimodal - text-to-image) onto the data-centric tasks (Data Selection, Data Cleaning, Data Creation (Adversarial), Data Valuation).]

138 of 141

DataPerf v0.5 Timeline

138

  • Open: March 30th
  • Close: May 26th
  • Winners announcement: July 28th at ICML

139 of 141

Community Engagement (21 days since launch)

139

Dynabench.org

DataPerf.org

#Submissions

  • Vision Selection: 3
  • Debugging: 4
  • Speech Selection: 0
  • Acquisition: 0

#Visits

  • Average: 40/day
  • Peak: 250/day

140 of 141

What’s next?

  • Research Goals
    • Extend current challenges for the next iterations
      • Fast iterations, quarterly
    • Diversify the portfolio of data-centric challenges
      • Slower, targeted, well-funded challenges
    • Publish results and announce winners at ICML

  • Community Expansion Goals
    • Outreach to startup/industry involvement

140

141 of 141

Call for Action

141

Join the Working Group and help us design and develop DataPerf

Participate in DataPerf v0.5 Competitions.

Join our Discord channel to stay updated