MLCommons®
Community Meeting 1Q23
April 20, 2023
This community meeting is being recorded and will be shared
2
Schedule
3
9:00 AM | Breakfast |
9:30 AM | VMware Welcome: Sujata Banerjee |
9:45 AM | MLC Welcome: Peter Mattson |
10:00 AM | MLC Update: David Kanter |
10:30 AM | Break |
10:50 AM | Working Group Update |
12:20 PM | Lunch |
1:20 PM | Power WG Showcase |
1:35 PM | DataPerf WG Showcase |
1:50 PM | Group discussions (in person only) |
| I. Benchmark value to enterprise customers: getting involved / moderator: Debojyoti Dutta |
| II. Datasets for model quality benchmarking - e.g. which is the best LLM? / moderator: Kurt Bollacker |
| III. MLCommons research: how do we deliver value for researchers? / moderator: Vijay Janapa Reddi |
3:20 PM | Cake Break |
3:45 PM | Social hour |
4:45 PM | End |
Welcome
4
5
What is the MLCommons Association?
6
ML/AI has huge potential to benefit everyone
MLCommons is building the ML ecosystem
7
Mission
AI / ML Ecosystem
Pillars
Benchmarks
Best practices
Research
Data
AI / ML Ecosystem
Wright brothers: public domain; Planes: Marek Ślusarczyk
Community
Working groups
8
9
Benchmarks
ML needs benchmarks for everything
10
ML component | Metrics | MLCommons WG |
Hardware | Speed/efficiency | MLPerf WGs |
Software (compiler + runtime) | Speed/efficiency | MLPerf WGs |
Model | Accuracy/efficiency | AlgoPerf WG |
Training algorithm | Accuracy/efficiency | AlgoPerf WG |
Data | Accuracy/efficiency | DataPerf WG |
Solution | Accuracy/safety | MedPerf WG Automotive task force … |
11
Data
Data is the new code.
Data defines the best possible functionality.
The model is a lossy compiler.
12
Modern ML is built on public datasets
13
Public datasets are the language of ML research …
Even for the largest ML-focused companies…
But ML is evolving
14
How do we develop better datasets?
15
Community
Datasets
AGI train and test
Industry / tool R&D
Infrastructure
Public Good
Tools
Metrics
Funding
$ € ¥ …
Neurips
ICML?...
Venues + incentives
🏆
People + shared vision
!!!
16
Challenges
ML/AI is taking off
17
“AI” search interest over time
We are driving 200 mph… while building the road
18
Photos: unsplash
Concretely
Rapid changes
ML-deployed-in-verticals
LLMs
Quality benchmarks
Datasets
Industrial use at academic pace
Org challenges
Member/community growth
Staffing/processes maturity
Membership model
19
20
Getting involved
We need more smart people!
21
22
Values
Values (https://mlcommons.org/en/philosophy/)
23
Photos: unsplash
MLCommons Update
24
MLCommons is Growing our Staff
Welcome aboard - excited for your contributions!
25
Q1 Accomplishments
26
MLPerf™ Inference v3.0 Results Overview
27
MLPerf Inference Trends
28
Press Coverage
“One of the best ways the AI/ML industry has today for measuring performance is with the MLPerf set of testing benchmarks, which have been developed by the multi-stakeholder MLCommons organization.”
Venture Beat
“This round featured even greater participation across the community with a record-breaking 25 submitting organizations, over 6,700 performance results, and more than 2,400 performance and power efficiency measurements.”
Yahoo! Finance
“Peter Rutten, VP infrastructure systems, IDC, said, ‘[MLPerf 3.0] is especially helpful because of the huge differences between all the systems in terms of performance and power consumption [and] the software that each system deploys to optimize the performance. Having the ability to compare all these systems in an objective way that is supported by most of the AI industry is allowing us to see how vendors compare.’”
Enterprise AI
1Q23 MLCommons Hero Awards
30
Pablo Gonzalez Mesa:
Heroically landing MLPerf Inference despite many challenges and being awesome
Lilith Bat-Leah: Amazing volunteer spirit, building the DataPerf webpage, outreach, ICML workshop, and tireless organization
Oana Balmau:
Superb leadership, dedication, and enthusiasm for MLPerf Storage
Kelly Berschauer (Marketing)
ROLE: Director of Marketing
BACKGROUND AND A LITTLE BIT ABOUT ME:
31
Nathan Wasson (IT)
ROLE: MLCommons Systems Administrator, Auditor, & Video Editor
BACKGROUND AND A LITTLE BIT ABOUT ME:
32
David Tafur (Product Management)
ROLE: Product Manager
BACKGROUND AND A LITTLE BIT ABOUT ME:
33
Sally Doherty (Board of Directors)
ROLE: Board Member & Finance Committee Chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
34
Weiming Zhao (Board of Directors)
ROLE: Board Member
BACKGROUND AND A LITTLE BIT ABOUT ME:
35
Kurt Bollacker (Datasets)
ROLE: Datasets WG Chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
36
Andreas Prodromou (HPC)
ROLE: HPC WG Co-chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
37
Juri Papay (Science)
ROLE: Science WG Co-chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
38
Ritika Borkar (Training)
ROLE: Training WG Co-chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
39
Max Bartolo (Dynabench)
ROLE: Dynabench WG Co-chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
40
Wei Zhao (Mobile)
ROLE: Mobile WG Co-chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
41
Mostafa El-Khamy (Mobile)
ROLE: Mobile WG Co-chair
BACKGROUND AND A LITTLE BIT ABOUT ME:
42
43
It would not be possible without our members
Founding Members
Academics from educational institutions including:
Harvard University
Polytechnique Montreal
Peng Cheng Laboratory
Stanford University
University of California, Berkeley
University of Toronto
University of Tübingen
University of Virginia
University of York, United Kingdom
Yonsei University
York University, Canada
Members
Break
44
Schedule
45
9:00 AM | Breakfast |
9:30 AM | VMware Welcome: Sujata Banerjee |
9:45 AM | MLC Welcome: Peter Mattson |
10:00 AM | MLC Update: David Kanter |
10:30 AM | Break |
10:50 AM | Working Group Update |
12:20 PM | Lunch |
1:20 PM | Power WG Showcase |
1:35 PM | DataPerf WG Showcase |
1:50 PM | Group discussions (in person only) |
| I. Benchmark value to enterprise customers: getting involved / moderator: Debojyoti Dutta |
| II. Datasets for model quality benchmarking - e.g. which is the best LLM? / moderator: Kurt Bollacker |
| III. MLCommons research: how do we deliver value for researchers? / moderator: Vijay Janapa Reddi |
3:20 PM | Cake Break |
3:45 PM | Social hour |
4:45 PM | End |
Working Group Updates
46
Working Group Updates
47
Mobile
53
Mobile Group
WG Purpose:
Goal:
54
Updates since last Community Event
55
Upcoming Features
56
What’s ahead
57
Mar 2023 v3.0
New Features and Cross Platform Support
Aug 2023 v3.1
Working Towards Default Runtime and Data Collection
Aug 2022 v2.1
Cross Platform Enablement
Mar 2024 v4.0
Increase Benchmark Coverage and Adoption
Autonomous Driving Benchmark
58
Autonomous Driving Benchmark Group
Purpose:
Goal:
59
Updates
60
Automotive Benchmarking Task Force
61
Automotive Benchmarking Task Force
Background:
Purpose:
Goal:
62
Updates
63
Algorithms
64
Algorithms Working Group
WG Purpose:
Specific Goals:
65
Updates from Last Quarter
66
What’s Next?
Short Term:
Long Term:
67
Best Practices
68
Best Practices Working Group
Purpose:
Goal:
69
Updates from Last Quarter
70
What’s Next?
Promote MLCube
Support MLPerf benchmarks and MLCommons competitions:
New features:
71
Medical
72
https://mlcommons.org/en/groups/research-medical/
Medical Working Group
WG Purpose:
Goals:
73
Updates
74
What’s next?
75
Tiny
76
Tiny Overview
Tiny Working Group
What we are
77
Typical Systems

Benchmark | Model | Parameters
Keyword Spotting | DS-CNN | 52k
Visual Wake Words | MobileNet v1 | 325k
Anomaly Detection | FC Autoencoder | 270k
Image Classification | ResNet8 | 96k
Current Benchmarks
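For a sense of scale, here is a minimal Keras sketch of a DS-CNN-style keyword-spotting model like the one in the table above. The layer sizes and block count are illustrative assumptions, not the MLPerf Tiny reference implementation (see github.com/mlcommons/tiny for the authoritative models).

```python
# Illustrative DS-CNN-style model: a standard convolution followed by
# depthwise-separable blocks, sized to land in the "Tiny" parameter regime.
import tensorflow as tf
from tensorflow.keras import layers

def ds_cnn(input_shape=(49, 10, 1), num_classes=12):
    inputs = tf.keras.Input(shape=input_shape)  # e.g. an MFCC spectrogram
    x = layers.Conv2D(64, (10, 4), strides=(2, 2), padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for _ in range(4):  # depthwise-separable convolution blocks
        x = layers.DepthwiseConv2D((3, 3), padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Conv2D(64, (1, 1))(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = ds_cnn()
model.summary()  # total parameter count lands in the tens of thousands
```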
Updates
Tiny Working Group
78
What’s Next
Tiny Working Group
(Will revert to normal time of 12:05 ET in June)
79
Datasets
80
Kurt Bollacker
April 20, 2023
Datasets Working Group
WG Purpose:
Specific Goals:
81
Recent Release
Speech Wikimedia (March 2023)
Last Quarter
82
Challenges in
Dataset Creation
83
A new project: A Dataset Service for collaboration
Dataset Service: What will it look like?
84
First focus on the infrastructure to build a “Git for Data” service that supports:
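As a loose illustration of what a "Git for Data" client could feel like, here is a hypothetical Python sketch. Nothing below is an existing MLCommons interface; the names (DatasetRepo, commit, checkout) are assumptions chosen to mirror Git's verbs.

```python
# Hypothetical sketch: content-addressed commits over a set of data files,
# so any published version of a dataset can be reproduced exactly.
import hashlib, json, time

class DatasetRepo:
    def __init__(self):
        self.commits = {}   # commit id -> {parent, manifest, message, ts}
        self.head = None

    def commit(self, manifest: dict, message: str) -> str:
        """manifest maps file path -> content hash of that file."""
        payload = json.dumps({"parent": self.head, "manifest": manifest,
                              "message": message, "ts": time.time()},
                             sort_keys=True)
        cid = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[cid] = json.loads(payload)
        self.head = cid
        return cid

    def checkout(self, cid: str) -> dict:
        """Return the file manifest recorded at a given commit."""
        return self.commits[cid]["manifest"]

repo = DatasetRepo()
v1 = repo.commit({"train.csv": "ab12…"}, "initial import")
v2 = repo.commit({"train.csv": "cd34…", "test.csv": "ef56…"}, "add test split")
print(repo.checkout(v1))  # reproduce the dataset as first published
```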
Join The Datasets Working Group!
https://mlcommons.org/en/groups/datasets/
Google group link: Datasets Google Group
85
Inference
86
Inference Working Group
WG Purpose:
Develop an inference performance benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios (a minimal LoadGen sketch follows below).
Goal:
Join Inference WG https://groups.google.com/u/4/a/mlcommons.org/g/inference
Details on Inference benchmarks: https://github.com/mlcommons/inference
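For the curious, here is a minimal sketch of driving the benchmark's LoadGen from Python, assuming the mlperf_loadgen bindings built from the repository above. The model is stubbed out; a real submission would run an actual inference stack inside issue_queries.

```python
# Minimal sketch of an MLPerf Inference run using the LoadGen Python bindings.
import mlperf_loadgen as lg

def issue_queries(samples):
    # Run (stubbed) inference and report completion for each query sample.
    lg.QuerySamplesComplete(
        [lg.QuerySampleResponse(s.id, 0, 0) for s in samples])

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline  # also: SingleStream, Server, MultiStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, lambda s: None, lambda s: None)  # load/unload hooks

lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```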
87
Updates from Last Quarter
88
What’s Next
89
Automation and Reproducibility Task Force: Collective Knowledge Playground
90
access.cKnowledge.org
A free, open-source, technology-agnostic, on-prem automation platform for collaborative and reproducible MLPerf inference benchmarking, optimization, and comparison across any software, hardware, models, and datasets from any vendor: https://github.com/mlcommons/ck/tree/master/platform
A simple GUI to analyze, compare, and reproduce MLPerf v3.0, v2.1, and v2.0 results with any derived metric such as performance/Watt or performance/$ (see the sketch after this list): https://github.com/mlcommons/cm_inference_results
We thank Neural Magic (Michael Goin), Pablo Gonzalez Mesa, students (Himanshu Dutta, Aditya Kumar Shaw, Sachin Mudaliyar, Thomas Zhu), and other great contributors for helping us validate the MLCommons CK technology (including CM, aka CK2, the new version of our portable workflow framework) to unify, automate, and reproduce MLPerf inference submissions:
Our first MLPerf Inference v3.0 community submission
Contact Grigori and Arjun (automation and reproducibility task force co-chairs) and/or join our Discord server to learn how to participate in the upcoming first reproducible optimization tournament for MLPerf inference v3.1 and to suggest your own challenges: discord.gg/JjWNWXKxwT
We will continue working with all MLCommons members and researchers to adapt MLCommons CK/CM to their needs, reduce their benchmarking and optimization costs, and improve MLPerf/MLCommons value:
Based on your feedback, we plan to enhance the CK playground to generate Pareto-efficient, end-to-end AI and ML-based applications using MLPerf results, CK technology, and modular CK/CM containers. A prototype is available and will be integrated with the CK playground by Q3 2023!
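To make the derived-metric idea concrete, here is a small Python sketch computing performance/Watt and performance/$ rankings. The field names (samples_per_second, avg_watts, system_cost_usd) and the numbers are illustrative assumptions, not the official results schema.

```python
# Hedged sketch: deriving efficiency metrics from benchmark result rows.
results = [
    {"system": "A", "samples_per_second": 12000.0, "avg_watts": 350.0, "system_cost_usd": 15000.0},
    {"system": "B", "samples_per_second": 9000.0,  "avg_watts": 200.0, "system_cost_usd": 8000.0},
]

for r in results:
    r["perf_per_watt"] = r["samples_per_second"] / r["avg_watts"]
    r["perf_per_dollar"] = r["samples_per_second"] / r["system_cost_usd"]

# Rank systems by energy efficiency rather than raw throughput.
for r in sorted(results, key=lambda r: r["perf_per_watt"], reverse=True):
    print(f'{r["system"]}: {r["perf_per_watt"]:.1f} samples/s/W, '
          f'{r["perf_per_dollar"]:.2f} samples/s/$')
```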
93
Next: join the first public optimization tournament for MLPerf inference v3.1!
Training
94
Training Group
WG Purpose:
Goal:
95
Updates from Last Quarter
96
What’s Next?
97
HPC
98
Chairs
Get involved
HPC WG overview
Purpose:
Goals:
99
Top500 supercomputers November 2022
Updates
100
Up next
101
Storage Working Group
102
Purpose and Goals
WG Purpose:
Sub-goals:
103
PMLDB
DAWNBench
Beta Released, GA is Next!
Released two Betas and incorporated feedback:
General availability and a formal submission window opening:
104
Short Term Next Steps
Accept and process submissions:
Add support for multi-host training to benchmark:
105
Long Term + Issues and Asks
Long Term:
Issues and Asks:
106
Data cleaning & pre-processing
Training
Benchmark Infra
107
Benchmark Infra Group
WG Purpose:
Goals:
108
Updates
109
What’s Next
Help us to help you
110
Science
111
Working Group Chairs: Geoffrey Fox, Juri Papay, Jeyan Thiyagalingam
Co-founder Tony Hey steps down, with the WG's thanks!
Science Working Group
WG Purpose:
112
Goals:
Updates (past quarter)
113
Benchmark | Domain | Task | Institution | Model
CloudMask | Climate | Segmentation | RAL | CNN
STEMDL | Materials | Classification | ORNL | CNN
CANDLE-UNO | Medicine | Classification | ANL | MLP
TEvolOp Forecasting | Earthquake | Regression | Virginia | LSTM / Transformer
What’s next?
114
Research
115
MLCommons Research Overview
116
Updates
117
What’s Next? Exciting Goals for 2023.
118
Rising Stars
119
Organizers
120
Rising Stars: Objectives
Provide support, career development, and job search skills for emerging researchers at the intersection of machine learning and systems
Over the last ~6 years, SysML/MLSys has grown into a vibrant research community with strong academic and industry collaborations
Connect researchers across different career stages and institutions
Build community across MLSys
121
How to get involved?
122
Lunch Break
Welcome back at
1:20 PM Pacific time
123
Schedule
124
9:00 AM | Breakfast |
9:30 AM | VMware Welcome: Sujata Banerjee |
9:45 AM | MLC Welcome: Peter Mattson |
10:00 AM | MLC Update: David Kanter |
10:30 AM | Break |
10:50 AM | Working Group Update |
12:20 PM | Lunch |
1:20 PM | Power WG Showcase |
1:35 PM | DataPerf WG Showcase |
1:50 PM | Group discussions (in person only) |
| I. Benchmark value to enterprise customers: getting involved / moderator: Debojyoti Dutta |
| II. Datasets for model quality benchmarking - e.g. which is the best LLM? / moderator: Kurt Bollacker |
| III. MLCommons research: how do we deliver value for researchers? / moderator: Vijay Janapa Reddi |
3:20 PM | Cake Break |
3:45 PM | Social hour |
4:45 PM | End |
Power showcase
125
Scaling of Machine Learning Models and Cost of Compute
126
Source: RISELab, UC Berkeley
Power Working Group - Objective and Goals
127
Demonstrate that we can move the needle on energy efficiency over time
Inference Power submissions
128
Power measurement for Distributed Systems
129
The Task Force for MLPerf Power in HPC and Training has been meeting since February 28
Objective: Deliver a measurement and/or estimation methodology to help evaluate the energy efficiency of systems running MLPerf Training and MLPerf HPC benchmarks, in time for the October submissions (a minimal example calculation is sketched below)
Link: MLPerf Power Measurement HPC/Training
Progress
Defined the system scope for which power must be measured or estimated
Meets every Wednesday at 8:30 AM. Please write to power@mlcommons.org to participate.
Date to lock methodology: June 30, 2023
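To illustrate the kind of calculation such a methodology enables, here is a minimal Python sketch assuming per-node average power readings and a shared wall-clock runtime. The structure and numbers are illustrative only; the task force's actual methodology is still being defined.

```python
# Minimal sketch of an energy-efficiency calculation for a distributed run.
node_avg_watts = [5200.0, 5150.0, 5300.0, 5250.0]  # one reading per node (assumed)
runtime_seconds = 1800.0                            # e.g. time-to-train
samples_processed = 3_000_000                       # total work done

total_energy_joules = sum(node_avg_watts) * runtime_seconds
print(f"Total energy: {total_energy_joules / 3.6e6:.2f} kWh")
print(f"Efficiency:  {samples_processed / total_energy_joules:.3f} samples/J")
```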
MLPerf Power WG meetings - Call for Action
For additional information: https://mlcommons.org/en/groups/best-practices-power/
130
DataPerf showcase
131
www.dataperf.org
DataPerf Working Group
WG Purpose
Specific Goals
Create a canonical place to build data centric challenges
Establish DataPerf as an independent standard entity to provide a badge of quality for datasets
132
Data is the new bottleneck
ML-Centric Paradigm
Data-Centric Paradigm
DataPerf: Paradigm shift
Source:
Kiela, Douwe, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen et al. "Dynabench: Rethinking benchmarking in NLP." arXiv preprint arXiv:2104.14337 (2021).
Data Quality Bottleneck
Data Quantity Bottleneck
What is the Data Bottleneck?
Source:
Villalobos, Pablo, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, and Anson Ho. "Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning." arXiv preprint arXiv:2211.04325 (2022).
Recent Release: DataPerf v0.5
135
Data Selection
Data Cleaning
Data Creation (Adversarial)
Data Valuation
Data-Centric Tasks
Recent Release: DataPerf v0.5
136
Vision (Image classification)
Speech (Keyword identification)
NLP (Sentiment Analysis)
Multimodal (text-2-image)
Domains
Recent Release: DataPerf v0.5
137
Vision (Image classification)
Speech (Keyword identification)
NLP (Sentiment Analysis)
Multimodal (text-2-image)
Data Selection
Data Cleaning
Data Creation (Adversarial)
Data Valuation
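To make the data-centric framing concrete, here is an illustrative data-selection baseline in the spirit of the selection task above: pick a fixed-size training subset that maximizes held-out accuracy. The dataset, model, and budget are placeholders, not the official DataPerf harness.

```python
# Illustrative data-selection baseline: compare a few random subsets of a
# fixed budget and keep the one with the best validation accuracy.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

def score_subset(idx):
    model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    return model.score(X_val, y_val)

rng = np.random.default_rng(0)
budget = 200  # fixed subset size (assumed)
candidates = [rng.choice(len(X_train), size=budget, replace=False)
              for _ in range(5)]
best = max(candidates, key=score_subset)
print(f"best subset validation accuracy: {score_subset(best):.3f}")
```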
DataPerf v0.5 Timeline
138
Community Engagement (21 days since launch)
139
[Chart: number of submissions and site visits for Dynabench.org and DataPerf.org]
What’s next?
140
Call for Action
141
Join the Working Group and help us design and develop DataPerf
Participate in DataPerf v0.5 Competitions.
Join our Discord channel to stay updated