1 of 188

State of AI Report

October 12, 2021

#stateofai

stateof.ai

Ian Hogarth

Nathan Benaich

2 of 188

About the authors

Nathan is the General Partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. He founded RAAIS and London.AI (AI community for industry and research), the RAAIS Foundation (funding open-source AI projects), and Spinout.fyi (improving university spinout creation). He studied biology at Williams College and earned a PhD from Cambridge in cancer research.

Nathan Benaich

Ian Hogarth

Ian is an angel investor in 100+ start-ups. He is a Visiting Professor at UCL working with Professor Mariana Mazzucato. Ian was co-founder and CEO of Songkick, the concert service. He studied engineering at Cambridge where his Masters project was a computer vision system to classify breast cancer biopsy images. He is the Chair of Phasecraft, a quantum software company.

stateof.ai 2021

#stateofai | 2

3 of 188

Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.

We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.

The State of AI Report is now in its fourth year. Consider this Report as a compilation of the most interesting things we’ve seen, with a goal of triggering an informed conversation about the state of AI and its implications for the future.

We consider the following key dimensions in our report:

  • Research: Technology breakthroughs and their capabilities.
  • Talent: Supply, demand and concentration of talent working in the field.
  • Industry: Areas of commercial application for AI and its business impact.
  • Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI.
  • Predictions: What we believe will happen in the next 12 months and a 2020 performance review to keep us honest.

Collaboratively produced by Ian Hogarth (@soundboy) and Nathan Benaich (@nathanbenaich).

stateof.ai 2021

#stateofai | 3

4 of 188

stateof.ai 2021

Reviewers

Markus Anderljung, Ali Eslami,

Rob Ferguson, Yanping Huang,

Chip Huyen, Andrej Karpathy,

Allie Miller, Moritz Mueller-Freitag, Torsten Reil, Sebastian Ruder,

Shubho Sengupta, Jaime Teevan,

Nu (Claire) Wang, and Diane Wu.

Thank you!

Othmane Sebbouh

Research Assistant

Othmane is a PhD student in ML at ENS Paris, CREST-ENSAE and CNRS. He holds an MSc in management from ESSEC Business School and a Master's in Applied Mathematics from ENSAE and Ecole Polytechnique.

#stateofai | 4

Contributors

5 of 188

Artificial intelligence (AI): A broad discipline with the goal of creating intelligent machines, as opposed to the natural intelligence that is demonstrated by humans and animals. It has become a somewhat catch-all term that nonetheless captures the long-term ambition of the field to build machines that emulate and then exceed the full range of human cognition.

Machine learning (ML): A subset of AI that often uses statistical techniques to give machines the ability to "learn" from data without being explicitly given the instructions for how to do so. This process is known as “training” a “model” using a learning “algorithm” that progressively improves model performance on a specific task.

Reinforcement learning (RL): An area of ML concerned with developing software agents that learn goal-oriented behavior by trial and error in an environment that provides rewards or penalties in response to the agent’s actions (called a “policy”) towards achieving that goal.

Deep learning (DL): An area of ML that attempts to mimic the activity in layers of neurons in the brain to learn how to recognise complex patterns in data. The “deep” in deep learning refers to the large number of layers of neurons in contemporary ML models, which help to learn rich representations of data and achieve better performance.

Definitions

stateof.ai 2021

#stateofai | 5

6 of 188

Algorithm: An unambiguous specification of how to solve a particular problem.

Model: Once a ML algorithm has been trained on data, the output of the process is known as the model. This can then be used to make predictions.

Supervised learning: A model attempts to learn to transform one kind of data into another kind of data using labelled examples. This is the most common kind of ML algorithm today.

Unsupervised learning: A model attempts to learn a dataset's structure, often seeking to identify latent groupings in the data without any explicit labels. The output of unsupervised learning often makes for inputs to a supervised learning algorithm at a later point.

Transfer learning: An approach to modelling that uses knowledge gained in one problem to bootstrap a different or related problem, thereby reducing the need for significant additional training data and/or boosting performance.

Natural language processing (NLP): Enabling machines to analyse, understand and manipulate human language.

Computer vision: Enabling machines to analyse, understand and manipulate images and video.

Definitions

stateof.ai 2021

#stateofai | 6

7 of 188

Research

  • The Transformer architecture has expanded far beyond NLP and is emerging as a general purpose architecture for machine learning.
  • Large language models (LLMs) are in their scale-out phase and are becoming “nationalised”: each country wants its own LLM.
  • AI-first approaches have taken structural biology by storm: proteins and RNA (cellular machinery) are being simulated with high fidelity.
  • JAX emerges as a popular ML framework as the pace of research productivity accelerates/researchers become first class citizens.

Talent

  • Chinese universities have rocketed from publishing no AI research in 1980 to producing the largest volume of quality AI research today.
  • The de-democratisation of AI research continues as big tech companies collaborate with elite, but not lower tier, universities.
  • Academic groups struggle to compete on compute resources, while 88% of top AI faculty have received funding from big tech.

Industry

  • The AI and data company ecosystem has matured considerably, with significant IPOs signalling the entry into the deployment phase of AI.
  • Two major AI-first drug discovery and development companies complete IPOs with drugs in the clinic, further validating their potential.
  • AI-first products are deployed for high-stakes use cases: the UK’s National Grid (energy), employee health and safety, and warehouses.
  • The community brings a renewed focus on data issues that affect model performance in production (bias, drift, specification, labels, etc).
  • Semiconductor-related companies accelerate massively as nations seek supply chain sovereignty and NVIDIA’s Arm takeover is investigated.

Politics

  • AI is now literally an arms race: autonomous weapons have been deployed on the battlefield with more testing happening regularly.
  • AI safety is now top of mind, but fewer than 50 researchers are working in this domain full-time at the major AI labs.
  • New experiments on AI governance emerge: totally distributed + open source, private + open source, and public benefit corporation.
  • AI regulation begins in Europe.

Executive Summary

stateof.ai 2021

#stateofai | 7

8 of 188

Scorecard: Reviewing our predictions from 2020

stateof.ai 2021

#stateofai | 8

9 of 188

Our 2020 predictions, grades, and evidence:

  • Prediction: The first 10 trillion parameter dense model. Grade: Yes. Evidence: Microsoft demonstrated that it can train models with up to 32 trillion parameters, but it is unclear if these can learn better representations than existing large models.
  • Prediction: Attention-based neural networks achieve state of the art results in computer vision. Grade: Yes. Evidence: Vision Transformers are #1 on ImageNet.
  • Prediction: A major corporate AI lab shuts down as its parent company changes strategy. Grade: Sort of. Evidence: Alibaba’s AI lab fizzles out as part of an internal restructuring.
  • Prediction: Chinese and European defense-focused AI startups collectively raise over $100M in the next 12 months. Grade: No. Evidence: Funding did not reach this level, yet.
  • Prediction: One of the leading AI-first drug discovery startups either IPOs or is acquired for >$1B. Grade: Yes. Evidence: NASDAQ IPOs: Recursion on April 16, 2021 and Exscientia on October 1, 2021.
  • Prediction: DeepMind makes a major breakthrough in structural biology and drug discovery beyond AlphaFold. Grade: Yes. Evidence: DeepMind released AlphaFold 2.
  • Prediction: Facebook makes a major breakthrough in AR/VR with 3D computer vision. Grade: No. Evidence: Nothing major in 3D computer vision.
  • Prediction: NVIDIA does not end up completing its acquisition of Arm. Grade: Yes. Evidence: The acquisition had not completed by its deadline and is under active investigation.

stateof.ai 2021

#stateofai | 9

10 of 188

Section 1: Research

stateof.ai 2021

#stateofai | 10

11 of 188

In our 2020 Report, we predicted: “Attention-based neural networks move from NLP to computer vision in achieving state of the art results.”

2020 Prediction: Vision Transformers

stateof.ai 2021

  • Google proposed the ViT (Vision Transformer) model, a convolution-free transformer architecture.
  • ViTs benefit from scaling both model parameters and pre-training data. This helped ViT achieve 90.45% top-1 accuracy on ImageNet, which was the SOTA until CoAtNet, an architecture combining self-attention and convolutions, dethroned it (90.88%).
  • To adapt the input to the transformer architecture, the images are split into smaller square patches, flattened and linearly projected to the transformer’s chosen input dimension (see the sketch after this list). The resulting sequence is fed to a standard transformer.
  • Many more Transformers perform well on other CV tasks: e.g. Segmenter (Image Segmentation), Swin-Transformer (Object Detection).
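To make the patching step above concrete, here is a minimal numpy sketch (not the official ViT code). The 16×16 patch size and 768-dim projection are assumptions matching ViT-Base defaults, and the projection matrix is a random placeholder for a learned layer.

```python
# Hedged sketch: split an image into square patches, flatten, and linearly project.
import numpy as np

def patchify(image, patch_size=16, embed_dim=768, rng=np.random.default_rng(0)):
    """image: (H, W, C) array. Returns a (num_patches, embed_dim) token sequence."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    # Split into non-overlapping patch_size x patch_size patches and flatten each.
    patches = (image.reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch_size * patch_size * C))
    # Learned linear projection (here a random placeholder matrix).
    W_proj = rng.standard_normal((patches.shape[1], embed_dim)) * 0.02
    return patches @ W_proj  # fed to a standard transformer encoder

tokens = patchify(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): 14x14 patches of a 224x224 image
```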

#stateofai | 11

12 of 188

stateof.ai 2021

Facebook AI introduces SEER, a 1.3B parameter self-supervised model pre-trained on 1B Instagram images that achieves 84.2% top-1 accuracy on ImageNet, comfortably surpassing all existing self-supervised models.

Self-supervision is taking over computer vision

  • Self-supervision has driven NLP research to new heights. Extending this success to computer vision is hard because models need much more data to capture the semantics of a particular visual concept.
  • SEER combines SwAV, a method to learn image embeddings that yields consistent clustering of images with similar visual concepts, with RegNets, a scalable CNN architecture. It uses uncurated and unlabeled (non-EU) Instagram images.
  • SEER is a good few-shot learner: it still achieves 77.9% top-1 accuracy on ImageNet when trained with 10% of the dataset.
  • It also outperforms supervised methods on other tasks like object detection and segmentation.

#stateofai | 12

13 of 188

Researchers compare a self-supervised ViT (SSViT) to fully supervised ViTs and convnets, and find that SSViTs learn more powerful representations.

What do self-supervised Vision Transformers see in an image that other models don’t?

stateof.ai 2021

  • By inspecting the self-attention module of the last block of SSViTs, the authors show that SSViTs learn “class-specific features leading to unsupervised object segmentations”.
  • The features learned by SSViTs are very powerful: they achieve 78.3% top-1 accuracy on ImageNet using these features and a simple k-NN classifier, without fine-tuning or data augmentation (see the sketch after this list).
  • These properties don’t emerge for supervised ViTs and convnets.
  • They also compare to other self-supervised methods and a supervised ViT trained on ImageNet, and show that a self-supervised ViT outperforms them on a video segmentation task.
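A hedged sketch of that k-NN evaluation protocol: the feature arrays below are random placeholders standing in for frozen SSViT embeddings, and the 20-neighbour setting is an assumption chosen for illustration.

```python
# Freeze the self-supervised backbone, extract features, classify with plain k-NN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
train_feats = rng.standard_normal((1000, 384))   # placeholder frozen-backbone features
train_labels = rng.integers(0, 10, size=1000)
val_feats = rng.standard_normal((200, 384))
val_labels = rng.integers(0, 10, size=200)

# L2-normalise so Euclidean k-NN behaves like cosine similarity.
train_feats /= np.linalg.norm(train_feats, axis=1, keepdims=True)
val_feats /= np.linalg.norm(val_feats, axis=1, keepdims=True)

knn = KNeighborsClassifier(n_neighbors=20)       # no fine-tuning, no augmentation
knn.fit(train_feats, train_labels)
print("top-1 accuracy:", knn.score(val_feats, val_labels))
```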

#stateofai | 13

14 of 188

Self-attention is the basic building block of SOTA models on speech recognition...

Transformers take over other major AI applications, e.g. audio and 3D point clouds

stateof.ai 2021

  • The Conformer model combines self-attention and convolutions to capture both global interactions and local features.
  • Giant Conformers pre-trained using wav2vec 2.0 and self-training achieve the lowest word-error rates (WER) to date on Librispeech.

  • A team from Oxford, CUHK and Intel Labs designed self-attention networks for point clouds named Point Transformers.
  • Point Transformers significantly outperform prior work on diverse tasks such as object classification, object part segmentation, and semantic scene segmentation.
  • For example, they achieve a record 70.4% mIoU on S3DIS Area 5 for scene segmentation, surpassing the previous best by 3.3 percentage points.

… and on 3D point cloud classification.

#stateofai | 14

15 of 188

DeepMind’s Perceiver is one such architecture. It sidesteps the Transformer’s quadratic dependence on input length by computing attention between the input and a low-dimensional learnable latent array, rather than between the input and itself (sketched in code after the bullets below).

Transformers extend into efficient self-attention-based architectures

stateof.ai 2021

  • Another important benefit of Perceiver is that it is general purpose: it doesn’t use domain-specific assumptions and can handle arbitrary input types: images, videos, point clouds, etc.
  • Perceiver performs on par with other application-specific architectures, e.g. ViTs for image classification.
  • Perceiver IO is an improvement of Perceiver which handles both arbitrary inputs and outputs of any size. This extends Perceiver’s capabilities to NLP, games, video generation, etc.
  • On NLP tasks, Perceiver IO doesn’t require prior tokenization and directly operates on bytes instead. It still matches the performance of the Transformer-based BERT on GLUE.
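Here is a minimal numpy sketch of the cross-attention trick, with random placeholder weights; the latent size, hidden dimension and scaling are illustrative assumptions, not DeepMind's implementation. The latent array attends to the input, so cost grows linearly with input length rather than quadratically.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, d_k=64, rng=np.random.default_rng(0)):
    """latents: (N, D) learned vectors; inputs: (M, D) tokens, with M >> N."""
    D = latents.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((D, d_k)) * 0.02 for _ in range(3))
    q, k, v = latents @ Wq, inputs @ Wk, inputs @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_k))   # (N, M): cost is linear in input length M
    return attn @ v                          # (N, d_k) updated latent array

rng = np.random.default_rng(1)
latents = rng.standard_normal((128, 256)) * 0.02   # small learnable bottleneck
inputs = rng.standard_normal((50_000, 256))        # long input, e.g. flattened pixels or audio
print(cross_attention(latents, inputs).shape)      # (128, 64)
```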

#stateofai | 15

16 of 188

stateof.ai 2021

Researchers from UC Berkeley, Facebook AI and Google show that you don’t need to fine-tune the core parameters of a language pre-trained Transformer in order to obtain very strong performance on a different task.

More evidence for the general purpose nature of Transformers

  • They take a pre-trained GPT-2 and fine-tune only the input and output layers and the layer norms (<0.1% of all parameters).
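A hedged sketch of what such a setup might look like, using Hugging Face's GPT-2 as a stand-in; the layer-norm name matching, the task dimensions and the small added layers are assumptions about how one could reproduce the recipe, not the authors' code.

```python
import torch
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")
for name, param in backbone.named_parameters():
    # GPT-2 layer norms are named "ln_1", "ln_2", "ln_f" (naming assumption).
    param.requires_grad = "ln_" in name              # train layer norms only (<0.1% of params)

input_proj = torch.nn.Linear(32, backbone.config.n_embd)   # new task-specific input layer
output_head = torch.nn.Linear(backbone.config.n_embd, 10)  # new task-specific output layer

trainable = [p for p in backbone.parameters() if p.requires_grad]
trainable += list(input_proj.parameters()) + list(output_head.parameters())
optimizer = torch.optim.Adam(trainable, lr=1e-4)

x = torch.randn(4, 64, 32)                           # e.g. a non-language modality
hidden = backbone(inputs_embeds=input_proj(x)).last_hidden_state
logits = output_head(hidden[:, -1])                  # only the new layers + layer norms learn
```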

#stateofai | 16

17 of 188

stateof.ai 2021

While pre-trained transformers have taken the ML world by storm, new research shows that convolutional neural networks (CNNs) and multi-layered perceptrons (MLPs) shouldn’t be an afterthought. When trained properly, they are competitive with transformers on several NLP and computer vision tasks.

Beyond transformers: MLPs and CNNs make a comeback

  • Google researchers set out to disentangle the effects of pre-training and architectural advancements on the performance of language models. They found that pre-training helps CNNs as much as it helps transformers. On 7 out of 8 tasks they consider, they showed that a pre-trained convolutional Seq2Seq outperforms T5, a recent SOTA transformer. However, transformers still have the edge in modeling long-range dependencies.
  • Other Google researchers proposed MLP-Mixer, an all-MLP architecture for computer vision. Using MLPs for computer vision goes against the conventional wisdom (using CNNs) and recent breakthroughs (Vision Transformers). They show that MLP-Mixer scales well to large datasets and is competitive with SOTA CNNs and ViTs.

#stateofai | 17

18 of 188

stateof.ai 2021

Neural Radiance Fields (NeRF) already achieves SOTA results on view synthesis. New applications further highlight how impressive it is.

Remarkable progress in Novel View Synthesis

  • Given multiple views of a scene, NeRF uses a multilayer perceptron to learn a representation of the scene and to render new views of it. It learns a mapping from every 3D location and view direction to the color and density at that location (see the sketch after this list).
  • NeRF outperforms previous work on datasets of both synthetic and real images. It has also found a powerful application in disentangled image generation — the task of controlling one or more attributes of an image, for example translating or rotating objects without changing the background.
  • GIRAFFE uses a generative variant of NeRF to represent objects in images without the need for supervision through camera poses. But instead of modeling the entire scene with an MLP, GIRAFFE does this for each object.
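A minimal sketch of the NeRF mapping described in the first bullet, with random placeholder weights rather than a trained model; the encoding frequencies and layer sizes are illustrative assumptions.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """x: (..., 3). Returns sin/cos features at increasing frequencies."""
    freqs = 2.0 ** np.arange(num_freqs)
    angles = x[..., None] * freqs                    # (..., 3, num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).reshape(*x.shape[:-1], -1)

def nerf_mlp(point, view_dir, rng=np.random.default_rng(0)):
    """Map a 3D location and viewing direction to colour (rgb) and density."""
    feats = np.concatenate([positional_encoding(point), positional_encoding(view_dir, 4)])
    W1 = rng.standard_normal((feats.shape[-1], 256)) * 0.02
    W2 = rng.standard_normal((256, 4)) * 0.02
    h = np.maximum(feats @ W1, 0)                    # ReLU hidden layer
    out = h @ W2
    rgb, density = 1 / (1 + np.exp(-out[:3])), np.maximum(out[3], 0)
    return rgb, density

print(nerf_mlp(np.array([0.1, 0.2, 0.3]), np.array([0.0, 0.0, 1.0])))
```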

Figures: rotating the blue object; adding more objects; 360º car rotation.

#stateofai | 18

19 of 188

In our 2020 Report we predicted: “DeepMind makes a major breakthrough in structural biology and drug discovery beyond AlphaFold.”

2020 Prediction: AlphaFold 2

stateof.ai 2021

  • DeepMind returned to CASP14 (2020) with a new system, AlphaFold 2 (AF2), two years after winning CASP13 (2018) with AF1.
  • AF1 used convolutional layers to predict a distance map between pairs of amino acids in order to generate a 3D structure.
  • AF2 uses a spatial graph representation of amino acids. Residues are the nodes and edges connect the residues in close proximity.
  • Next, an attention-based model is trained end-to-end to interpret this graph, along with evolutionarily related sequences (multiple sequence alignments, MSA) and amino acid residue pair representations, and to iteratively refine the graph from which 3D protein structure coordinates are generated.
  • The AlphaFold DB plans to deliver a >2,000-fold increase in the number of structures for known protein sequences and a >700-fold increase in total number of structures by the end of 2021.

#stateofai | 19

20 of 188

Half a year after DeepMind presented their AlphaFold 2 (AF2) method at the CASP14 conference, the Baker lab at the University of Washington created their own protein structure prediction system using related ideas and managed to attain accuracies approaching the original AF2 without detailed access to its methodology.

The ideas behind AlphaFold 2 rapidly diffused into academia and open source

stateof.ai 2021

  • In the Baker model, information is processed back and forth from the 1D amino acid sequence information, the 2D distance map, and the 3D coordinates, such that the network must reason over relationships within and between sequences, distances, and coordinates.
  • Necessity is the mother of invention: “DeepMind reported using several GPUs for days to make individual predictions, whereas our predictions are made in a single pass through the network in the same manner that would be used for a server.”
  • Notably, the model can generate structure models for protein-protein complexes from sequence information, which reflects the reality of how proteins function in the body.

#stateofai | 20

21 of 188

Proteins found in nature today are the product of evolution. But what if AI could generate artificial proteins with useful functionality beyond what evolution has designed?

Large language models can generate functional proteins that are unseen in nature

stateof.ai 2021

  • This work learns a protein language model by predicting the next amino acid across more than 280M protein sequences from thousands of protein families (see the sketch after this list).
  • AI-generated proteins across 5 families of antibacterial lysozymes show similar biological performance characteristics to their natural peers, even when their sequence similarity is only 44%.
  • The 3D structure of the model-generated artificial lysozyme was then determined by X-ray crystallography showing conserved fold and position of enzyme active site residues compared to the natural protein.
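A hedged sketch of the next-residue objective referenced above: the toy uniform "model" and the example sequence are placeholders, while the real system is a large transformer trained on ~280M sequences.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                 # 20-letter amino-acid alphabet
idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def next_residue_loss(sequence, logits_fn):
    """Mean cross-entropy of predicting residue t+1 from residues <= t."""
    losses = []
    for t in range(len(sequence) - 1):
        logits = logits_fn(sequence[: t + 1])        # (20,) scores from the language model
        logp = logits - np.log(np.exp(logits).sum())
        losses.append(-logp[idx[sequence[t + 1]]])
    return float(np.mean(losses))

# Toy stand-in for the language model: uniform scores over the alphabet.
uniform_model = lambda prefix: np.zeros(len(AMINO_ACIDS))
print(next_residue_loss("MKTAYIAKQR", uniform_model))   # ~log(20) ≈ 3.0 for a clueless model
```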

#stateofai | 21

22 of 188

stateof.ai 2021

Language models trained on viral sequences can predict mutations that preserve infectivity but induce high antigenic change, akin to preserving “grammaticality” but inducing high “semantic change”.

Learning the language of Covid-19 to predict its evolution and escape mutants

  • Viral escape occurs when a virus mutates to evade neutralizing antibodies from the host immune system. This can impede the development and effectiveness of vaccines, which we’ve seen with the Delta variant.
  • Language model evolutionary features help identify the S494P mutation, which decreases the neutralization potential of multiple therapeutic antibodies against SARS-CoV-2 pseudovirus in vitro.
  • Going forward, we could imagine vaccine development that corners viral evolution by using language models to better understand how it generates sequence diversity.

#stateofai | 22

23 of 188

Single-stranded RNAs (e.g. mRNAs) fold into well-defined 3D structures to carry out their biological function. Unlike proteins, we know little about RNA folding, and the number of available RNA structures is 1% of that for proteins.

New state-of-the-art for predicting the 3D structure of RNA molecules

stateof.ai 2021

  • A new method called Atomic Rotationally Equivariant Scorer (ARES) processes the 3D coordinates and chemical element type of each atom of an RNA molecule and predicts the root mean square deviation (RMSD) from the unknown true structure.
  • ARES is trained on 18 RNA molecules with experimentally determined structures and 1,000 structural models of these RNAs sampled with Rosetta’s FARFAR2. ARES is optimised such that its output is as close to the RMSD of the models as possible.
  • Notably, ARES isn’t given any prior information about what RNA molecules are, nor does it use sequences of related RNAs.
  • In the RNA-Puzzles challenge, ARES selects the best Rosetta FARFAR2 model for each of four RNA molecules, beating humans and other methods, despite significant differences from its training set.

#stateofai | 23

24 of 188

Cryogenic electron microscopy (cryo-EM) empirically determines the structure of macromolecules at near atomic resolution without the need for crystallisation. Cryo-EM involves shooting electron beams at a flash-frozen sample of the protein or molecule of interest. The microscope generates images of these molecules that are then combined to reconstruct their 3D structure. All stages of the cryo-EM workflow are amenable to AI, ranging from specimen preparation and data collection to structure determination and atomic interpretation.

Cryo-EM and AI: the next frontier in structural biology and drug discovery

stateof.ai 2021

  • For structure-guided drug discovery, we need protein structures at ~2.5 Å resolution or better (i.e. near-atomic resolution).
  • Cryo-EM enables structure determination of dynamic protein complexes.
  • Cryo-EM structures at ~ 2 Å resolution were first reported by Sriram Subramaniam in 2015-2016, and the field has grown rapidly with >200 high-resolution structures projected for 2021.
  • Combining AI-driven computational predictions of structure (e.g. AlphaFold) with cryo-EM experiments will be key to unravel protein-protein interactions, which mediate biological function.

#stateofai | 24

25 of 188

stateof.ai 2021

Combination therapy could improve cancer patient outcomes, but empirically testing a large number of drug combinations is unfeasible in the lab. Here, self-supervision is used to observe cells treated with a finite number of drug combinations and to predict the effect of unseen combinations.

Predicting and prioritising novel drug combinations, dosages, and timing for therapy

  • An autoencoder is used to encode and learn embeddings for the transcriptional response of single cells to 30 drug treatments across different cell types, doses, and drug combinations.
  • The model learns three additive embeddings: the cell’s basal state, the observed perturbation, and the observed covariates.
  • At evaluation time, we can swap out the model’s perturbation embedding to answer the counterfactual question “What would the gene expression of this cell have looked like, had it been treated differently?”
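A hedged numpy sketch of the additive-embedding idea just described; names, dimensions and the linear decoder are illustrative assumptions rather than the authors' code. The counterfactual is obtained by swapping the perturbation term.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_genes = 32, 2000
drug_embeddings = {"drugA": rng.standard_normal(latent_dim),
                   "drugB": rng.standard_normal(latent_dim)}
covariate_embeddings = {"cell_type_1": rng.standard_normal(latent_dim)}
decoder = rng.standard_normal((latent_dim, n_genes)) * 0.05   # placeholder decoder

def predict_expression(z_basal, drug, covariate):
    # Latent code = basal state + perturbation + covariate (additive embeddings).
    z = z_basal + drug_embeddings[drug] + covariate_embeddings[covariate]
    return z @ decoder                                         # predicted gene expression

z_basal = rng.standard_normal(latent_dim)                      # from the encoder
observed = predict_expression(z_basal, "drugA", "cell_type_1")
# "What would this cell's expression have looked like under a different drug?"
counterfactual = predict_expression(z_basal, "drugB", "cell_type_1")
```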

#stateofai | 25

26 of 188

stateof.ai 2021

Deep learning models can learn drug-protein binding relationships from a small number of empirical experiments in order to help prioritise which areas of vast chemical spaces to virtually screen.

Accelerating high-throughput virtual drug screening with model-guided search

  • Structure-based drug discovery searches for drugs that bind a protein of interest whose 3D structure is available. This process, referred to as “docking”, can be run virtually using simulations. However, with databases of small molecule chemicals exploding past billions of records, virtually screening all combinations becomes computationally and commercially intractable.
  • A solution is to train a model on a sample of drug-protein interactions with empirically determined docking scores.
  • This model can be used to virtually score a library of interest, followed by docking the top scoring drug candidates. These results are used to update the model with active learning. With several iterations, model-guided search ultimately generates hits faster.
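A hedged sketch of this model-guided loop; the `dock()` function, the random-forest surrogate and all sizes are placeholders standing in for a physics-based docking engine and a learned docking-score predictor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
library = rng.standard_normal((20_000, 128))          # molecular fingerprints
true_w = rng.standard_normal(128)
dock = lambda mols: -np.abs(mols @ true_w) / 10       # fake docking scores (lower = better)

# Start from a small random sample with "empirically" determined docking scores.
labelled = [int(i) for i in rng.choice(len(library), size=500, replace=False)]
scores = list(dock(library[labelled]))

for step in range(3):                                 # a few active-learning rounds
    surrogate = RandomForestRegressor(n_estimators=50, random_state=0)
    surrogate.fit(library[labelled], scores)
    predicted = surrogate.predict(library)            # cheap virtual scoring of the whole library
    ranked = np.argsort(predicted)                    # best predicted candidates first
    already = set(labelled)
    batch = [int(i) for i in ranked if int(i) not in already][:500]
    scores += list(dock(library[batch]))              # dock only the top-scoring candidates
    labelled += batch
    print(f"round {step}: best docking score so far {min(scores):.2f}")
```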

#stateofai | 26

27 of 188

stateof.ai 2021

The yield of a chemical reaction describes the percentage of reactants that are transformed into the desired product and is a key metric for reaction performance. Predicting reaction yields helps chemists to navigate chemical reaction space and design more sustainable, economical and effective synthesis plans.

Predicting chemical reaction performance using Transformers

  • Reaction transformer encoder models fine-tuned on augmented reaction SMILES outperform all previous approaches in predicting Buchwald-Hartwig reaction yields (an essential tool in the pharmaceutical industry), even in the low-data regime.

#stateofai | 27

28 of 188

stateof.ai 2021

MuZero is the latest member of DeepMind’s “Zero” family. It matches AlphaZero’s performance on Go, chess and Shogi, and outperforms all existing models on the Atari benchmark while learning solely within a world model. MuZero appeared in Nature in December 2020.

Games continue to drive Reinforcement Learning research

  • DeepMind’s previous successful algorithms relied on being given the precise game dynamics, which they used for planning. For very complex and unstructured games, this approach doesn’t scale well.
  • MuZero learns exclusively within a world model, meaning it learns a model of the game’s dynamics.
  • But learning a complete model of these dynamics is a hard task. MuZero instead only models what is relevant to its decision making, enabling it to scale well to complex games.
  • The Atari benchmark is a suite of visually complex games which had been beyond the reach of model-based systems. MuZero now outperforms the best model-free systems on Atari, while performing as well as state of the art algorithms on Go, chess, and Shogi.

#stateofai | 28

29 of 188

stateof.ai 2021

DreamerV2 is the first model-based RL agent trained on a single GPU to surpass human level performance on 55 popular tasks of the Atari benchmark. The agent learns behaviors purely within the latent space of a world model trained from pixels, which makes these behaviors more generalisable to solving future tasks more efficiently.

Superhuman world models for Atari, but on a budget

  • DreamerV2 vastly outperforms other RL agents trained with the same computational budget, across all performance aggregation metrics.

#stateofai | 29

30 of 188

stateof.ai 2021

RL agents have shown impressive performance on challenging individual tasks. But can they generalize to tasks they never trained on? DeepMind trained RL agents on 3.4M tasks across a diverse set of 700k games in a 3D simulated environment, and showed they can generalize to radically different games without additional training.

Zero-shot generalisation in reinforcement learning

  • The researchers created XLand, a vast controllable environment, which allows them to dynamically adapt both how the agents train and, crucially, the games on which they train.
  • The distribution of games is learned using a hyperparameter optimization technique called Population Based Training. It allows them to find the games that have the right level of difficulty given the agents’ behaviour. This ensures the agents build ever more general capabilities.
  • As training progresses, the agents exhibit heuristic behaviours such as experimenting, changing the state of the world, and cooperation, which are uncharacteristic of usual RL agents. These learned behaviours allow them to generalize to hand-designed held-out tasks, a first in RL research.

Figure: Examples of XLand environments.

Figure: Test metrics progress during training.

#stateofai | 30

31 of 188

stateof.ai 2021

Soon after AlphaGo was published in 2016, a software implementation called Leela was made available. To assess its impact on the performance of Go players, researchers studied 750K Go moves from 1,200+ players between 2015 and 2019. They show that the advent of Leela coincided with a significant improvement in move quality.

Trained by AI: AlphaGo coaches professional Go players

All professional players

Old vs. Young

China vs. Japan vs. Korea

  • The improvement was higher among young players, who might be more open to learn from Leela.
  • Players in China and Korea, who were the most aware of Leela (as measured by the number of web searches), had a higher improvement in move quality than players from Japan, who belatedly adopted Leela.

#stateofai | 31

32 of 188

stateof.ai 2021

The increasing complexity of RL benchmarks and the computational power required to solve them have led researchers to evaluate their models using fewer and fewer runs. Yet, most still report only point estimates, like median scores. The result is a very noisy picture of the performance rankings of SOTA RL models.

Researchers call for more rigorous use of statistics in Reinforcement Learning

  • Researchers examined the performance evaluations of 6 of the best RL algorithms on the Atari 100k benchmark. They showed that these often rely on unconventional evaluation protocols or on unreliable stochastic point estimates that widely overestimate/underestimate their expected value due to the low number of runs.
  • They propose to use either confidence intervals or robust point estimates. One example is the interquartile mean (IQM). It is robust to outliers, which makes it well-suited to the handful-of-runs regime.
  • Using IQM and other metrics, they reclassify SOTA RL algorithms on 3 popular RL benchmarks, and urge researchers to report multiple metrics in order to paint a complete picture of model performance (IQM is sketched after this list).
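A minimal sketch of the interquartile mean they recommend: the mean of the middle 50% of run scores, which is far less sensitive to an outlier run than the plain mean or a single point estimate.

```python
import numpy as np

def interquartile_mean(scores):
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    return float(scores[n // 4 : n - n // 4].mean())   # drop the bottom and top 25% of runs

runs = [0.62, 0.64, 0.65, 0.66, 0.67, 0.69, 0.70, 3.50]   # one lucky outlier run
print(np.mean(runs), interquartile_mean(runs))             # ≈1.02 vs ≈0.67
```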

#stateofai | 32

33 of 188

stateof.ai 2021

To solve video-and-language (V&L) tasks like video captioning, ClipBERT only uses a few sparsely sampled short clips. It still outperforms existing methods that exploit full-length videos.

Less is more: watching a few clips is enough to learn how to caption a video

  • The usual approach to solve video-and-language tasks is to use separate task-agnostic encoders for videos and images, then use the resulting features to teach a neural network the task at hand.
  • A natural improvement of this process would be end-to-end learning of vision and text encoders. But due to the length of the video clips, this is usually computationally unaffordable.
  • Surprisingly, researchers show that with end-to-end learning, one only needs a few sparsely sampled clips from a video to outperform existing methods that use full-length videos. They also verify that ClipBERT performs better with sparse random sampling than with dense uniform sampling.
  • ClipBERT surpasses SOTA methods on datasets for text-to-video retrieval and video QA, including MSRVTT, DiDeMo and TGIF-QA.

#stateofai | 33

34 of 188

  • The underlying assumption in Multilingual ASR (Automatic Speech Recognition) is that the additional information learned from one language should benefit other languages. In practice, using more languages makes the modeling task more difficult due to large language variations and heavy data imbalance.
  • While low-resource languages do benefit from multilingual training, high resource languages (like English) usually suffer from the reduction in model capacity compared to the monolingual setting.

stateof.ai 2021

Google researchers tackle the high-resource language degradation problem by increasing model capacity.

For large-scale multilingual speech recognition too, the bigger the better

  • They consider a massive 15-language dataset of 7K to 54K hours per language. By increasing their model’s capacity from 1B to 10B parameters and making it deeper, they improve the performance (measured by Word Error Rates (WER)) of their multilingual system on all languages compared to monolingual models. They also show that increasing model capacity actually increases training speed.

#stateofai | 34

35 of 188

stateof.ai 2021

Speech generation usually requires training an Automatic Speech Recognition (ASR) system, which is resource-intensive and error-prone. Researchers introduce Generative Spoken Language Modeling (GSLM), the task of learning speech representations directly from raw audio without any labels or text.

Beyond ASR for speech generation: textless NLP

  • A major goal of GSLM is to make AI more inclusive: The majority of textual information available online is in a few languages like English. Better use of the audio information available online (podcasts, local radios, social apps) could help improve current AI audio systems’ performance on rarer languages.
  • Through intonation, audio encodes more emotions and nuances. Being able to generate speech only from audio signals in a self-supervised fashion could result in more natural and expressive AI systems.
  • The researchers have already made some first steps in GSLM, by showing that they can leverage prosody (rhythm, stress and intonation of speech) to generate natural and coherent speech.

#stateofai | 35

36 of 188

stateof.ai 2021

Diffusion models’ training is more stable than GANs’, and they outperform GANs on several well-established datasets in image generation, audio synthesis, shape generation and music generation.

GANs have a serious new adversary: diffusion models

  • Principle: Given an image from a dataset D, after enough steps of random noise addition we approximately end up with a sample from the noise distribution. What if it were possible to reverse the process and recover an image from the distribution of the dataset D by sampling from noise?
  • Method: Diffusion models solve this problem by modeling the inverse distribution (generating denoised images from noisy ones) at each step as a Gaussian whose mean and covariance are parametrized as a DNN (see the sketch after this list).
  • Diffusion models are not new, but recent improvements have made them theoretically and practically appealing.
  • Although they are slower, they beat GANs on ImageNet across all resolutions from 64x64 to 512x512.
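A hedged numpy sketch of that principle: the closed-form forward noising step and the simple noise-prediction training loss used by recent diffusion models. The schedule values and the placeholder network are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)                 # noise schedule
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward process: sample the noised x_t directly from x_0 in closed form."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

def training_loss(x0, predict_noise):
    """Train the denoiser to predict the noise that was added at a random step t."""
    t = rng.integers(0, T)
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    return np.mean((predict_noise(x_t, t) - noise) ** 2)

x0 = rng.standard_normal((32, 32, 3))              # a stand-in "training image"
placeholder_net = lambda x_t, t: np.zeros_like(x_t)
print(training_loss(x0, placeholder_net))          # ~1.0 for an untrained predictor
```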

#stateofai | 36

37 of 188

The canonical approach to applying deep computer vision to medical images is fine-tuning ImageNet pre-trained models or using rule-based label extraction from medical textual reports. In contrast, the ConVIRT method pre-trains directly on naturally occurring image-text pairs using a contrastive objective, without any supervision. ConVIRT outperforms all ImageNet-initialized models with only 10% as much labeled training data.

stateof.ai 2021

Learning medical image representations from text-image pairings

  • During contrastive pre-training, the model learns to associate each image in a batch with its text companion, while dissociating it from the other text snippets (see the loss sketch after this list). To learn better representations, ConVIRT makes the task harder by using random transformations of the images and texts.
  • ConVIRT was tested on 4 datasets spanning 4 different classification tasks: binary, multi-label binary, multi-class and anomaly detection. In 3 out of the 4 tasks, ConVIRT with only 1% training data achieved better classification results than ImageNet initialized models which used 100% training data.
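A hedged numpy sketch of that batch-contrastive (InfoNCE-style) objective, with random vectors standing in for the two encoders' outputs; the temperature value is an assumption.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.1):
    """Each image should be most similar to its own text snippet within the batch."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature        # (batch, batch) similarity matrix
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(logp)))                # matched pairs sit on the diagonal

rng = np.random.default_rng(0)
batch = 8
print(contrastive_loss(rng.standard_normal((batch, 512)), rng.standard_normal((batch, 512))))
```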

#stateofai | 37

38 of 188

stateof.ai 2021

OpenAI’s CLIP uses 400M text-image pairs to learn image and text representations. It exhibits a solid performance across a wide variety of datasets without any fine-tuning.

Multimodal self-supervision plus scale equals a powerful representer

  • CLIP’s powerful learned representations result from 3 ingredients: a Vision Transformer, a contrastive objective (inspired by ConVIRT), and... scale.
  • During contrastive pre-training, the model learns to associate each image in a batch with its text companion, while dissociating it from the other text snippets.
  • To use CLIP on a specific classification task, one needs to use prompts, where the labels of the task’s dataset are reformulated to resemble the pre-training set while communicating the underlying context of the task. CLIP then predicts, among all the encoded prompts, the one which has minimal contrastive loss with the encoded image.
  • CLIP is a good zero-shot learner. It performs as well as the original fully supervised ResNet-50, and, on average, it outperforms all existing models in zero-shot prediction across 27 datasets on object classification, OCR, activity recognition in videos, and geo-localization.
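A hedged sketch of prompt-based zero-shot classification in the spirit of CLIP; the encoders below are random placeholders and the prompt template is an assumption. The point is the similarity-based prediction over encoded prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
encode_image = lambda image: rng.standard_normal(512)          # placeholder image encoder
encode_text = lambda text: rng.standard_normal(512)            # placeholder text encoder

classes = ["dog", "cat", "airplane"]
prompts = [f"a photo of a {c}" for c in classes]               # labels reformulated as prompts

image_emb = encode_image("query.jpg")
text_embs = np.stack([encode_text(p) for p in prompts])

# Cosine similarity between the image and each encoded prompt; pick the best match.
sims = (text_embs @ image_emb) / (
    np.linalg.norm(text_embs, axis=1) * np.linalg.norm(image_emb))
print("predicted class:", classes[int(np.argmax(sims))])
```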

#stateofai | 38

39 of 188

stateof.ai 2021

OpenAI’s DALL-E treats text-image pairs as a generative task and thus learns to generate believable images for a wide array of natural language prompts.

DALL-E draws what you want, but be sure to instruct it well

  • DALL-E is a 12B-parameter version of GPT-3 trained on text-image pairs. It receives encoded images and texts in the form of a sequence of 1280 tokens, which it models autoregressively.
  • To produce the best samples from the text prompts, the researchers use CLIP to rerank the 32 best generated images of DALL-E, which consistently yields impressive visualizations.
  • A natural question that arises from this and related research is the question of forming an effective prompt. Indeed, the exact framing of the text prompt has a large effect on the quality of the results.

#stateofai | 39

40 of 188

stateof.ai 2021

CLIP is already serving as a base model for downstream tasks: researchers from Google use its zero-shot capabilities together with Mask R-CNN to create a zero-shot learning model (ViLD) that surpasses supervised models on zero-shot object detection.

Using CLIP’s learned representations for zero-shot object detection

  • During training, ViLD is only given a subset of the classes to predict, for which CLIP generates class representations. ViLD then uses CLIP to predict the class of the region representations generated by Mask R-CNN.
  • Only during inference is ViLD given the novel classes that were unseen during training.
  • ViLD is the first zero-shot object detector to be evaluated on the LVIS dataset, and it outperforms its supervised counterpart on the novel categories.

#stateofai | 40

41 of 188

  • After breaking down a problem into manageable smaller problems, a developer can call Codex to map these problems to existing code (libraries, APIs, or functions) automatically.
  • OpenAI Codex understands the context of instructions and can retain a memory of prior instructions to reason more efficiently over new queries.
  • The system is trained using GPT-3’s natural language datasets in addition to billions of lines of source code retrieved from public sources including GitHub.

stateof.ai 2021

OpenAI’s Codex system is a specialised offspring of GPT-3 that is focused on translating natural language into functional computer code in a dozen programming languages.

Codex for coders

Figure: user instructions, the code generated by Codex, and the outputs it produces.

#stateofai | 41

42 of 188

Code generation models can generate snippets of code, but they struggle to generate entire programs.

Yet, code generation models still cannot crack the coding interview

stateof.ai 2021

  • In coding challenges, participants are required to write programs that solve problems which are described in natural language.
  • APPS, a benchmark of 10,000 coding questions, tests how well code generation models can solve these challenges. Generated code is then tested using human-written test cases.
  • GPT-2 and GPT-Neo, two general-purpose language models, are fine-tuned on Github and APPS training data, while OpenAI’s Codex, which is trained on code, is used without further fine-tuning.
  • Codex vastly outperforms the other language models on APPS problems, but all models achieve low scores, especially on intermediate and hard level problems (well below 5% accuracy).

#stateofai | 42

43 of 188

  • Researchers from Berkeley introduce MATH, a dataset of math competition problems formulated in natural language. This is a departure from previous datasets based on formal theorem provers.
  • They test two models, GPT-2 (0.1B to 1.5B parameters) and GPT-3 (2.7B to 175B) and show for both models that the increase in size resulted in better scores. However, the scores were mediocre, ranging between 3% and 7%.
  • It should be noted that the dataset is quite challenging, since a computer science PhD student “not particularly interested in math” achieved “only” a 40% score on the dataset.

Models do poorly on competition mathematics problems that test for reasoning and problem solving ability.

And don’t expect language models to help you with your math tests either

stateof.ai 2021

#stateofai | 43

44 of 188

Researchers tested large language models on TruthfulQA, a new benchmark of questions spanning domains such as health, law, conspiracies and fiction. They showed that the best model was truthful on 58% of the questions, compared to the human baseline of 94%. More surprisingly, models of larger sizes were generally less truthful.

Big fat liars: large language models are less truthful than their smaller peers

stateof.ai 2021

Figure 2: Average truthfulness on control trivia questions

  • While LLMs were relatively truthful on control trivia questions, they struggled on TruthfulQA, which contains questions designed to fool the largest GPT-3.

Figure 1: Average truthfulness on TruthfulQA

#stateofai | 44

45 of 188

stateof.ai 2021

Researchers at CMU surveyed more than 60 papers to make sense of the ongoing progress in prompting research in NLP. They thoroughly document the shift from the “pre-train, fine-tune” procedure to the “pre-train, prompt and predict” one, which is especially relevant for zero-shot learning.

Pre-train, prompt, predict: a new paradigm for NLP models

  • To use a pre-trained language model (LM) on a new task, the dominant method was to fine-tune it, i.e. adapt the LM to the new task with additional task-specific training.
  • In prompting, we do the inverse: we adapt the new task to the LM. For example, given a model pre-trained on a multilingual dataset and the prompt “English: I missed the bus today. French: ___”, the LM may be able to fill in the blank with a French translation without having been specifically trained on a translation task (see the sketch after this list).
  • The price tag for this model flexibility is prompt engineering: how to choose the best prompt for the task at hand?
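A hedged sketch of the "pre-train, prompt, predict" pattern, using Hugging Face's fill-mask pipeline as an off-the-shelf masked LM; the prompt template and verbalizer words are assumptions chosen for illustration, not a recommended recipe.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

review = "The film was a waste of two hours."
prompt = f"{review} Overall, the movie was [MASK]."            # sentiment task recast as a cloze
verbalizer = {"positive": ["great", "good"], "negative": ["terrible", "bad"]}

scores = {label: 0.0 for label in verbalizer}
for candidate in fill_mask(prompt, top_k=50):                  # candidate words for the blank
    for label, words in verbalizer.items():
        if candidate["token_str"].strip() in words:
            scores[label] += candidate["score"]

print(max(scores, key=scores.get))                             # predicted label, no fine-tuning
```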

#stateofai | 45

46 of 188

stateof.ai 2021

Prompting has been shown to be one of the critical parts of zero/few-shot learning in NLP. As zero shot methods become more ubiquitous, effective problem framing through prompts becomes more relevant.

Prompting is key to zero-shot learning

  • By effectively communicating the problem context in the form of a “prompt” and using target labels to fill slots in a “Mad Libs” style augmented target, model accuracy can be dramatically improved both quantitatively (left) and qualitatively (right).

From the ML at Berkeley blog: “Unreal Engine is a popular 3D video game engine created by Epic Games. CLIP likely saw lots of images from video games that were tagged with the caption “rendered in Unreal Engine”. So by adding this to our prompt, we’re effectively incentivizing the model to replicate the look of those Unreal Engine images.”

#stateofai | 46

47 of 188

stateof.ai 2021

Choosing a bad prompt can result in massive performance degradations in NLP tasks. Users can avoid this choice altogether via prompt learning, where prompts are formulated as learnable vectors.

But prompting is also challenging and brittle

  • For each example in LAMA, a fact retrieval benchmark, researchers from NYU and Facebook generated ~12 prompts of varying quality. They showed that standard selection methods generally failed to find the best prompt. Worse: 45% of the time, prompt selection methods resulted in worse prompts than random selection. Surprisingly, the accuracy losses were larger for larger LMs.
  • One way to avoid prompt selection is to use continuous trainable prompts. P-tuning, a method which relies on such prompts, outperforms SOTA approaches on LAMA and on the few-shot SuperGlue benchmark. Unfortunately, these prompts are not interpretable, and it is impossible to use them for zero-shot learning.

#stateofai | 47

48 of 188

stateof.ai 2021

3 different teams from Baidu, Google and Microsoft all surpass human baselines on the SuperGLUE NLP tasks.

One year after General Language Understanding Evaluation (GLUE), SuperGLUE is solved

  • Baidu’s ERNIE 3.0 is the best scoring model (90.6%), outperforming the human baseline by 0.8 percentage points.
  • ERNIE 3.0 stands out from two perspectives: its pre-training data and its historical development.
  • Data: In addition to a massive text corpus, ERNIE 3.0 uses a large-scale knowledge graph of 50 million facts to enhance the model’s world knowledge.
  • Origins: ERNIE has been developed fully within Chinese institutions (Tsinghua, Huawei, Baidu). While these have long been seen as followers, they are now leading the NLP SOTA race.

#stateofai | 48

49 of 188

stateof.ai 2021

M6 is a 100B parameter model pre-trained on the largest dataset in Chinese for NLP and multimodal tasks.

CLIP, but now in Chinese

  • While GPT-3-based models have demonstrated impressive performance on several multimodal tasks like image generation from text, they are trained primarily on English text.
  • Researchers from Tsinghua and Alibaba introduce a dataset of 1.9TB of images and ~290GB of Chinese text, on which they pre-train a large transformer.

#stateofai | 49

50 of 188

stateof.ai 2021

After the success of the (English pre-trained) GPT-3, large language models in multiple languages are emerging from private and public companies, academic research labs, and independent open-source initiatives.

The “democratization” of large language models

  • The model and dataset sizes differ and largely depend on the available resources to developers.
  • The largest Chinese language model, Wudao, which is also the largest language model in any language, was developed by the Beijing Academy of Artificial Intelligence and has 1.75T parameters (i.e. 10x GPT-3).
  • The Korean company Naver announced it has trained a 204B-parameter model called HyperCLOVA on Korean text.
  • Another effort is that of Aleph Alpha, a German AI startup, which announced in August 2021 that it had developed a large European language model fluent in English, German, French, Spanish, and Italian, although it hasn’t disclosed all the details of the model.
  • Contrary to the other organizations, EleutherAI, a collective of independent AI researchers, open-sourced their 6B-parameter GPT-J model. More on this in the Politics section.

#stateofai | 50

51 of 188

stateof.ai 2021

Researchers show that human evaluators are often in disagreement on Natural Language Generation (NLG) tasks. This calls into question the idea of beating current human baselines as the gold standard for NLP tasks.

New study suggests human evaluation should be re-evaluated

  • 780 evaluators were asked to determine whether text passages were written by humans or state-of-the-art generative models: GPT-2 and GPT-3. They correctly distinguished 57.9% of the time for GPT-2, but only 49.9% of the time for GPT-3, pointing to the improvement in NLG models.
  • A way to improve their performance was to train human experts to better identify GPT-3-authored text, but this improved the accuracy to only 55%.
  • What is striking, however, is the justifications of their classification: human evaluators often gave contradicting explanations on the same examples, “sometimes using the same aspect of the text to come to opposite conclusions.”
  • Most evaluators systematically underestimated current NLG models and focused on form rather than content in their evaluation. The researchers call on the community to think more carefully about how to collect human evaluations of NLG models.

#stateofai | 51

52 of 188

  • Large proportions of the US population have been excluded from medical AI training data sets in radiology, ophthalmology, dermatology, pathology, gastroenterology, and cardiology.
  • AI models often perform poorly on populations that are not represented in the training data. It is critical for AI training data to mirror the populations that models will ultimately serve.
  • Underrepresented populations might also have specific problems that remain unaddressed.

stateof.ai 2021

56 studies published between 2015 and 2019 reported the training of a deep learning algorithm on at least one geographically identifiable patient cohort to perform an image-based diagnostic task against a human physician, across 6 clinical disciplines. Of these studies, 71% used a patient cohort from one of three states: California, Massachusetts or New York. Thirty-four states did not contribute any data, pointing to huge patient underrepresentation.

Data deserts in biomedical AI research are likely to result in model bias in the clinic

#stateofai | 52

53 of 188

  • Demographic factors (e.g. age, sex, ethnicity) can influence patient outcomes based on their association with long-standing healthcare and societal inequities or, although less common, can change the efficacy of drugs.
  • An analysis of gene expression read-outs from disease relevant tissue samples across 3,000 studies comprising 177,201 individual samples found that many missed information on age (48%), sex (40%) and ethnicity (71%).
  • There was a significant lack of non-European samples from older donors, as well as varying sex distributions across different ethnicities.

stateof.ai 2021

Missing information and biases in demographic information are widespread in biomedical data that form the basis of the drug discovery process. ML solutions trained on these data need to understand and adapt for these biases to avoid perpetuating health inequities.

Measuring bias: a first step towards more inclusive health research outcomes

Figures: most samples are derived from Europeans; there is a significant lack of African American / Afro-Caribbean samples at older ages; samples from different ethnicities display different sex disparities.

#stateofai | 53

54 of 188

  • In the three retrospective studies that pitted an AI system against clinical decisions made by a human radiologist, all 36 AI systems evaluated were less accurate than the consensus of two or more radiologists.
  • The study concludes that “AI systems are not sufficiently specific to replace radiologist double reading in screening programs.”
  • It is unclear where AI might be of most benefit along the clinical pathway for breast cancer.

stateof.ai 2021

The UK National Screening Committee commissioned an investigation of the accuracy of AI systems for detecting breast cancer during routine screening. It found that studies published in the last ten years were of poor methodological quality and none were prospective studies that measured the accuracy in screening practice.

Beware of overstated claims: 94% of AI systems for breast cancer screening are less accurate than the original radiologist

#stateofai | 54

55 of 188

  • A multi-site study used public and private chest X-ray, chest CT, digital radiography, breast mammogram, and spine X-ray image data to build AI systems for race detection.
  • Trained models displayed >0.8 and often >0.9 ROC-AUC scores on the task of race prediction across imaging modalities, suggesting very high performance on this task.
  • Worryingly, this detection is not due to trivial proxies, such as body habitus, age, or other potential imaging confounders.
  • Learned features appear to involve all regions of the image and frequency spectrum, which complicates mitigation efforts.

stateof.ai 2021

There is a conundrum in medical imaging AI: While computer vision models trained on a patient’s medical imaging data of various modalities can accurately and trivially predict their race, clinicians attempting to do the same cannot. This implies that medical AI systems can potentially cause discriminatory harm and reproduce or exacerbate the racial disparities that already exist in medical practice.

Medical AI racism: models reliably identify the self-reported racial identity of patients

#stateofai | 55

56 of 188

stateof.ai 2021

Last year’s Report drew attention to the lack of openness of AI research as measured by the percentage of arXiv papers that share the code required to reproduce their results. Methodology improvements from the Papers With Code project that make the openness metric more ML specific have resulted in an increase from 15% in last year’s Report to 26% today. However, when analysing the authors of the “hottest papers” in the last 30 days*, we find that only 17% shared a code repository. This might suggest that some authors do not prioritise its timely release.

26% of AI research papers make their code available and 60% make use of PyTorch

*Top socially shared papers on Twitter for 30 days until 8 September 2021

#stateofai | 56

57 of 188

  • The surveyed practitioners apply AI on landslide detection, suicide prevention, and other high-stakes domains.
  • 92% reported experiencing one or more data cascades, and 45.3% reported experiencing two in the same project.
  • The researchers attribute this to multiple factors including (a) lack of recognition of the data work in AI, (b) lack of adequate training, (c) difficulty of access to specialized data for the studied region/population. The authors call for developing metrics to assess goodness-of-data, better incentives for data excellence, better data education, better practices for early detection of data cascades, and better data access in the Global South.

stateof.ai 2021

Google researchers define “data cascades” as “compounding events causing negative, downstream effects from data issues”. Supported by a survey of 53 practitioners from the US, India, and East and West African countries, they warn that current practices undervalue data quality and result in data cascades.

Data becomes more critical when the stakes are high

#stateofai | 57

58 of 188

stateof.ai 2021

As Large Language Models (LLMs) become ever-more successful and ubiquitous, better documentation of the large training text corpora becomes critical. Researchers dissected C4, a 305 GB dataset that Google obtained by filtering a snapshot of Common Crawl. They found that the filtering disproportionately removed text about minority individuals.

Large language training datasets need better documentation

  • Among the most frequent identity mentions, those of “sexual orientations (lesbian, gay, heterosexual, homosexual, bisexual) had the highest likelihood of being filtered out”. Moreover, African American English and Hispanic-aligned English were disproportionately removed from the text due to the blocklist filter.
  • Interestingly, the dataset contains machine-generated translations. With the proliferation of machine-generated text online, many practitioners fear that new LLMs will inherit the flaws of older ones, further perpetuating their biases.
  • The researchers recommend a documentation methodology where the excluded data is explicitly described. They put this into practice and host a documented version of the C4 corpus, which had not been made easily available before.

Figure: Proposed documentation methodology.

#stateofai | 58

59 of 188

  • Legal NLP: Several works have shown that, for NLP applied to legal texts, pretraining on existing legal datasets helps no more than pretraining on general text.
  • Stanford’s RegLab introduces a huge dataset of ~3.5M legal decisions (37GB of text) across American federal courts, on which they pretrain their language model, legalBERT.

stateof.ai 2021

As data-hungry deep learning conquers more applications, better domain-specific datasets are needed. Legal NLP and malware exemplify this struggle as new pretraining datasets and benchmarks come to the rescue.

Better datasets for machine learning in production: legal documents and malware

  • legalBERT significantly outperforms a general purpose BERT on 3 tasks, including CaseHOLD, a new task consisting of 53,000 Q&As from American Case Law.
  • Malware: SophosAI and ReversingLabs introduced SoReL-20M, the largest dataset for malware detection. It contains 20 million files with significantly more metadata than older datasets. They find that 20 million files is a large enough size to differentiate between machine learning models of different capacities.
  • They also released models trained on this dataset that can serve as baselines.

#stateofai | 59

60 of 188

Working with massive datasets is cumbersome and expensive. Carefully selecting examples mitigates the pain of big data by focusing resources on the most valuable examples, but classical methods often become intractable at-scale. Recent approaches address these computational costs, enabling data selection on modern datasets.

Careful data selection saves time and money by mitigating the pains of big data

stateof.ai 2021

  • Data selection methods improve the efficiency of AI/ML by identifying the most valuable points to label (active learning) or train on (core-set selection).
  • Web-scale active learning on billions of examples is now possible with SEALS, which reduces the computational cost of data selection algorithms by 10-1000x.
  • Using SVP (selection via proxy), 50% of CIFAR10 can be removed without impacting accuracy, leading to a 1.6x speed-up in end-to-end training (a minimal sketch of the proxy idea follows below).
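To make the core-set idea concrete, here is a minimal, hypothetical sketch in the spirit of selection via proxy (not the published SVP code): a small, cheap proxy model scores the labelled pool by predictive uncertainty, and only the highest-value examples are kept for training the large model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_via_proxy(X_pool, y_pool, keep_fraction=0.5, seed=0):
    """Illustrative core-set selection: rank a labelled pool with a cheap
    proxy model and keep the most uncertain examples for the target model."""
    rng = np.random.default_rng(seed)
    # Fit a small proxy on a random 10% subset of the pool.
    subset = rng.choice(len(X_pool), size=max(len(X_pool) // 10, 10), replace=False)
    proxy = LogisticRegression(max_iter=1000).fit(X_pool[subset], y_pool[subset])
    # Score every example by the proxy's predictive entropy (uncertainty).
    probs = proxy.predict_proba(X_pool)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Keep only the most uncertain examples; the rest are dropped.
    n_keep = int(keep_fraction * len(X_pool))
    return np.argsort(-entropy)[:n_keep]

# Usage (hypothetical arrays): kept_idx = select_via_proxy(X_train, y_train, 0.5)
```

The published methods use richer selection signals than entropy alone; the point of the sketch is only that a cheap model can decide which examples the expensive model actually needs.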

#stateofai | 60

61 of 188

  • To be accepted to a conference, an author’s paper must receive positive reviews from a small panel of expert reviewers who are selected from the ML community.
  • Reviewers are assigned papers on the basis of their expertise and are blind to the authors’ identities. Reviewers must also declare conflicts of interest.
  • However, collusion rings have emerged that threaten the legitimacy of the reviewing process, the quality standard of the conference, and the trustworthiness of accepted papers.
  • In a collusion ring, reviewers agree amongst each other to provide glowing reviews of each other's work. They share the names of their papers between themselves and request to review these papers.

stateof.ai 2021

With the explosion of papers submitted for consideration at major ML conference venues each year and the limited spots available, the ML community is calling attention to illicit collusion rings amongst reviewers.

Can you trust the quality of papers you read at academic conferences?

Credits: Michael Littman and Sergei Ivanov

#stateofai | 61

62 of 188

stateof.ai 2021

Industry affiliated authors are less likely to provide access to their research code upon initial submission for conference review compared to academic affiliated authors. While industry authors enjoy a higher paper acceptance rate (right figure), academic authors release their code more frequently than industry authors, whether initially or once a paper is camera-ready (left figure).

Providing code alongside a research paper submission isn’t mandatory, but the practice is growing

#stateofai | 62

63 of 188

  • A study in the Annals of Thoracic Surgery took 112 original articles and shared half of them via Twitter.
  • When compared with non-shared articles, the tweeted articles accumulated 9x higher Altmetric scores that measure article mentions, news articles, and social media shares.
  • One year later, tweeted papers accumulated 3x more citations than non-tweeted papers, which suggests that research communications has become an important strategy in capturing attention around new research.

stateof.ai 2021

Academic papers are disseminated via peer-reviewed journals and academic conferences. Today, researchers are creating Twitter threads and highly designed blog posts that resemble startup product launches to share and hype up their work.

The rise of research communications: Twitter threads drive citations 3-fold higher

#stateofai | 63

64 of 188

stateof.ai 2021

GNN is the 4th most used keyword at ICLR’21 and the one with the largest increase in usage from 2019 to 2020.

Graph Neural Networks: From niche to one of the hottest fields of AI research

Credits: Xavier Bresson

#stateofai | 64

65 of 188

stateof.ai 2021

Modeling physical systems dynamics often requires subdividing complex continuous spaces into simpler discrete cells, a process called mesh-generation. DeepMind researchers used GNNs to accelerate mesh-based simulations by 1 to 2 orders of magnitude compared to classical solvers.

Graph Neural Networks applications: mesh-based simulation

  • Researchers used GNNs to learn the mesh dynamics and adapt the resolution to the required accuracy in different regions of the simulation domain.
  • They showed that their method is faster than particle and grid-based baselines, and can generalize to more complex dynamics than those it trained on. They attributed part of the increased computational efficiency to the fact that GNNs benefit from hardware acceleration.
  • Mesh-based simulations aim to predict how meshes will change over time depending on external factors, for example how cloth moves under the action of the wind. Meshes can naturally be expressed as graphs, where adjacent cells are connected and each cell has a number of nodes and edges determined by the mesh choice (a minimal message-passing sketch over such a graph follows below).
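As an illustration of why meshes map naturally onto GNNs, here is a minimal, hypothetical message-passing step over a mesh expressed as a graph, with node features standing in for quantities such as position and velocity. It is a sketch of the general technique, not DeepMind's implementation.

```python
import numpy as np

def message_passing_step(node_feats, edges, w_msg, w_upd):
    """One message-passing step over a mesh graph.
    node_feats: (N, D) per-node features (e.g. position, velocity).
    edges: list of (src, dst) pairs connecting adjacent mesh cells.
    w_msg, w_upd: (D, D) matrices standing in for learned parameters."""
    messages = np.zeros_like(node_feats)
    for src, dst in edges:
        # Each node aggregates transformed messages from its neighbours.
        messages[dst] += np.tanh(node_feats[src] @ w_msg)
    # Residual update of node states from the aggregated messages.
    return node_feats + np.tanh(messages @ w_upd)

# Toy 3-node mesh with bidirectional edges between adjacent cells.
feats = np.random.randn(3, 4)
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
w = np.eye(4)
print(message_passing_step(feats, edges, w, w).shape)  # (3, 4)
```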

#stateofai | 65

66 of 188

stateof.ai 2021

Accurately predicting the estimated time of arrival (ETA) for a given route requires a complex understanding of the spatiotemporal interactions taking place on the road. GNNs are well suited for this task because roads and their intersections naturally form a graph network. A GNN-based system reduced negative ETA outcomes by 16% to 51% around the world in live production.

Graph Neural Networks applications: improving ETA predictions in Google Maps

  • First, roads are chunked into connected segments that follow typical traffic routes and form longer supersegments.
  • The world is divided into regions with similar driving behaviors, and region-specific GNNs are trained for each.
  • Data represents the actual traversal times across segments and supersegments, which are used as node-level and graph-level labels for prediction, respectively.
  • For a given starting time, the GNN learns the travel time of each supersegment at specific points in the future.

#stateofai | 66

67 of 188

stateof.ai 2021

While very expressive and powerful, GNNs do not scale well to large datasets: modelling millions of nodes and billions of connections quickly exhausts memory. This makes it difficult to deploy large GNNs on equally large real-world graphs without sacrificing model capacity.

Graph Neural Networks: improving the memory and parameter efficiency of large models

  • To overcome the memory bottleneck of large GNNs, we either need new hardware or model architectures that consume less memory.
  • A method called deep reversible architectures (RevGNN) offers memory consumption that is independent of the number of layers in a model. RevGNN has a very large capacity at low memory cost and only slightly increased training time compared to baseline GNNs (ResGNN). Their deepest model, RevGNN-Wide, is the deepest GNN to date with 1000 layers.
  • With only a fraction of the memory footprint, RevGNNs outperform some baselines on a node prediction benchmark task. But depth still doesn’t help in most tasks, which is worthy of future investigation (a minimal sketch of the reversible-layer idea follows below the figure).

Figure: RevGNNs outperform existing models with significantly less memory consumption.
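The memory saving comes from reversibility: intermediate activations can be recomputed from a layer's outputs instead of being stored. Below is a minimal, hypothetical sketch of a reversible block (inputs split into two halves), illustrating the general idea rather than the RevGNN implementation.

```python
import numpy as np

def rev_forward(x1, x2, f, g):
    """Forward pass of a reversible block; f and g would be GNN sub-blocks."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Exactly reconstruct the inputs from the outputs, so activations need
    not be stored during training: memory no longer grows with depth."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Toy check with simple element-wise functions standing in for sub-networks.
f = g = np.tanh
y1, y2 = rev_forward(np.ones(4), np.zeros(4), f, g)
x1, x2 = rev_inverse(y1, y2, f, g)
assert np.allclose(x1, np.ones(4)) and np.allclose(x2, np.zeros(4))
```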

#stateofai | 67

68 of 188

stateof.ai 2021

By using a graph as the world model, the L3P agent is able to efficiently plan even over long time horizons.

Graphs for model-based reinforcement learning

  • Planning with model-based RL requires breaking the task at hand into multiple small steps, and planning at each one.
  • This is challenging over long time horizons: (a) the longer the horizon, the more modeling errors accumulate, and (b) planning at each state quickly becomes intractable.
  • L3P learns over a sparser set of steps. To do this, L3P clusters intermediate goals that are easily reachable from one another, thereby learning a small number of important landmarks. Landmarks are modeled as nodes, and the edges are weighted by a reachability distance between the landmarks.
  • Finally, L3P uses graph search to compute the shortest path to the goal (a minimal sketch of this step follows below the figure).

Figure: Compared to existing methods, L3P has a smoother trajectory and doesn’t get stuck in search for the next goal.
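The final planning step is ordinary shortest-path search over the learned landmark graph. Below is a minimal, hypothetical sketch (Dijkstra over a toy graph whose edge weights stand in for learned reachability distances); the real L3P pipeline learns the landmarks and distances rather than hand-specifying them.

```python
import heapq

def shortest_path(landmark_graph, start, goal):
    """Dijkstra over a landmark graph.
    landmark_graph: dict mapping node -> list of (neighbour, reachability_cost)."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, edge_cost in landmark_graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(frontier, (cost + edge_cost, neighbour, path + [neighbour]))
    return float("inf"), []

# Toy landmark graph; edge weights stand in for learned reachability distances.
graph = {"start": [("A", 1.0), ("B", 4.0)], "A": [("B", 1.0)], "B": [("goal", 2.0)]}
print(shortest_path(graph, "start", "goal"))  # (4.0, ['start', 'A', 'B', 'goal'])
```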

#stateofai | 68

69 of 188

stateof.ai 2021

Chinese industrial and academic labs win all 3 tasks of the Open Graph Benchmark Large Scale Challenge.

Chinese institutions also sweep a major Graph Neural Networks competition

  • The challenge is organized by the Open Graph Benchmark team, which gathers leading researchers from American and German Universities and companies.
  • The challenge is particularly important because it introduces datasets of unprecedentedly large scale spanning prediction on 3 different levels: links, nodes, and graphs.
  • The winners included the usual suspects (Baidu, Tencent, Ant, Peking University), other Chinese Universities and Microsoft Asia.
  • Additionally, on the 15 tasks of the Open Graph Benchmark, another set of smaller-scale datasets, submissions from Chinese institutions ranked first on 11 tasks, and first or second on 14 tasks.

#stateofai | 69

70 of 188

stateof.ai 2021

Predicting rainfall at high-resolution with a short lead time (<2h, i.e. “nowcasting”) is important for businesses and people when making weather-dependent decisions. New deep generative model (DGM)-based methods bring added resolution and prediction accuracy beyond that of physics-based simulations and current ML methods.

Deep generative models offer highly accurate probabilistic predictions of precipitation

  • A DGM is trained on historically observed radar-based estimates of precipitation. The DGM learns a probability distribution of this data from which it can generate future radar predictions.
  • The model represents uncertainty across multiple spatial and temporal scales, which makes it amenable to predicting smaller-scale weather phenomena that are particularly stochastic.
  • Fifty meteorologists from the UK’s Met Office evaluated the DGM against competing methods (PySTEPS and Axial Attention) and preferred it for accuracy and usefulness in 88% of evaluation cases.

#stateofai | 70

71 of 188

stateof.ai 2021

In this work, a digital biomarker is developed for idiopathic pulmonary fibrosis in mice. Diseased and healthy animals are treated with a drug and their behavior is continuously tracked and analysed using computer vision. Behavioral patterns are learned across animal studies and functionalized as digital biomarkers that relate to drug efficacy and adverse reactions as a study progresses. An example digital biomarker is breathing rate, which can map more directly to patient symptoms in a clinical study. This compares to traditional endpoints (e.g. lung histology) that can only be measured after the study.

Computer vision unlocks accurate and fast disease assessment using digital biomarkers for drug discovery

Figure: Continuous data capture (including video) yields digital biomarkers (including breathing rate) across healthy, disease, disease + treatment, and healthy + treatment groups. Breathing rate (AUC) detects disease and shows drug efficacy without waiting for histology.

#stateofai | 71

72 of 188

COVID vaccines are shown to be highly effective from large-scale observational data collected with the ZOE COVID Study App and the use of causal methods.

Citizen science with 1.2M participants demonstrates real-world vaccine effectiveness

stateof.ai 2021

  • Estimating treatment effects (here, of vaccines) from observational studies requires causal models to account for confounding effects in the data (a sketch of the adjustment involved follows below the figure).
  • Despite being highly effective (circa 80% protection) at the outset, the protection offered by vaccines wanes over time.

Figure: Causal variables to account for (age, sex, healthcare worker status, comorbidities, background infections) influencing vaccination and infection; infection risk reduction by months since vaccine, and since end of May (Delta emergence), for AstraZeneca and Pfizer.
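As a hedged illustration of the kind of adjustment such causal models perform (not the study's exact estimator), the backdoor adjustment for the effect of vaccination on infection, conditioning on the confounders in the causal graph above, can be written as:

```latex
P(\text{infected} \mid \mathrm{do}(\text{vaccinated})) =
  \sum_{z} P(\text{infected} \mid \text{vaccinated},\, Z = z)\, P(Z = z)
```

where Z ranges over the confounders (age, sex, healthcare worker status, comorbidities, background infection rate).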

#stateofai | 72

73 of 188

stateof.ai 2021

Major factors that drive the carbon emissions during model training are the choice of neural network (esp. dense or sparse), the geographic location of a datacenter, and the processors. Optimising these reduces emissions.

Reducing the carbon emissions of large neural network training by 100-1000x

  • Companies with heavy AI workloads including NVIDIA and AWS estimate that 90% of the energy consumption comes from inference and 10% from training.
  • Google evaluated the energy and CO2 budget of five popular large language models and proposes simple formulas for researchers to measure and report on these costs when publishing their work.

#stateofai | 73

74 of 188

stateof.ai 2021

Introduced by Google in late 2019, JAX is a Python package that combines Autograd (a library for automatic differentiation) and XLA (a compiler for linear algebra) to accelerate computations for machine learning research.

Here comes a new framework challenger: JAX

  • A convenient feature of JAX is its resemblance to numpy, a popular package for scientific computations, which makes it easier to adopt. Other features include easy vectorization, parallelization and just-in-time compilation.
  • The JAX ecosystem is rapidly growing, with libraries for neural networks (Flax, Haiku), optimization (Optax), reinforcement learning (Rlax), federated learning (FedJAX), amongst others.
  • Of the 14,500 models available on Hugging Face, 4,900 already have a JAX implementation, compared to 11,500 for PyTorch and 1,200 for TensorFlow.
  • Given Google’s weight in machine learning research and their investment in JAX, the framework is certainly here to stay.
  • While it is not used in production yet, we can expect the research-to-production gap to be closed eventually, as was the case for PyTorch (a minimal usage sketch follows below).
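A minimal, hedged sketch of what makes JAX feel numpy-like while adding automatic differentiation, JIT compilation and vectorisation (a toy least-squares example, not drawn from the Report):

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Ordinary numpy-style code: a tiny least-squares loss.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

# Automatic differentiation plus just-in-time compilation via XLA...
grad_loss = jax.jit(jax.grad(loss))

# ...and automatic vectorisation over a batch of weight vectors.
batched_loss = jax.vmap(loss, in_axes=(0, None, None))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 3))
y = jnp.ones(32)
w = jnp.zeros(3)
print(grad_loss(w, x, y).shape)                      # (3,)
print(batched_loss(jnp.zeros((5, 3)), x, y).shape)   # (5,)
```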

#stateofai | 74

75 of 188

Section 2: Talent

stateof.ai 2021

#stateofai | 75

76 of 188

stateof.ai 2021

Brazil and India are hiring >3x more AI talent today than they were in 2017, matching or surpassing the hiring growth of Canada and the US. Meanwhile, almost 30% of scientific research papers from India include women authors compared to an average of 15% in the US and UK, and far greater than 4% in China.

India and China see significant growth of AI talent, and India’s AI research is most diverse

#stateofai | 76

77 of 188

stateof.ai 2021

The Chinese Academy of Sciences, the national academy for the natural sciences in China, was founded in 1949. From having no AI publications in 1980, the institution rose to become, 30 years later, the #1 publisher of top 25% quality* AI research. Tsinghua University and Peking University emulated its growth and are now competitive with the oldest and best universities in the world: Oxford, Cambridge, Harvard, Stanford et al.

白手起家: A Chinese institution publishes the largest volume of quality AI research today

*Microsoft Academic Graph measures quality by “using a dynamic eigencentrality measure that ranks a publication highly if that publication impacts highly ranked publications, is authored by highly ranked scholars from reputable institutions, or is published in a highly regarded venue and also considers the competitiveness of the field.”

#stateofai | 77

78 of 188

stateof.ai 2021

China is projected to reach nearly double the number of STEM PhD students in the US by 2025.

China is outpacing the US in STEM PhD growth...

  • Between 2003 and 2007, when China surpassed the number of US STEM PhD graduates, more than 1,300 new PhD programs were built from scratch.
  • Between 2012 and 2021, the Chinese government doubled its investment in higher education, resulting in an increase of 40% in the number of Chinese PhD graduates.
  • The projected numbers in the right plot are based on current enrollment patterns and are all but certain to be realised.

#stateofai | 78

79 of 188

stateof.ai 2021

Headline growth numbers can mask a decline in program quality, for example if the growth is driven by rapid creation of mediocre programs. Data shows that this is not the case in China, where 43% of PhD graduates in 2019 came from Double First Class Universities*, a slight drop from 46% in 2015.

…without sacrificing program quality

  • “In 2020, 36 of the 42 “Double First Class” universities were ranked in the top 500 universities globally, and 21 were ranked in the top 200.”
  • Most of the recent growth in the number of PhDs comes from elite universities administered by the Ministry of Education: these universities accounted for 65% of the total increase in PhD enrollments between 2015 and 2019.
  • The “Yao Class” at Tsinghua University is another example of elite undergraduate education that’s set up to feed into Tier 1 US postgraduate schools.

*Double First Class Universities are “a tertiary education development initiative designed by the People's Republic of China government, in 2015, which aims to comprehensively develop elite Chinese universities and their individual faculty departments into world-class institutions by the end of 2050.” - Wikipedia

#stateofai | 79

80 of 188

stateof.ai 2021

An analysis of data from 2015-2019 examines the destinations of graduates from Tsinghua and Peking University. 70% of undergraduates continue on to postgraduate studies. Only 16% of all graduates (Bachelors, Masters, PhDs) choose to study abroad after graduation: their preferred destination is the US, followed by the UK. Domestically, Huawei holds firm as their top employer.

Where do students from leading Chinese universities go?

Credits: Jeffrey Ding

2015-2019 Ranking of Employment Units for Tsinghua and Peking University Graduates

Tsinghua University
Rank | 2015       | 2016       | 2017       | 2018    | 2019
1    | State Grid | Huawei     | Huawei     | Huawei  | Huawei
2    | Huawei     | State Grid | State Grid | Tencent | Tencent

Peking University
Rank | 2015   | 2016   | 2017   | 2018    | 2019
1    | Huawei | Huawei | Huawei | Huawei  | Peking U
2    | Baidu  | ICBC   | ICBC   | Tencent | Huawei

Figure: Destinations of Tsinghua University’s 2019 graduates who go abroad for further study.

#stateofai | 80

81 of 188

stateof.ai 2021

Since 2012, large technology companies have increasingly published either on their own or in collaboration primarily with elite universities as opposed to mid-tier and lower-tier universities. Counterfactual analysis suggests a causal divergence between large technology companies and non-elite universities that is driven by access to computing power as a form of de-democratisation. This results in a small set of actors creating a majority of the high-impact research output.

Elites work with elites: a compute divide drives the “de-democratization” of AI research

#stateofai | 81

82 of 188

stateof.ai 2021

Researchers in deep learning with higher average impact papers from elite universities are more likely to transition into technology companies than their non-elite peers (middle chart). Early in their industry tenure, researchers’ citations increase, then steadily decline over the years (right chart). This suggests a depletion of academic impact (left chart).

Academia to industry transitioning is increasingly popular amongst top universities

#stateofai | 82

83 of 188

stateof.ai 2021

In last year’s Report, we noted the significant efflux of Professors from North American universities into large technology companies (top 3 magnets: Google/DeepMind, Amazon, Microsoft) from 2004-18. In 2019, the trend largely continued with 33 faculty members departing (right graph). Notably, 85% of the professors hired were tenured, meaning they held permanent positions at their university. CMU, Georgia Tech, Washington, and Berkeley lost the most faculty between 2004-19 (left graph).

The Great Academic Brain Drain...continued

#stateofai | 83

84 of 188

stateof.ai 2021

In Germany, for example, student enrollment in applied sciences is growing >60% YoY (2018) whereas faculty growth in the same department and year is stagnating around 25% YoY. In absolute numbers, 2018 saw circa 230k students vs 2.5k professors, suggesting that 1 professor advises 90-100 students. This is untenable.

Depletion of academic faculty and the worsening of faculty:student ratios

Figure: Annual growth rate (%) of total university students, students at universities of applied sciences, total university professors, and professors at universities of applied sciences.

#stateofai | 84

85 of 188

stateof.ai 2021

In the Netherlands, for example, student enrollment in STEM programs has grown 68% between 2000 and 2017 but government funding for these resource-intensive programs has dropped 25% in the same time period on a per student basis. Academics fear for the livelihood of their programs. This is in stark contrast to China, where the government introduced AI courses for elementary and secondary school students in 2018 and has expanded its investment into STEM ever since.

Government funding cuts to higher education threaten more resource-intensive STEM programs

Figures: STEM student enrollment in the Netherlands; government funding per student in the Netherlands.

#stateofai | 85

86 of 188

stateof.ai 2021

A Google researcher polled Twitter on the approximate annual compute budget for academic and industry AI labs. The responses suggested that grant bodies often reject the inclusion of compute budgets in grant applications and that most research groups work with very small numbers of GPUs. In response, some large cloud vendors are moving in to fill the gap.

Research groups struggle to compete given institutionally limited budgets

#stateofai | 86

87 of 188

stateof.ai 2021

Unsurprisingly, therefore, Big Tech companies are a major source of academic research funding. This lets them indirectly craft a desirable public image and influence events, decisions, and research agendas of the universities they fund (particularly top tier institutions).

More money, more influence: 88% of top AI faculty have received funding from Big Tech

Figures: NeurIPS 2020 Platinum Sponsors (63% Big Tech // 21% Finance); % of CS faculty members who have at any point received funding or employment from Big Tech.

#stateofai | 87

88 of 188

stateof.ai 2021

Carnegie Mellon University partnered with Emerald Cloud Labs to build the “world’s first cloud lab in an academic setting” as part of the university’s $250M investment into new science facilities. The project, costing $40M, will house 100 different scientific instruments for life science experiments on the CMU campus, orchestrated via the cloud and executed by automated workflows. Another related academia-industry partnership is the $240M agreement between IBM and MIT that formed the MIT-IBM Watson AI Lab in 2017.

Universities team up with private companies to fill research resources gap

#stateofai | 88

89 of 188

stateof.ai 2021

The gender and racial diversity data radically differ between technical and non-technical teams. They show a massive lack of gender diversity in technical teams, while a better balance is achieved in product and commercial teams. African Americans and Hispanics constitute a lower share of the AI workforce than their share in the general workforce, with the severest drop coming from technical teams. These teams also have the highest share of Asian workers.

The US AI workforce: gender and racial diversity

  • “Technical teams” are defined by CSET as all professionals that can immediately work on AI products or possess the skills to do so with limited additional training (research scientists, software developers, data architects, etc.).
  • The “non-technical” AI workforce comprises product teams and commercial teams.

#stateofai | 89

90 of 188

stateof.ai 2021

Computer research scientists, software developers, mathematicians, statisticians and data scientists saw employment growth far ahead of the general employed population. To meet the increasing demand for technical talent, computer science and engineering were the fastest growing undergraduate degrees from 2015 to 2018, accounting for 10.2% of all 4-year degrees conferred in 2018. Their numbers increased by 34% and 25% respectively over the period, while the number of other awarded degrees grew 4.5% on average.

Market forces in action: supply of technical US AI talent grows 26.5% to meet demand

Occupation                                       | 2019 Employment | 2015-2019 Employment Change
Computer Research Scientists                     | 35,230          | 72.9%
Mathematicians / Statisticians / Data Scientists | 184,290         | 251.9%
Software Developers                              | 1,651,990       | 38.9%
Total Technical AI                               | 1,871,510       | 48%
Total US Employed                                | 160,034,580     | 5.8%

Number of conferred degrees
Degree                     | 2018      | 2015-2018 change
Computer Science           | 79,598    | 34%
Engineering                | 121,956   | 25%
Mathematics / Statistics   | 25,256    | 15.6%
Total technical AI degrees | 226,810   | 26.5%
All degrees                | 1,980,644 | 4.5%

#stateofai | 90

91 of 188

stateof.ai 2021

In the US, the tech sector is where remote work has been the most prevalent despite the loosening of pandemic rules in the spring of 2021. With the pandemic resurgence, Google, Apple, Facebook and Amazon announced that their offices would still be closed until at least January 2022. Twitter made the switch to remote work permanent.

Tech workers are staying home (for now)

#stateofai | 91

92 of 188

Section 3: Industry

stateof.ai 2021

#stateofai | 92

93 of 188

stateof.ai 2021

2020 Prediction: An AI-first drug discovery company IPOs or is acquired for $1B

British AI-first drug discovery company, Exscientia, originated the world’s first 3 AI-designed drugs to enter Phase 1 human testing and IPO’d on the NASDAQ on 1 October 2021 at a >$3B valuation. Exscientia is now the UK’s largest biotech and the 3rd largest biopharma company in the UK after GSK and AstraZeneca. The company has a further 4 drug candidates undergoing advanced profiling for submission of investigational new drug applications, in addition to more than 25 active projects in total.

10x fewer synthesized compounds to deliver a candidate

12 months target-to-hit vs. 54 months industry average

#stateofai | 93

94 of 188

Drug selection for cancer patients is highly inefficient: over 90% of patients do not respond to the therapy that is selected by their oncologist. Why? Selection methods such as mutation sequencing are too reductionist. By contrast, Allcyte’s AI (left figure) finds the most potent drug for a given patient. AI-based microscopy is used to measure how live cancer cells respond to 140 clinically-approved third-party anticancer drugs at the single cell level. In a prospective clinical trial of 56 blood cancer patients (right figure), those patients who received AI-guided therapy achieved a 55% overall response rate and a statistically significant improvement in progression-free survival over their respective prior line of therapy.

Computer vision identifies the most potent drug for each cancer patient to improve survival

stateof.ai 2021

#stateofai | 94

95 of 188

stateof.ai 2021

Recursion Pharmaceuticals, a Utah-based AI-first company that makes use of high-throughput screening and computer vision-powered microscopy to discover drugs, raised $436M in its NASDAQ IPO in April 2021. The business has 37 internally-developed drug programs including 4 clinical-stage assets. By conducting targeted exploration of biological search space with compound and disease cell type combinations, the company is building a “map” of disease biology. With this map, the company is predicting tens of billions of relationships between disease models and therapeutic candidates. This includes relationships that are predictive of candidate mechanism of action, which expands the discovery funnel beyond hypothesized and human-biased targets.

2020 Prediction: An AI-first drug discovery company IPOs or is acquired for $1B

#stateofai | 95

96 of 188

stateof.ai 2021

DELs are composed of billions of small molecules with unique DNA barcodes attached. Previous ML applied to DELs coarsely aggregated data to smooth out noise. By adapting graph neural networks to reflect the DEL process, Anagenex lowers noise and designs novel libraries to complete wet lab-guided active learning loops.

Active learning using custom GNNs for improved drug discovery lead-finding with DEL data

  • DEL data links DNA sequences to a set of possible molecules. A GNN specially adapted to this structure reduces noise and leverages all molecules in the set to predict binding affinity.
  • Anagenex has used this technique to find hits to challenging targets with a >20% confirmation rate (vs. 1% for traditional HTS or 5% for docking).
  • Anagenex uses the model to design and synthesize new libraries, closing a lab-powered active training loop.

Figure: Active learning loop, from a DEL library (DNA tags linked to accessible molecules) → model (custom DEL-tuned GNN) → synthesizable or purchasable compounds → evolved library design.

#stateofai | 96

97 of 188

stateof.ai 2021

Treatments for inflammatory bowel diseases such as Crohn’s Disease and Ulcerative Colitis must not only inhibit inflammation, but also survive transit through the gut. To achieve this, LabGenius simultaneously co-optimised potency and stability in the presence of protease. Their approach resulted in protein designs with ~400-fold greater potency and a ~100-fold increase in protease stability compared to molecules designed by traditional methods.

Convolutional neural networks help design better protein therapeutics

  • First, potency and stability were modeled and these models were used to navigate through different protein variants towards improved designs.
  • A simulation based on empirical measurements of all single mutation variants of the protein and assuming a linear sequence-to-function relationship finds significant improvements to both potency and stability. Both graphs represent the same pool of molecules.

#stateofai | 97

98 of 188

stateof.ai 2021

Intenseye’s computer vision models are trained to detect over 35 types of employee health and safety (EHS) incidents that human EHS inspectors cannot possibly see in real-time. The system is live across over 15 countries and 30 cities, having already detected over 1.8M unsafe acts in 18 months.

Real-time computer vision protects employees from workplace injuries (or worse)

  • Computer vision has digitized over 3,000 health and safety inspections that can now run 24/7. This AI-first approach has saved one Intenseye customer 1,460 hours per year.
  • Intenseye creates a collaborative workflow that connects AI, workplace analytics and behavior change to result in fewer injuries, reductions in insurance premiums, and an overall increase in company productivity.

Heatmap of incidents

Employee not wearing PPE

Dangerous driving

#stateofai | 98

99 of 188

stateof.ai 2021

Computer vision unlocks faster recovery from natural disasters

Climate change is increasing the severity of natural disasters, inflicting $190B of damage to homes worldwide in 2020, 4x more than in 1990. The global population exposed to natural disasters will increase 8x by 2080. Tractable's AI-augmented system allows homeowners to take photos of damage to their home after a natural disaster (e.g. hurricanes) to predict repair costs and unlock insurance claim payouts months faster.

  • The solution is in use by a leading Japanese insurer and is expected to help thousands of households recover more quickly from the impact of Japan’s typhoon season in Q3/Q4 2021, e.g. from Typhoon Mindulle (projected: $100M in damage to 20,000 households).
  • Tractable plans to expand its system to accelerate recovery from hail storms and floods, as well as identify homes exposed to fire risk from nearby vegetation.


a. Climate change causes property damage from natural disasters

b. Typhoon Mindulle about to strike Japan, Oct 2021

c. Tractable’s user-facing application

#stateofai | 99

100 of 188

UK National Grid ESO halves error of electricity demand forecast using transformers

stateof.ai 2021

  • National Grid Electricity System Operator (ESO) are responsible for balancing electricity supply and demand in real time. Forecasts of electricity supply and demand are essential for this task.
  • Open Climate Fix worked with ESO to build a new forecasting system based on the Temporal Fusion Transformer, which has been delivering forecasts to the control room since May 2021.
  • The system has more than halved the mean absolute error (MAE) of the ESO’s previous forecast with a lead time of 1 hour and reduced the MAE of a 24 hour lead time forecast by 14%. This should lower carbon emissions and costs.

Forecast lead time | Reduction in mean absolute error (MAE)
1 hour   | 58%
4 hours  | 25%
8 hours  | 11%
24 hours | 14%

Predicting demand is essential to achieving ESO’s ambition of running the grid on net-zero generation by 2025.

#stateofai | 100

101 of 188

stateof.ai 2021

Improving the sustainability and carbon efficiency of farms using predictive models

Dairy cow farmers monitor their livestock for health issues and the onset of calving. Using deep learning to analyse accelerometer data from a neck-worn sensor, Connecterra is able to predict health issues 2-3 days before human observation. They can also predict the onset of calving, which reduces the number of days that pregnant cows are treated with antibiotics by 50% (left graph). Connecterra can predict milk yield with <1% margin of error up to 200 days into the future (right graph, blue = less error), which could reduce CO2 emissions.

Figures: average number of days animals are treated with antibiotics (AI-guided vs. human-guided vs. industry-standard, 2018-2020); AI milk yield predictor.

#stateofai | 101

102 of 188

Good and bad gut microbiome bacteria identified

Nutrition: Good and bad gut microbiome bacteria and their connections to food identified from metagenomic sequencing of 1,100 study participants

stateof.ai 2021

15 best and 15 worst bacteria by correlation against a broad range of health markers (personal health scores, fasted blood tests, post-meal blood tests and habitual diet).

Diet can change your gut microbiome

Successful prediction of whether a person drinks coffee based on bacteria present in their gut microbiome (UK-trained model performance on UK & US test sets).

#stateofai | 102

103 of 188

  • AMD is the most common cause of blindness in Europe and North America. However, there is currently no treatment for “dry” AMD, which is hard to detect at early stages and can lead to blindness at late stages if left untreated.
  • A team at Moorfields Eye Hospital in London have developed a computer vision system that can automatically detect and monitor this condition.
  • The system uses two models (right figure): one predicts disease progression, while the other can determine specific features of the disease. It was developed using optical coherence tomography scans from 200 patients and validated on 110 patients.

Expert-level quantification of “dry” age-related macular degeneration (AMD) developed by a UK-based NHS team

Eye disease is a petri dish for medical AI development in the clinic

stateof.ai 2021

#stateofai | 103

104 of 188

stateof.ai 2021

One of the few real-world deployments of AI that addresses the pandemic is the reinforcement learning (RL) system, Eva, which was developed in Greece. Given a specified fraction of travellers who could be tested, Eva selected which specific passengers to test at the Greek borders. Eva identified 1.5x - 4x more positive infections at a given testing fraction than random selection.

Reinforcement learning for an effective Covid testing strategy

  • There is often limited testing capacity at borders. A solution could be a robust automated system capable of accurately predicting who should be tested.
  • Eva is based on multi-armed bandits, which balance two objectives: (a) maximizing the number of tests allocated to types of individuals identified as likely to be asymptomatic carriers of the virus and (b) allocating tests to new types of individuals in order to better estimate their infection likelihood (a minimal bandit-allocation sketch follows below).
  • Eva managed to achieve great success despite using the minimum possible data in order to comply with the GDPR. It is worth noting that random selection is perhaps not the most rigorous baseline.
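A minimal, hypothetical Thompson-sampling sketch of the allocation idea, treating passenger types as bandit arms and positive tests as rewards; the deployed Eva system is considerably more sophisticated than this.

```python
import numpy as np

def allocate_tests(prior_positives, prior_negatives, n_tests, seed=0):
    """Thompson sampling over passenger types (bandit arms).
    prior_positives / prior_negatives: per-type counts of past test outcomes.
    Returns how many of today's n_tests to allocate to each type."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(prior_positives), dtype=int)
    for _ in range(n_tests):
        # Sample a plausible positivity rate per type from its Beta posterior,
        # then allocate one test to the type that looks most likely positive.
        sampled_rates = rng.beta(1 + prior_positives, 1 + prior_negatives)
        counts[np.argmax(sampled_rates)] += 1
    return counts

# Toy example: 3 passenger types; the rarely tested third type still receives
# some tests because its posterior is wide (exploration).
positives = np.array([5, 1, 0])
negatives = np.array([95, 99, 10])
print(allocate_tests(positives, negatives, n_tests=100))
```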

#stateofai | 104

105 of 188

  • Viz.ai achieved 96% sensitivity and 94% specificity in identifying large vessel occlusions in 2,544 consecutive patients from 139 hospitals using scanners from multiple manufacturers.
  • Faster triage with Viz.ai enables the identification and treatment of more patients who are eligible for thrombectomy, improving patient outcomes, reducing chances of long-term disability.
  • Viz.ai alerts are 52 minutes faster than the standard of care, resulting in a 40% improvement in patient outcomes.

stateof.ai 2021

Viz.ai’s stroke detection software helps 1 patient every 47 seconds in the US today

A stroke occurs when the brain is deprived of its blood supply. Within minutes, brain cells begin to die from a lack of oxygen and nutrients, which results in irrecoverable damage. Rapid detection of brain strokes is crucial, but clinically challenging. In 2021, a real-world multi-center study of 45 stroke patients tested a deep learning system from Viz.ai versus standard of care. It found that the AI-based approach reduced the transfer time for a patient post-imaging at a primary stroke center to a comprehensive stroke center by 39% on average.

#stateofai | 105

106 of 188

stateof.ai 2021

With the increasing power and availability of ML models, gains from model improvements have become marginal. In this context, the ML community is growing increasingly aware of the importance of better data practices, and more generally better MLOps, to build reliable ML products.

Insights from ML in production nudge researchers from model-centric to data-centric AI

  • A simplistic view of ML casts the development workflow as a sequential pipeline from data to models and into production.
  • However, as more models have been deployed into production, it has become clear that continual data management is critical to maintaining model performance. Data collection and labeling procedures must adapt to distribution shifts as the ML system caters to more users.
  • The research community is launching several initiatives to raise awareness about data-centric AI. For example, NeurIPS 2021 will have a data-centric AI track; Chris Re’s group launched a data-centric AI repo on GitHub to aggregate resources; and Andrew Ng’s deeplearning.ai is organising a data-centric AI competition, in which participants are given a fixed model and asked to modify the data to achieve the best possible performance.

Figure: Data-centric loop (fixed model, evolving data): improve data → train model → error analysis. Model-centric loop (fixed data, evolving model): improve model → train model → error analysis.

#stateofai | 106

107 of 188

stateof.ai 2021

Due to the rapid progress in model development, beating benchmarks has become a matter of months. The high-performing models nonetheless often fail in real-world scenarios. Dynamic benchmarking, where datasets are continuously updated by human users, is one solution to make benchmarks more useful.

Machine Learning in production: active benchmarking

  • Dynabench is a web-based open-source tool that allows users to propose difficult examples that fool the model or make it very uncertain. These examples are then validated by expert linguists and crowdworkers.
  • The collected data can be used to both evaluate current state-of-the-art models and train other models.
  • The aim of dynamic benchmarking is to create a virtuous cycle where models are improved to be able to deal with harder examples. Then, it becomes increasingly harder to fool the models, which hopefully evolve to be robust to the worst case scenarios that are encountered in the real world.

#stateofai | 107

108 of 188

stateof.ai 2021

Two new datasets to deal with distribution shifts: WILDS and Shifts.

Machine Learning in production: distribution shifts

  • A distribution shift happens when data at test/deployment time is different from the training data. In production, this often happens in the form of concept drifts, where the test data gradually changes over time.
  • As ML is increasingly used in real-world applications, the need for a solid understanding of distributional shifts becomes paramount. This begins with designing challenging benchmarks.
  • A team from several American and Japanese universities and companies have built WILDS, a benchmark of 10 datasets of distributional shifts in tumor identification, wildlife monitoring, satellite imaging, and more.
  • Shifts, developed by Russia’s Yandex, is more industry-focused and includes 3 tasks: weather prediction, machine translation and vehicle motion prediction (a toy illustration of a distribution shift follows below).
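A toy, self-contained illustration (not taken from either benchmark) of how a model that performs well in-distribution degrades under a simple covariate shift:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data: a 1D feature centred at 0 for class 0 and at 2 for class 1.
X_train = np.concatenate([rng.normal(0, 1, (500, 1)), rng.normal(2, 1, (500, 1))])
y = np.array([0] * 500 + [1] * 500)

# Shifted test data: both class means move up by 1.5 (covariate shift).
X_test = np.concatenate([rng.normal(1.5, 1, (500, 1)), rng.normal(3.5, 1, (500, 1))])

model = LogisticRegression().fit(X_train, y)
print("in-distribution accuracy:", model.score(X_train, y))
print("shifted-test accuracy:   ", model.score(X_test, y))  # noticeably lower
```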

#stateofai | 108

109 of 188

stateof.ai 2021

A more pernicious problem in ML systems is underspecification: Models trained and tested successfully on the same data, but using different random seeds, can behave differently on real-world data.

Machine Learning in production: underspecification

  • Researchers from Google, MIT, UCSD and Stanford illustrate this problem with examples from computer vision, medical imaging, NLP, clinical risk prediction based on health records, and medical genomics.
  • While they identify the problem and illustrate it, they do not have a definitive solution, and hope to spur interest in improving the machine learning pipeline to tackle the underspecification challenge. But it is unclear whether it can be tackled at all.

#stateofai | 109

110 of 188

  • A systematic review of all papers published in 2020 that reported using ML for diagnosis and prognostication of Covid-19 found that none of the reviewed literature reached “the threshold of robustness and reproducibility essential to support utilization in clinical practice.” There were many methodological, dataset, and bias issues.
  • For example, 25% of papers used the same pneumonia control dataset for comparison against adult patients without mentioning that it consists of children aged 1-5.

stateof.ai 2021

Despite a loud call to arms and many willing participants, the ML community has had surprisingly little positive impact against Covid-19. One of the most popular problems - diagnosing coronavirus pathology from chest X-ray or chest computed tomography images using computer vision - has been a universal clinical failure.

Machine Learning in production: beware of bad data

#stateofai | 110

111 of 188

  • AutoML is enabling model-in-the-loop training data to become more common (V7 data platform, left graph).
  • As their confidence in tooling and data quality grows, ML teams are launching more projects. Training datasets are no longer a fixed object but a continuously growing corpus of knowledge.
  • The four fastest growing computer vision use cases in 2021 (four panels) are unstructured document processing, KYC for new users of trading platforms, 3D CT and MRI, and ultrasound video.

stateof.ai 2021

Data-driven AI: training datasets grow with models in the loop

With automated labelling, and plateauing architecture performance, training data quantity and quality becomes the competitive metric for AI-first startups.

Figures: ratio of ground truth annotations made by AI vs. humans across computer vision teams; fastest growing use cases: unstructured documents, KYC w/ID, 3D CT & MRI, ultrasound video.

#stateofai | 111

112 of 188

Since our 2020 Report, NVIDIA has faced mounting resistance from several angles over its planned $40B acquisition of Arm: industry players who compete with NVIDIA, customers of Arm, regulators and governments. In September 2020, NVIDIA laid out an 18-month plan to complete the deal. The company has now stated that this deadline will not be met and needs to be extended into September 2022.

2020 Prediction outcome: NVIDIA does not complete its acquisition of Arm

stateof.ai 2021

  • Arm is a world-leading supplier of CPU architecture and design intellectual property. Over 95% of smartphones depend on its designs. The company is also expanding the small footprint of Arm-based servers in data centers.
  • Customers of Arm designs are concerned that ownership by NVIDIA will consolidate too much power and destroy Arm’s status as a neutral player.
  • NVIDIA is facing regulatory push back from the UK where the deal is viewed as a “politically charged symbol of the country’s loss of corporate influence in the face of foreign takeovers.” Some are floating the idea of re-listing Arm on the stock exchange.
  • A formal application with Chinese antitrust regulators was only filed 8 months after the transaction was announced. This could lead to further delays up to 18 months.


#stateofai | 112

113 of 188

Europe and the US want to buy themselves semiconductor sovereignty. Is this realistic?

stateof.ai 2021

Over the last 30 years, the industry has been the beneficiary of geographical specialisation across more than 50 different types of sophisticated wafer processing and testing equipment and 300 different input materials. In a matter of months, the Covid-19 pandemic exposed, as key vulnerabilities, 50+ points across the semiconductor supply chain where a single region accounts for 65% or more of total global supply. Despite earmarking $200B between the US and Europe, achieving semiconductor sovereignty could cost >$1T in upfront investment. This is 6x the combined R&D investment and capital expenditure of the total semiconductor value chain in 2019.

#stateofai | 113

114 of 188

Europe woke up to its largest company, ASML, the linchpin to global semiconductors

stateof.ai 2021

The Netherlands’ ASML provides chip makers with essential hardware, software and services to mass produce patterns on silicon using a method called lithography. The company is alone in offering extreme ultraviolet lithography (EUV) machines, which unlock the leading manufacturing nodes (e.g. 3nm and 5nm at TSMC). Each EUV machine, which has over 100,000 parts and costs $150M, ships in 40 freight containers (or 4 jumbo jets).

  • In H1 2021, ASML posted €8.4B of net sales (up 45% compared to H1 2020) at a net income margin of 30%. The company also spent almost €1B in R&D (up 20%) in the same period to cement its technical leadership.
  • The company grew throughout the pandemic with no issues, largely fuelled by the global chip shortage, the acceleration of digital infrastructure, and the push for “technological sovereignty”.
  • Today, ASML is worth $367B in public markets. This reflects 3x market cap growth since right before the pandemic.

#stateofai | 114

115 of 188

Manufacturers suffer from Covid-induced supply chain disruptions for semiconductors

stateof.ai 2021

Almost all electronic goods depend on semiconductors. Due to Covid lockdowns and rising demand for electronics, manufacturers are suffering from never-before-seen wait times of 4 months+ between ordering a chip and receiving it. Anecdotally, wait times today are more like 6-12 months with chip shortages into Q2 2022.

#stateofai | 115

116 of 188

A semiconductor drought is costing the automotive sector upwards of $110B in lost sales

stateof.ai 2021

Halfway through 2021, global auto companies have produced 4 million fewer cars than expected, down 15% on average. Toyota signaled that it would cut production by 40% worldwide in September 2021. By contrast, large technology companies have not been complaining about semiconductor supply shortages, which suggests a bifurcation between the “haves” and “have nots”.

Global motor vehicle production: Q1 2021 vs Q4 2020

Global motor vehicle production over 20 years

#stateofai | 116

117 of 188

Major semiconductor fabs commit $400B for new capacity as global market hits $551B

stateof.ai 2021

Intel’s new CEO, Pat Gelsinger, committed the company to becoming a major contract chip maker. One month after his appointment, Gelsinger pledged $20B to build two new plants in Arizona. He followed with another $3.5B expansion into New Mexico, and in September 2021 said he plans to build $95B worth of new chip fabs in Europe. Intel’s stock price is up 21% since 1 Jan 2019.

TSMC’s CEO, C.C. Wei, said the company would invest $100B over the next three years to boost capacity, more than double the company’s expenditure over the preceding years. This includes TSMC’s planned chip fab in Arizona. Stock price is up 256%.

Samsung said it would invest $205B over the next three years across its chip manufacturing (Samsung Electronics) and its contract drug manufacturing businesses (Samsung Biologics). This includes a $17B chip fab based in either Texas, New York or Arizona. Stock price is up 114%.

#stateofai | 117

118 of 188

$52B CHIPS for America Act gains support from Semiconductors in America Coalition

  • 75% of the world’s semiconductors and key materials (silicon wafers, photoresist, specialty chemicals) are manufactured in China and East Asia, despite eight of the 15 largest semiconductor firms in the world being based in the US.
  • In June 2021, the US Senate passed the US Innovation and Competition Act, which includes $52B in federal investments for the domestic semiconductor R&D and manufacturing provisions in the CHIPS for America Act.
  • Members of the Semiconductors in America Coalition include Intel, Nvidia, Qualcomm, Amazon Web Services, Apple, AT&T, Google, Microsoft and Verizon.

stateof.ai 2021

In 2019, our Report noted that “China is (slowly) ramping up on its semiconductor trade deficit.” In 2020, China imported $350B worth of chips, an increase of 14.6% vs. 2019, notably from US manufacturers. However, as the US-China trade war has dramatically escalated in the last 12 months, the US has taken the view that its eroding share of global semiconductor manufacturing capacity from 37% in 1990 to 12% today is no longer acceptable.

#stateofai | 118

119 of 188

stateof.ai 2021

In the last 12 months, CrowdStrike has almost doubled its market capitalisation to $60B and reached $1.3B ARR. The company is demonstrating the platform potential of AI-first technology in cybersecurity: 53% of its 13,080 subscription customers purchase more than 5 products and 29% subscribe to more than 6 products. Meanwhile, SentinelOne (124%) and CrowdStrike (120%) are firmly in the high-growth net dollar retention segment of SaaS companies, suggesting that their customers expand their subscription spend year on year.

Public market investors favor AI-first cybersecurity players: CrowdStrike ($60B), Darktrace (£5B), SentinelOne ($18B), Riskified ($6B)

#stateofai | 119

120 of 188

stateof.ai 2021

UiPath (robotic process automation), Snowflake (cloud data platform), and Confluent (Kafka-based data streaming) represent $138B of newly created public market value in 2021 with revenues growing 50-100% YoY at this scale. All three companies have best-in-class net dollar retention above 130% and 2% of their customer base spending over $1M per year. Snowflake became the largest software IPO of 2020, raising $3.35B.

The enterprise data and automation sector is on fire: Snowflake, UiPath, Confluent IPOs

#stateofai | 120

121 of 188

stateof.ai 2021

Since launching its original data platform built on Apache Spark in 2015, Databricks has grown into a one-stop home for (un)structured data, automated ETL, collaborative data science notebooks, business intelligence using SQL, and full-stack machine learning built on open source MLflow. Interestingly, all three major cloud vendors - Amazon, Google, and Microsoft - invested in Databricks in February 2021.

Databricks: The enterprise data/AI juggernaut reaches $38B valuation and $600M ARR

#stateofai | 121

122 of 188

stateof.ai 2021

AEye, Quanergy, Ouster, Innoviz, Aeva, Luminar, and Velodyne raised $1.3B in private markets, $2.9B via SPACs, and went public at a cumulative valuation of $12.4B. None of these companies had significant revenue going into their SPAC. Together, they project $2.9B in 2024 revenue even though they sell hardware and software products to overlapping autonomous driving customers and other nascent markets.

All seven major private LiDAR companies have SPAC’d and trade below their IPO price

Note: Gross proceeds = PIPE + cash in trust, stock price data from 30 Aug 2021

#stateofai | 122

123 of 188

Survival of the fittest: Waymo, Cruise, Aurora rev up their balance sheets and trucks SPAC

stateof.ai 2021

>$5B raised by Waymo and Cruise: +$2.5B (June 2021) and +$2.75B (April 2021).

Lyft and Uber offload their AV teams: one sold for $550M (July 2021), the other sold for 26% of Aurora plus $400M of financing (December 2020).

>$4B raised as trucking and consumer AVs SPAC to become public companies: +$2B cash at a $10.6B SPAC valuation (July 2021), +$1.35B cash at an $8.5B SPAC (April 2021), +$614M cash at a $5.2B SPAC (July 2021), and +$345M cash at a $3.3B SPAC (May 2021).

#stateofai | 123

124 of 188

Today’s modularised approach struggles with brittle decision-making in prediction/planning. An alternative approach uses end-to-end deep learning from cameras and GPS to handle this decision complexity.

Learning to drive with a large network, trained end-to-end with perception

stateof.ai 2021

Figure: Modular approach (complex sensors, HD Maps + hand-coded rules) vs. end-to-end deep learning.

#stateofai | 124

125 of 188

Another approach makes heavy use of offline simulations learned from real-world observations and planners that learn from training datasets that are collected at scale using expensive camera sensors. This system has been successfully tested on self-driving vehicles in downtown San Francisco in 2021.

Learn to simulate, then train an RL driving system in the simulation

stateof.ai 2021

Figure: learn a simulator in which to train an RL agent → offline system evaluation (only a few hours needed) → AV deployment.

#stateofai | 125

126 of 188

stateof.ai 2021

Chinese institutions won all first and second places in all tasks of the 2021 AI City Challenge.

Chinese institutions dominate research in Smart Cities

  • The Challenge is organized by NVIDIA, QCraft, and American, Indian and Australian universities.
  • The challenge tracks include tasks like counting vehicles in an intersection, tracking vehicles across multiple camera views, detecting traffic anomalies, and finding vehicles using natural language descriptions.
  • Baidu, Alibaba, Sun Yat-sen University, Shenzhen Institute of Advanced Technology and UCAS all won one or multiple tracks.
  • This reflects China’s massive investments in building smart cities and supporting computer vision research.
  • Some observers also worry that this success goes hand in hand with increased government surveillance.

#stateofai | 126

127 of 188

stateof.ai 2021

China’s SenseTime, a $12B facial recognition software company that powers surveillance of Uighur Muslim detainment camps and was blacklisted by the Trump administration in 2019, filed to list on the Hong Kong stock exchange. The company generated $525M of revenue in 2020.

Facial recognition: upcoming IPOs and fundraising despite controversy and lawsuits

#stateofai | 127

128 of 188

stateof.ai 2021

In the US, Clearview AI has been sued by the American Civil Liberties Union over face scraping in Europe and by immigrant rights groups in California. Even so, the product has been widely trialed by law enforcement and governments in 24 countries and has continued to raise capital from private investors.

Clearview AI: despite lawsuits and bans in Europe and Canada, the company continues

#stateofai | 128

129 of 188

stateof.ai 2021

Google infuses AI capabilities into more of its business and consumer applications

Beyond Gmail’s popular Smart Reply feature, the company’s AI-based grammar checker is now live across Sheets, Docs, and Slides. Sheets now also provides context-aware formula predictions and allows you to ask questions of your data in natural language. Maps is receiving over 100 new AI-first features, including indoor navigation with AR and a new routing option that optimises for lower fuel consumption and CO2 emissions. Google also open sourced MediaPipe, a cross-platform toolkit for integrating fast computer vision inference.

#stateofai | 129

130 of 188

stateof.ai 2021

OpenAI GPT-3 integrations: Microsoft Power Apps, GitHub Copilot, and 300 other apps

Power Apps users can describe a programming goal in natural language and have GPT-3 automatically transform it into Power Fx code. Meanwhile, GitHub users can call on Codex (a descendant of GPT-3) to generate whole lines or entire functions from within their code editor. After surpassing 300 apps built with GPT-3, OpenAI launched a $100M fund to invest in startups that make use of its APIs.

#stateofai | 130

131 of 188

stateof.ai 2021

Startups in the US, Canada, and Europe raise close to $375M in the last 12 months to bring large language model APIs and vertical software solutions to customers who cannot afford to directly compete with Big Tech. This is significant momentum in a single year when cast against the early acquisitions of NLP startups including Maluuba ($140M in 2017), Semantic Machines (rumored $150-250M in 2018) and SwiftKey ($250M in 2016).

Large language models for all: startups raise $375M to translate research into industry

#stateofai | 131

132 of 188

stateof.ai 2021

Google’s Super-Resolution via Repeated Refinement (SR3) model iteratively refines a noisy 64x64 image into a high-quality 1024x1024 image, outperforming generative adversarial networks. Meanwhile, China’s SenseTime showcased its 30x super-resolution zoom that marries computer vision with a custom AI chipset.
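
A structural sketch of the iterative refinement idea: start from noise at the target resolution and repeatedly denoise, conditioned on the upsampled low-resolution input. The denoiser below is a trivial placeholder for SR3's learned U-Net, so this only illustrates the control flow, not the model itself.

```python
import numpy as np

def denoise_step(x, lowres_upsampled, step, total_steps):
    """Placeholder for SR3's learned denoising network.

    The real model predicts and removes noise conditioned on the upsampled
    low-resolution input; here we simply blend towards it so the loop runs.
    """
    alpha = (step + 1) / total_steps
    return (1 - alpha) * x + alpha * lowres_upsampled

def super_resolve(lowres, scale=4, steps=10):
    # Nearest-neighbour upsample of the conditioning image.
    upsampled = np.kron(lowres, np.ones((scale, scale)))
    # Start from pure Gaussian noise at the target resolution.
    x = np.random.randn(*upsampled.shape)
    # Iteratively refine the noisy image, conditioned on the upsample.
    for t in range(steps):
        x = denoise_step(x, upsampled, t, steps)
    return x

lowres = np.random.rand(64, 64)   # stand-in for a 64x64 input image
highres = super_resolve(lowres)   # 256x256 output in this toy setting
print(highres.shape)
```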

Image super-resolution enables super-zoom on consumer grade smartphones

#stateofai | 132

133 of 188

The rapid growth of consumers selling products online is supported by AI-first photo apps

stateof.ai 2021

ClipDrop enables >11k subscribed online sellers to create beautiful product imagery with a single click. Computer vision-based scene understanding and segmentation enables the extraction of objects from real life settings without the need for photo studios or complex post-processing. This is powering a huge surge in secondhand goods selling, worth an estimated $27B in 2020, which according to market research is growing several times faster than primary retail.

Paying subscriber growth

#stateofai | 133

134 of 188

stateof.ai 2021

Robotic picking and packing is helping retailers meet a growing demand for online deliveries. Leading online grocery technology company, Ocado, uses computer vision and proprietary grasping technology to efficiently pick and pack items for grocery orders. In e-commerce, robotic picking platform SORT will handle 300M+ items by the end of 2021. The reinforcement learning tool RLScan is a very early example of RL success in production robotic systems at scale.

Deep reinforcement learning-enhanced picking robots support a surge in online grocery

  • SORT is a hybrid autonomy system: real-time human support via teleoperation with ML increases speed, accuracy, and uptime.
  • RLScan uses deep RL to train a closed-loop control scanning policy, conditioned on a real-time video feed. An RL agent is trained end-to-end directly in production, learning from a fleet of robots across multiple production sites. RLScan achieves optimal barcode scanning behavior for handling complex product assortments.
  • RL raised overall system speed by 2% and counting, with the learning curve continually increasing (right figure).

#stateofai | 134

135 of 188

stateof.ai 2021

A key differentiator for online grocers is the breadth of their in-stock product range. This is challenging to achieve: order too little stock and customers won't be able to buy the items they want; order too much and waste increases while margins suffer.

Deep learning automates 98% of stock replenishment decisions for online grocers

  • At Ocado, sequence-to-sequence deep learning models are now being used to forecast demand for ranges of 55,000+ products (SKUs). A minimal sketch of this model family follows below.
  • Monthly data from 2019 at Ocado Group’s UK Hatfield and Dordon sites showed cost savings of £250,000 per month thanks to 5% more accurate forecasting. In addition, waste reduced from 0.6% to 0.3% of total products, while product availability increased from 92% to 94.5%.
  • Today, Ocado’s retail partners making use of this automated demand forecasting tool let it manage 98% of their replenishment decisions.
  • They have seen 30% more accurate forecasting vs. previous solutions, saving time while slashing costs and food waste. The right graph shows 50-day forecasts (orange) vs. actual (black) for various SKUs.

[Chart: 50-day demand forecasts vs. actual daily sales over the forecast lead time for various SKUs.]
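
As referenced above, a minimal sketch of a sequence-to-sequence demand forecaster (PyTorch, dummy data); the architecture, sizes and 50-day horizon are illustrative assumptions, not Ocado's actual model.

```python
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    """Encoder-decoder GRU that maps a history of daily sales
    to a multi-day demand forecast for a single SKU."""

    def __init__(self, hidden_size=64, horizon=50):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.decoder = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, history):
        # history: (batch, past_days, 1) of observed daily sales.
        _, state = self.encoder(history)
        # Feed each prediction back in as the next decoder input (autoregressive decoding).
        step_input = history[:, -1:, :]
        forecasts = []
        for _ in range(self.horizon):
            out, state = self.decoder(step_input, state)
            step_input = self.head(out)
            forecasts.append(step_input)
        return torch.cat(forecasts, dim=1)  # (batch, horizon, 1)

model = Seq2SeqForecaster()
past_sales = torch.rand(8, 90, 1)  # 8 SKUs, 90 days of dummy sales history
forecast = model(past_sales)       # 50-day-ahead forecast per SKU
print(forecast.shape)              # torch.Size([8, 50, 1])
```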

#stateofai | 135

136 of 188

AI-last: large scale first party data unlocks new AI product opportunities at Shopify

stateof.ai 2021

In April 2016, Shopify launched an ML-driven lending solution called Shopify Capital that preemptively offers working capital advances of up to $2M, which can be unlocked in 2-5 days by high-performing merchants on its platform. Shopify Capital has grown to $2.3B in cumulative capital advanced since launch, up 137% YoY as of Q2 2021. Interestingly, 76% of merchants who used this product returned for at least one additional round of funding, and funded merchants averaged 36% higher sales in their first 6 months compared to their non-funded peers.

#stateofai | 136

137 of 188

stateof.ai 2021

To detect Child Sexual Abuse Material (CSAM) while preserving user privacy, Apple intended to use NeuralHash, a neural network-based hashing method for images. Apple claimed that this method enabled photos to be compared on-device against a known CSAM database, with Apple gaining access to the photos only if they contain CSAM. Faced with criticism from privacy advocates and technical experts, Apple delayed the launch of the system.

Apple faces the complex problem of AI-based privacy

  • Critics call into question both the effectiveness of the method in detecting CSAM and its potential invasion of privacy.
  • It was found that NeuralHash occasionally results in collisions (semantically different images having the same hash), which might give human reviewers unnecessary access to private photos (see the toy hashing sketch below).
  • Researchers further worry that since NeuralHash is based on neural networks, it might be sensitive to adversarial attacks which maliciously cause the algorithm to identify regular photos as CSAM.
  • In response, Apple insisted that the version of NeuralHash analyzed by researchers would be improved before it is deployed.
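
To make the matching logic and the collision concern concrete, here is a toy perceptual-hash comparison using a simple average hash as a stand-in for NeuralHash (which instead derives the hash with a neural network); the matching-by-hash idea and the possibility of collisions carry over.

```python
import numpy as np

def average_hash(image, hash_size=8):
    """Toy perceptual hash: downsample to 8x8 block means, threshold at the mean.

    A stand-in for NeuralHash, which uses a neural network to produce
    the hash; only the matching logic is analogous.
    """
    h, w = image.shape
    small = image[: h - h % hash_size, : w - w % hash_size]
    small = small.reshape(hash_size, h // hash_size, hash_size, w // hash_size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def matches(hash_a, hash_b, max_hamming=0):
    # On-device matching: compare hashes, never the raw images.
    return np.count_nonzero(hash_a != hash_b) <= max_hamming

img = np.random.rand(256, 256)
shifted = np.clip(img + 0.01, 0, 1)   # small perturbation of the same image
unrelated = np.random.rand(256, 256)  # semantically different image

print(matches(average_hash(img), average_hash(shifted)))    # likely True: robust to small changes
print(matches(average_hash(img), average_hash(unrelated)))  # usually False, but collisions can occur
```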

#stateofai | 137

138 of 188

stateof.ai 2021

With the regulation of third party cookies and the increasing public awareness of the importance of data privacy, browsers are compelled to find new privacy-preserving solutions for their advertising business.

Browser-based federated learning thrives in a post-cookie world

  • Federated learning (FL) is a machine learning technique that makes it possible to train models across multiple decentralized devices or servers without centralizing training data (a minimal sketch follows below).
  • Brave is a privacy-first browser which authorises ad targeting only by user “opt-in”. In return for their attention, users are rewarded with cryptocurrency.
  • Brave uses FL to alleviate the need for storing and collecting user data while still delivering well-targeted ads. They show they can reach a hit ratio at 10 (HR@10) of up to 70% while preserving almost perfect privacy.
  • Google began rolling out Federated Learning of Cohorts (FLoC) on Chrome in Q2 2021. To avoid providing individual user data to third-party advertisers, FLoC groups together users into cohorts with similar browsing histories without ever centralizing these histories. Google clients only access data about cohorts, not individual users.
  • Google claims that advertisers “can expect to see at least 95% of the conversions per dollar spent when compared to cookie-based advertising.”
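
As referenced above, a minimal federated averaging (FedAvg) sketch in NumPy; the logistic-regression click model, client count and hyperparameters are illustrative and do not reflect Brave's or Google's actual systems.

```python
import numpy as np

def local_update(weights, features, clicks, lr=0.1, steps=5):
    """One client's local training: a few gradient steps of logistic regression
    on data that never leaves the device."""
    w = weights.copy()
    for _ in range(steps):
        preds = 1.0 / (1.0 + np.exp(-features @ w))
        grad = features.T @ (preds - clicks) / len(clicks)
        w -= lr * grad
    return w

def federated_averaging(clients, rounds=10, dim=16):
    global_w = np.zeros(dim)
    for _ in range(rounds):
        # Each client trains locally; only model weights are sent back.
        local_ws = [local_update(global_w, X, y) for X, y in clients]
        # The server averages the weights (FedAvg) without ever seeing raw data.
        global_w = np.mean(local_ws, axis=0)
    return global_w

rng = np.random.default_rng(0)
# Three simulated browsers, each holding private (features, click-label) data.
clients = [(rng.normal(size=(100, 16)), rng.integers(0, 2, size=100)) for _ in range(3)]
print(federated_averaging(clients)[:4])
```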

#stateofai | 138

139 of 188

stateof.ai 2021

Critics worry that FLoC makes it easier for advertisers to track users across the web. All other browsers (e.g. Firefox, Brave, Edge) refused to integrate FLoC. DuckDuckGo even created a Chrome extension to block it.

But is Google’s Federated Learning of Cohorts a good alternative to third party data?

  • Even though individual user information is obstructed, Google still shares cohort information by default. Brave argues that this violates a basic principle of privacy: “don’t tell others things you know about me without my permission.”
  • It is not clear whether other websites can use the cohort IDs alongside their first-party data to identify users, a process referred to as “fingerprinting”.
  • Another fear is that algorithmic generation of cohorts might result in undesirable identification and targeting of vulnerable groups.
  • Observers also worry that we might be facing a pernicious effect of third party data regulation: the advertising power concentrates in the hands of Google and Apple. Indeed, these companies control a large share of first-party data thanks to their browsers and operating systems.
  • A telling sign of the ambivalence of Google’s approach is that they are not testing FLoC in the EU and the UK for fear that it might be illegal.

#stateofai | 139

140 of 188

stateof.ai 2021

In a world-first, South Africa granted a patent to an AI system. The system, called Dabus, invented a method to better interlock food containers. Most countries, however, do not recognize a machine as an inventor.

AIs play Go. AIs paint. AIs make music. Now AIs… invent?

  • The patent application was submitted to patent offices in the US, the EU, Australia and South Africa. It was rejected in the US and the EU, and a final ruling on this patent is still pending in Australia.
  • In the US, a judge ruled that only a human can hold a patent, not a machine. This is because according to American law, “a natural person” needs to take an oath that they are the inventor. A contradictory ruling came out in Australia, which stated that an AI can be named as an inventor in a patent application.
  • As of today, most of the arguments that have led to the rejection of the patent concern the incompatibility between existing laws and the evolution of AI systems. Will the law evolve to accommodate AI inventors?


#stateofai | 140

141 of 188

Investing in AI: 182 active AI unicorns totaling $1.3T of combined enterprise value

The US outperforms other countries in the number of AI unicorns, followed by China, UK & Israel. US unicorns have reached a combined market value of over €800 billion.

stateof.ai 2021

Country | Number of AI unicorns | Total funding raised | Combined enterprise value
United States | 103 | €55B | €801B
China | 35 | €26B | €346B
United Kingdom | 10 | €4B | €69B
Israel | 8 | €2B | €25B
Canada | 4 | €1B | €8B
Germany | 3 | €2B | €14B
Singapore | 3 | €2B | €5B
Switzerland | 3 | €1B | €4B
Hong Kong | 3 | €3B | €9B
France | 2 | €1B | €2B
South Korea | 2 | €100M | €2B
Japan | 1 | €400M | €2B
India | 1 | €400M | €1B
Belgium | 1 | €300M | €2B
Bermuda | 1 | €200M | €2B
Taiwan | 1 | €100M | €1B
Sweden | 1 | n/a | €4B

#stateofai | 141

142 of 188

Investing in AI: American AI startups attract the most money but EU+UK is growing fast

stateof.ai 2021

[Chart: annual venture capital invested in AI startups (€B), 2010-2021 (2021 estimated), split by USA, China, and European Union & UK.]

The US accounts for ⅔ of global AI investments and the EU+UK is on track to double its share by 2021, growing from roughly €8B to an estimated €17B (2.1x).

#stateofai | 142

143 of 188

Investing in AI: mega rounds are now commonplace as AI startups mature globally

stateof.ai 2021

€250M+ rounds account for 48% of all capital invested in AI startups in 2021, up from 42% in 2020. We see the same trend for €100M-€250M rounds and Series C rounds, both of which are better represented in 2021.

[Chart: annual capital invested in AI startups (€B), 2010-2021 YTD, broken down by round size: €0M-€1M (Pre-Seed), €1M-€4M (Seed), €4M-€15M (Series A), €15M-€40M (Series B), €40M-€100M (Series C), €100M-€250M, and €250M+.]

€250M+ rounds have accounted for 48% of all investment in 2021 so far.

#stateofai | 143

144 of 188

stateof.ai 2021

Combined enterprise value of private companies (AI vs SaaS)

Combined EV in EUR B:

Year | All AI companies | All SaaS companies
2010 | 5.4 | 21
2011 | 7.3 | 26.9
2012 | 11.6 | 39.1
2013 | 17.5 | 51.6
2014 | 35.3 | 85.4
2015 | 70.1 | 157
2016 | 102.6 | 201.8
2017 | 164.7 | 279.5
2018 | 334.3 | 487
2019 | 517.2 | 712.3
2020 | 715.9 | 1000
2021 YTD | 1200 | 1800

Investing in AI: the combined enterprise value of private AI startups & scaleups is ⅔ that of private SaaS startups & scaleups

#stateofai | 144

145 of 188

stateof.ai 2021

[Chart: combined enterprise value of private AI SaaS companies (€B), 2010-2021 YTD, broken down by valuation band: €0M-€200M, €200M-€800M, €800M-€8.0B, €8.0B+.]

Highlighted companies include a cloud data platform ($38B valuation), process mining ($11B valuation), healthcare data analytics ($8.1B valuation), revenue intelligence ($7.25B valuation), a training data platform ($7.3B valuation), and an AI cloud platform ($6.3B valuation).

Investing in AI: over €600B of combined enterprise value of private AI-first SaaS startups & scaleups and SaaS startups & scaleups actively using AI

#stateofai | 145

146 of 188

Investing in AI: enterprise software is the most invested category globally, 2010-2021

stateof.ai 2021

The data-rich domains of Health and Fintech are also particularly popular investing categories globally.

Amount invested (2010-2021): Enterprise software €105B, Transportation €89B, Fintech €71B, Health €46B, Food €45B, Robotics €37B, Marketing €32B, Media €31B, Security €28B, Telecom €16B, Education €14B, Semiconductors €11B, Energy €10B, Travel €7B, Real Estate €6B, Home living €6B, Gaming €6B, Jobs recruitment €3B, Legal €3B, Fashion €3B, Sports €2B, Music €1B, Hosting €1B, Wellness beauty €36M, Event tech €3B, Kids €34M, Dating €8M.

Number of rounds (2010-2021): Enterprise software 7.5K, Health 3.0K, Fintech 2.9K, Marketing 2.9K, Security 1.7K, Media 1.6K, Robotics 1.4K, Education 1.4K, Transportation 1.4K, Energy 903, Food 885, Jobs recruitment 664, Real Estate 614, Semiconductors 569, Telecom 545, Legal 518, Travel 351, Fashion 342, Home living 319, Sports 288, Gaming 286, Music 192, Wellness beauty 189, Kids 130, Event tech 128, Hosting 98, Dating 28.

#stateofai | 146

147 of 188

stateof.ai 2021

Category | Amount invested (2010-21) | Number of rounds (2010-21)
Software | €155B | 10K
Robotics | €37B | 1K
AI biotech | €22B | 753
Defense | €0.6B | 77

Investing in AI: software leads while robotics, AI biotech and defense are growing

#stateofai | 147

148 of 188

stateof.ai 2021

[Chart: number of AI company exits per year, 2010-2021 YTD, split by USA, China, and European Union & UK.]

Exits in AI: American AI startups consistently account for ⅔ of exits globally and EU+UK account for roughly ⅓, with the small remainder from China

#stateofai | 148

149 of 188

Exits in AI: almost 3-fold increase in enterprise value creation in the last 12 months

The sum of M&A exit value, secondaries, and the enterprise value of IPOs and SPACs has passed €750B in 2021.

stateof.ai 2021

[Chart: combined enterprise value of AI exits (€B) per year, 2010-2021 YTD, split by USA, China, and European Union & UK.]

Notable 2021 exits include: a $19.7B acquisition (Apr 2021), a $1.2B IPO at an $8.9B valuation (Jun 2021), a $1.34B IPO at a $35.8B valuation (Apr 2021), a $5.4B IPO at a $150B valuation (Feb 2021), a $1.4B IPO at an $8.5B valuation (Apr 2021), and a $2B IPO at a $28.6B valuation (Apr 2021).

#stateofai | 149

150 of 188

Exits in AI: $2.3T of enterprise value has been created by AI companies since 2010

stateof.ai 2021

(*) Based on the exits with a known exit amount. The deals included the enterprise value of: Acquisitions, Secondaries, IPOs, SPAC IPOs, Buyouts;

Enterprise software, fintech, media, transportation, and food categories account for $2T of value creation.

Combined exit value: Enterprise software €487B, Fintech €472B, Media €417B, Transportation €390B, Food €335B, Security €142B, Health €137B, Semiconductors €91B, Marketing €91B, Gaming €77B, Telecom €72B, Robotics €43B, Travel €43B, Real Estate €42B, Music €40B, Home living €37B, Fashion €26B, Energy €24B, Jobs recruitment €19B, Education €12B, Legal €6B, Sports €4B, Dating €4B, Hosting €2B, Wellness beauty €9B, Kids €32M, Event tech €20M.

Number of exits: Enterprise software 686, Marketing 245, Fintech 209, Media 182, Security 166, Health 145, Transportation 110, Education 110, Robotics 72, Telecom 66, Semiconductors 63, Energy 55, Legal 51, Food 46, Real Estate 41, Jobs recruitment 39, Gaming 31, Travel 21, Home living 21, Music 18, Fashion 17, Sports 16, Event tech 11, Wellness beauty 9, Hosting 8, Kids 4, Dating 2.

Avg. exit amount(*): Semiconductors €4B, Music €2B, Travel €2B, Fintech €2B, Dating €2B, Security €2B, Real Estate €2B, Jobs recruitment €1B, Transportation €1B, Telecom €1B, Media €1B, Food €1B, Health €989M, Home living €922M, Enterprise software €863M, Gaming €785M, Robotics €683M, Legal €443M, Fashion €346M, Marketing €343M, Energy €298M, Hosting €199M, Education €129M, Sports €114M, Kids €323M, Wellness beauty €103M, Event tech €18M.

#stateofai | 150

151 of 188

Exits in AI: almost 3.5-fold growth in AI-first SaaS enterprise value creation in 12 months

stateof.ai 2021

[Charts: combined enterprise value (€B) and number of exits of AI-first SaaS companies per year, 2010-2021 YTD.]

#stateofai | 151

152 of 188

stateof.ai 2021

Exits in AI: corporates show growing interest in companies that are actively using AI

[Chart: number of corporate-driven AI exits per year, 2010-2021 YTD.]

The number of exits(1) driven by corporates in 2021 exceeds 200, breaking all yearly records.

Notable 2021 deals include: a £1.1B buyout (Sep 2021), a $500M acquisition (Jul 2021), a $290M acquisition (Jul 2021), a $3.8B acquisition (Jul 2021), a $575M acquisition (Jul 2021), and a $19.7B acquisition (Apr 2021).

(1) Counted are acquisitions, buyouts and secondaries

#stateofai | 152

153 of 188

Section 4: Politics

stateof.ai 2021

#stateofai | 153

154 of 188

stateof.ai 2021

Dr Gebru left Google after a substantial disagreement over a research paper which examined the risks of large language models, including bias and the carbon footprint associated with training these models.

AI Ethics: Timnit Gebru’s firing from Google shocks the AI community

  • We highlighted the pioneering contribution Dr Gebru has made to the study of AI Ethics on slide 131 of last year’s State of AI Report. She built one of the most diverse teams in AI research while at Google.
  • Jeff Dean, SVP of Google AI, stated that the research “didn’t meet our bar for publication” and that Gebru had said she would resign unless Google met a number of conditions. Dr Gebru stated she had been “fired by Jeff Dean”.
  • The event was a shock to the ML research community, drawing substantial critique and a letter of protest signed by over 2,500 Google employees.
  • Margaret Mitchell, a Google AI ethics researcher, was also suspended after downloading and sharing company documents aimed at showing discriminatory treatment of Timnit Gebru.
  • Margaret Mitchell has subsequently been hired by open source AI champion Hugging Face.

#stateofai | 154

155 of 188

stateof.ai 2021

TAI is defined as “AI that has an impact comparable to that of the industrial revolution.” The model predicts a median of 2052 for the year in which some actor would be willing and able to train a single transformative model.

AI Safety: new quantitative model extrapolates from current research and compute trends to estimate when ‘transformative AI’ (TAI) might be possible

  • The author, Ajeya Cotra, is a Senior Research Analyst at Open Philanthropy, advised by leading researchers Dario Amodei (Anthropic) and Paul Christiano (Alignment Research Center).
  • A core assumption is that if researchers are able to train a neural net or other ML model that uses about as much computation as a human brain, that will likely result in transformative AI.
  • The model then explores how, as compute becomes cheaper and algorithms become more efficient, the likelihood of meeting this threshold evolves over time (a toy version of this style of calculation follows below).
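
As referenced above, the flavour of the calculation can be shown with a toy extrapolation; all constants below are made-up placeholders, not the report's actual anchors, which are expressed as probability distributions rather than point estimates.

```python
# Toy version of a "biological anchors" style extrapolation.
# Every number below is an illustrative assumption, NOT Cotra's actual estimates.

BRAIN_SCALE_FLOP = 1e30        # assumed training compute for a brain-scale model
COST_PER_FLOP_2021 = 1e-17     # assumed $ per FLOP in 2021
HARDWARE_HALVING_YEARS = 2.5   # assumed halving time of $/FLOP
ALGO_HALVING_YEARS = 3.0       # assumed halving time of compute required (algorithmic progress)
WILLINGNESS_TO_SPEND = 1e9     # assumed maximum $ an actor would spend on one training run

def training_cost(year):
    years = year - 2021
    cost_per_flop = COST_PER_FLOP_2021 * 0.5 ** (years / HARDWARE_HALVING_YEARS)
    required_flop = BRAIN_SCALE_FLOP * 0.5 ** (years / ALGO_HALVING_YEARS)
    return cost_per_flop * required_flop

year = 2021
while training_cost(year) > WILLINGNESS_TO_SPEND:
    year += 1
print(f"Threshold crossed around {year} under these toy assumptions")
```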

#stateofai | 155

156 of 188

  • AI Safety is defined by the authors as “the endeavour to ensure that AI is deployed in ways that do not harm humanity”. 68% of AI researchers surveyed think AI safety should be more prioritized than it is today, an increase from 49% found in a 2016 survey.
  • Amongst commercial actors, OpenAI, DeepMind, Google and Microsoft are perceived as most likely to shape the development of AI in the public interest.
  • Overall, respondents express low trust in their governments’ militaries. Most oppose or strongly oppose working on lethal autonomous weapons (73%).

stateof.ai 2021

A team from Cornell, Oxford and UPenn surveyed 524 researchers who published in top ML conferences and compared their views to those of the general public on subjects such as trust in international political and scientific organizations, military applications of AI, and more.

AI Safety: an overwhelming majority (68%) of machine learning researchers surveyed believe that AI Safety research should be prioritised more than at present

#stateofai | 156

157 of 188

stateof.ai 2021

Within AI Safety, AI Alignment is the critical field of research exploring how we can ensure that increasingly powerful AI systems have goals that are aligned with humanity. If transformational AI might happen in the next 30 years, are too few researchers actively focused on making sure it goes well for humanity?

AI Safety: fewer than 100 researchers work on AI Alignment in 7 leading AI organisations

  • DeepMind has the largest and most established AI Alignment team, led by co-founder and Chief Scientist Shane Legg.
  • Cumulatively this is a tiny group: across 7 leading organisations, fewer than 100 researchers are working on AI Alignment, a tiny fraction of the AI research community worldwide.

#stateofai | 157

Source: primary research by State of AI team. Note, these numbers are for long-term AI Alignment research, which does not include broader AI Safety focused on nearer-term issues. Blue = industry labs, red = academic labs.

158 of 188

stateof.ai 2021

As a percentage of total headcount, Anthropic (42%) and HCAI (36%) are investing the most in this area.

AI Safety: if transformative AI might happen in the next 30 years, how many people are working on making sure it goes well for humanity?

#stateofai | 158

Source: primary research by State of AI team. Note, these numbers are for long-term AI Alignment research, which does not include broader AI Safety focused on nearer-term issues. Blue = industry labs, red = academic labs.

159 of 188

stateof.ai 2021

AI Safety: new initiatives and organisations are cause for some optimism

Responding to the challenge, a number of smaller organisations and academic departments have sprung up led by talented researchers with an explicit focus on AI Alignment.

  • Paul Christiano, who formerly ran the language model alignment team at OpenAI, has created the Alignment Research Center.
  • Buck Shlegeris, formerly of MIRI, has started Redwood Research, a 10-person organisation focused on applied AI Alignment.
  • David Krueger, formerly of DeepMind, has become part of the faculty at the University of Cambridge, focused on AI Alignment.
  • Ought is focused on ‘delegating open-ended thinking to advanced AI systems’, which naturally encompasses AI alignment.

#stateofai | 159

160 of 188

stateof.ai 2021

DeepMind had been negotiating with Google to shift its legal structure to that of a non-profit and to establish a clear governance structure that tackles the deep oversight challenges associated with developing AGI.

AI Governance: DeepMind fails to gain independence from Google

  • DeepMind’s explicit mission is to create Artificial General Intelligence (AGI) and the founders have reportedly argued that “the powerful artificial intelligence they were researching shouldn’t be controlled by a single corporate entity”.
  • Google reportedly blocked DeepMind’s desire to shift its legal structure, and this was announced to DeepMind employees.
  • Furthermore, ethical oversight of DeepMind has shifted from an independent DeepMind ethics board to a general Google ethics board known as the “Advanced Technology Review Council”.

#stateofai | 160

161 of 188

stateof.ai 2021

Many of OpenAI’s leading researchers leave to start a major new AI research lab.

AI Governance: enter Anthropic as a potential third pole for AGI research

  • The new entity is led by Dario Amodei, who was the most senior researcher at OpenAI. Dario pioneered OpenAI’s work on large language models including GPT-2 and GPT-3.
  • Many other core OpenAI team members also left to found or join Anthropic, including Daniela Amodei (who was VP Safety), Tom Brown (who developed distributed training infrastructure that scaled from 1.5B parameters to 170B parameters), Jack Clark (who was OpenAI’s Policy Director), Chris Olah (who led work on circuits and the discovery of multimodal neurons in CLIP), Sam McCandlish (OpenAI Research Lead), Tom Henighan (technical safety team), and Ben Mann (who developed the prototype of the OpenAI API).
  • The company has raised $124M from a group of investors with a focus on AI Safety including Jaan Tallinn and Dustin Moskovitz.

#stateofai | 161

162 of 188

stateof.ai 2021

The team cites AI Safety and governance as a primary goal.

AI Governance: enter Anthropic as a potential third pole for AGI research

  • Anthropic explicitly defines itself as an “AI safety and research company”.
  • Anthropic will focus on research into increasing the safety of AI systems; specifically, the company is focusing on increasing the reliability of large-scale AI models, developing the techniques and tools to make them more interpretable, and building ways to more tightly integrate human feedback into the development and deployment of these systems.
  • The FT reports that “to insulate itself against commercial interference, Anthropic has registered as a public benefit corporation with special governance arrangements to protect its mission to ‘responsibly develop and maintain advanced AI for the benefit of humanity’. These include creating a long-term benefit committee made up of people who have no connection to the company or its backers, and who will have the final say on matters including the composition of its board.”

#stateofai | 162

163 of 188

stateof.ai 2021

A team of renegades have accomplished a huge amount since July 2020.

AI Governance: EleutherAI mounts an attempt to decentralise power via open source

  • Unlike with GPT-3’s predecessors, GPT-2 and GPT-1, OpenAI did not open-source the model or training dataset, instead limiting access via a commercial API in partnership with Microsoft.
  • A group of committed open source and AI Safety focused people gathered on a Discord server in July 2020 to try to chart a new course: “We think that access to large, pretrained models will enable large swathes of research that would not have been possible while such technologies are locked away behind corporate walls. For-profit entities have explicit incentives to downplay risks and discourage security probing. We want to help the wider safety and security communities access and study these new technologies.”
  • They have made phenomenal progress, within 12 months releasing GPT-Neo, a 2.7B parameter model that outperforms one of the smaller GPT-3 models of a similar size.

#stateofai | 163

164 of 188

stateof.ai 2021

A notable achievement of the project has been to create The Pile, a free and publicly released 800GB dataset of diverse English text for large language modelling.

AI Governance: EleutherAI mounts attempt to decentralise power via open source

  • Eleuther’s noble aims have attracted support from the wider community, drawing 10,000 members to their Discord server.
  • CoreWeave, a cloud service provider that specialises in high performance ML, contributed compute to train the GPT-Neo models. The EleutherAI team created a way to split AI computations across multiple machines.
  • The collective followed this with GPT-J-6B, a 6B parameter model for use with a new codebase, Mesh Transformer JAX.
  • The EleutherAI language models hosted on Hugging Face had over 500,000 downloads in August 2021 (see the usage sketch after this list).
  • The EleutherAI community has now expanded its activity and is working on open-source alternatives in BioML and generative art.
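
As referenced above, the EleutherAI models can be loaded in a few lines via the Hugging Face Hub; a minimal usage sketch, assuming the `transformers` library is installed and there is enough memory to hold the weights.

```python
from transformers import pipeline

# Load EleutherAI's open GPT-Neo 2.7B model from the Hugging Face Hub.
# Note: this downloads several GB of weights; "EleutherAI/gpt-neo-125M" is a lighter option for testing.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")

print(generator("The State of AI Report is", max_length=40, do_sample=True)[0]["generated_text"])
```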

#stateofai | 164

165 of 188

stateof.ai 2021

BigScience, also known as the Summer of Language Models, is a one-year long workshop (started in May 2021) whose participants will create large multilingual LMs and datasets. Like EleutherAI, all the workshop’s outputs are open source, and the goal is to analyse the LMs and datasets from all scientific and societal aspects.

AI Governance: the BigScience workshop is an attempt at a hybrid alternative

  • Compared to EleutherAI’s 100% decentralized approach, BigScience is more structured. It is organised as a scientific workshop and is led by Hugging Face. As of September 2021, the workshop gathered 600 researchers from 50 countries and 250 institutions. BigScience takes inspiration from multi-institutional scientific collaboration schemes such as CERN.
  • The workshop is organized around several different subjects, with working groups (voluntarily) assigned to each task and periodic meetings. The subjects include the carbon footprint of training the LLMs, dataset creation, model training and evaluation. For each subject, a central focus is placed on studying the ethics, bias, fairness and multilinguality aspects.
  • Access to the French government’s supercomputer, Jean Zay, guarantees participants will have the necessary computational power to train an LLM.

#stateofai | 165

166 of 188

stateof.ai 2021

The EU introduced a proposal for AI regulation (AI Act) in April 2021. The proposal aims to provide the necessary legal certainty to facilitate innovation while ensuring the protection of consumer rights. Like GDPR, the proposed law concerns any person or organization, even foreign, involved with an AI system placed or used in the EU. But the AI Act goes beyond GDPR by aiming to directly regulate the use of AI systems.

The EU continues to be the first (and heavy handed) mover in AI regulation

  • The AI Act draws a distinction between three types of systems as a function of their “level of risk”: prohibited, high-risk and low-risk.
  • Prohibited AI practices include “subliminal techniques” that distort a person’s behavior, targeting of vulnerable groups, social scoring, and real-time remote biometric applications.
  • High-risk systems include those used as a safety component of larger systems, and those which can have an impact on fundamental rights. They include public infrastructure, social welfare, medical services, transportation systems, etc.
  • Low-risk AI systems are all AI systems that don’t fall in the above categories.

#stateofai | 166

167 of 188

stateof.ai 2021

  • The minimal requirements for all AI systems mainly concern explicitly informing users of the type of AI system they are interacting with. For example, users need to be aware that the system performs emotion recognition or biometric categorization, or that the content they are shown is a deepfake.

While all AI systems need to satisfy some minimal requirements under the AIA, high-risk AI systems are subject to more scrutiny and accountability.

The AI Act: regulatory requirements in Europe

  • Furthermore, high-risk AI systems need to (i) be transparent: they should work as the user intended and their outputs should be interpretable, (ii) be secure: they should be robust and as accurate as advertised, (iii) contain all necessary technical documentation for proper use, and register logs of their behavior, (iv) have effective human oversight. They also need to conform to many other requirements pertaining to the risk management of the system.
  • Contrary to GDPR, complaints can only be made by supervisory authorities, not directly by individuals. The sanctions for failure to conform to the legislation can be as high as €30M or 6% of the company’s global annual turnover.

#stateofai | 167

168 of 188

stateof.ai 2021

Although the AIA is a step in the right direction, many feel the EU is rushing legislation on technical issues that even the scientific community doesn’t fully understand. As a result, it is not clear whether the EU and member states have the means to enforce it, nor whether all companies have the means to comply with the legislation.

Regulating AI systems presents unique technical, economic and legal challenges

  • Technical: The fairness, interpretability and robustness of AI algorithms are still open research questions. With current knowledge, evaluations of high-risk systems will be flawed.
  • Economic: The EU commission estimated that annual compliance costs represent 17% of the value of a reference AI unit. Assuming only 10% of AI units will be subject to the regulatory requirements, the EU commission projects that the total compliance costs for the global AI industry will range between €1.6B and €3.3B in 2025.
  • Legal: The AIA’s risk-based categorization might be too coarse. The Progressive Policy Institute proposes to differentiate B2B and B2C systems, so that AI systems operating on a B2B level are subject to lower regulatory requirements than consumer facing ones.

#stateofai | 168

169 of 188

stateof.ai 2021

The Personal Information Protection Law (PIPL), China’s GDPR, will go into effect in November 2021. But Chinese regulators are moving fast. They are already proposing legislation on a major subset of AI systems: recommendation algorithms. Chinese e-commerce giants and social networks, which are at the center of a regulatory crackdown, are heavy users of these systems.

In China, industrial policy and regulation go hand in hand

  • Although the proposed legislation shares some common points with the AIA, some of the rules clearly signal a desire to reduce the economic power of Chinese big tech companies.
  • Both EU and Chinese regulations appear in agreement with respect to consumer rights: the systems must not implicitly manipulate consumer behavior, and consumers need to be aware at any time that they are interacting with an AI system.
  • But while the AIA leaves pricing practices outside of its scope, the Chinese draft explicitly requires that recommendation algorithms don’t cause “unreasonable” price differentiation enabled by consumer profile targeting.
  • Critics worry that direct regulation of algorithms opens the door for increased government scrutiny via access to proprietary company data and code.

#stateofai | 169

170 of 188

stateof.ai 2021

Chinese AI actors (government, academia, industry) have long been aware of AI ethics issues. In several papers and initiatives, they outlined principles for building ethical AI systems. But a practical application of these principles is still lacking, and AI ethics remain subordinated to higher political interests.

AI ethics in China: numerous initiatives, but to what end?

#stateofai | 170

171 of 188

stateof.ai 2021

The Chinese Governance Committee for the New Generation Artificial Intelligence published a draft with a set of ethical norms that AI systems should respect. While this is a step in the right direction, the government’s use of AI for censorship and surveillance jumps to mind as a major infringement of the introduced norms.

AI ethics in China: will a new draft on ethics norms change the status quo?

  • According to the ethics norms, AI systems should “promote fairness, justice, harmony, safety and security, and avoid issues such as prejudice, discrimination, privacy and information leakage.”
  • The norms apply to AI systems at all levels, from research and development to production, and are targeted at both system providers and users. If followed by action, these norms could help build more reliable and human-friendly AI systems.
  • But despite the displayed effort, it is hard to brush off the feeling that this initiative will still largely disregard the government's infringements of basic ethics rules through its use of AI for censorship and surveillance.
  • A first test of the ethics norms will be the fate reserved for patent applications from Huawei, Megvii and Sensetime on facial recognition systems which are able to recognize race and are specifically targeted at Uyghur populations.

#stateofai | 171

172 of 188

stateof.ai 2021

The Algorithmic Accountability Act was proposed, and ignored, in 2019. Since then, the US hasn’t seen any attempt at a comprehensive national AI regulation or consumer data privacy law.

Meanwhile, in the US, there still isn’t a federal law protecting consumer data privacy

  • Existing federal laws are fragmented and sectoral: they partially cover data in specific sensitive domains like health, credit and education, or data on vulnerable populations like children.
  • Some laws, like those on surveillance, are outdated and ill-suited to the modern internet. A GDPR-like bill, which covers all online consumer data, would make the legislation clearer for citizens and easier to maintain for legislators.
  • Meanwhile, the vast majority of data, including third party data, remains unregulated. Note that the US Constitution doesn’t provide for the right to privacy.

#stateofai | 172

173 of 188

stateof.ai 2021

Virginia enacted the Virginia Consumer Data Protection Act (VCDPA) in March 2021. But an examination of the law shows it is not as binding as California’s CPRA, and it largely means “business as usual” for Big Tech.

At the state level, comprehensive data privacy laws are rare and differ in strength

  • Only 3 states have passed comprehensive consumer privacy laws: California, Colorado and Virginia. But they do not have the same regulatory power.
  • California’s CPRA is the strongest: it allows for example for private action against data breaches and global opt out from data sharing at the device/browser level.
  • In contrast, Virginia’s VCDPA allows opting out only at the individual website level and doesn’t allow for private action.

#stateofai | 173

174 of 188

stateof.ai 2021

The US Government Accountability Office (GAO), the supreme audit institution of the US federal government, examined the ownership and use of facial recognition technology by federal agencies, what activities it was used for, and whether agencies tracked how their employees used the technology.

20 of 42 US federal agencies own or use facial recognition systems for law enforcement

  • Agencies reported using facial recognition for criminal investigations and to verify a person’s identity remotely (due to Covid-19).
  • Six agencies processed images of the “unrest, riots, or protests following the death of George Floyd in May 2020”, while three agencies analysed images of the storming of the US Capitol on January 6, 2021.
  • The facial recognition technology used by 14 agencies to support criminal investigations was owned by non-federal entities, and only one of these agencies tracked how its employees used the system. This raises concerns over potential misuse of facial recognition.

#stateofai | 174

175 of 188

stateof.ai 2021

This is thought to be the first time a drone swarm has been used in combat.

Military AI moves into production: Israel uses AI guided drone swarm in Gaza attacks

  • Israel Defense Forces used swarms of drones controlled by a single operator, coordinating together using AI methods of unknown technical description. The military’s use of drones in this way was initially kept classified during the fighting, but parts have since been cleared for publication.
  • Israeli Military Intelligence declared the Gaza campaign the world’s “first AI war”, claiming that “for the first time, artificial intelligence represented a key factor and force-multiplier in warfare against an enemy”.

#stateofai | 175

176 of 188

stateof.ai 2021

This was the first time a US military system had been controlled by an AI system.

Military AI moves into production: US Air Force flew an AI copilot on a U-2 Spy Plane

  • The system, µZero (call sign ARTUµ), was a deep RL system derived from DeepMind’s work on games.
  • The Air Force stated that the system had completed a million simulated training runs prior to being used in production.
  • The Air Force stated that the “U-2 gave ARTUµ complete radar control while ‘switching off’ access to other subsystems”, allowing operators to choose what the AI won’t do in order to accept the operational risk of what it will.

#stateofai | 176

177 of 188

stateof.ai 2021

Military AI moves into production: US Air Force Research Lab tests autonomous Skyborg

  • Instead of replacing human pilots, Skyborg provides manned aircraft with situational awareness and survivability during combat missions.
  • In April 2021, the AFRL completed a 2-hour-and-10-minute flight test which saw Skyborg perform a series of foundational autonomous flight behaviors. This included responding to navigational commands, reacting to geofences, and demonstrating coordinated maneuvering.
  • In the near future, this program aims to demonstrate “direct manned and unmanned teaming between aircraft and multiple ACS-controlled unmanned aircraft.”

The Skyborg Vanguard program is aimed at integrating “full-mission autonomy with low-cost, attritable unmanned air vehicle technology to enable manned-unmanned teaming.”

#stateofai | 177

178 of 188

stateof.ai 2021

Military AI: governments have doubled down on rhetoric and defense spending

  • “Today, the government is not organizing or investing to win the technology competition against a committed competitor, nor is it prepared to defend against AI-enabled threats”
  • Biden’s Pentagon Budget Request contains $874M of dedicated investment into AI.
  • “Our adversaries will gain a decisive advantage if we do not compete in a more concerted and urgent way in this technology [AI]. And secondly, opportunity: Investment in military AI – will be symbiotic with the growth of AI in other sectors”
  • The UK has committed £6.6B to military R&D over the next four years, with a special focus on AI and autonomous systems.
  • EU has fallen behind in military AI but is starting to catch up, moving into the next phase of developing the €100B Future Combat Air System (FCAS) – a trilateral cooperation between Germany, France and Spain with a significant role for AI – and launching the European Defense Fund with a budget of ca. €8.0B until 2027 and several AI related projects in its first wave.

#stateofai | 178

179 of 188

stateof.ai 2021

Anduril’s valuation doubles in 12 months to $4.6B after raising a $450M Series D. It has now raised circa $700M.

Military AI: Anduril continues to gain momentum

  • Anduril was awarded a $99M five-year Production Other Transaction (P-OT) Agreement by the Department of Defense (DOD).
  • Anduril will use Google Cloud as part of a contract with US Customs and Border Protection for its sentry towers along the US/Mexico border. This was revealed via a Freedom of Information Act request filed by a research group founded by a former Google research scientist who left over ethical concerns.
  • Anduril’s “virtual wall” aims to automate detection of migrants and traffickers along the southern border.

  • US-based competitors Shield AI (software to pilot unmanned military assets) and Rebellion Defense (AI software for the military) also raised significant capital in 2021 at or above $1B valuations: $210M Series D and >$150M Series B, respectively.

#stateofai | 179

180 of 188

stateof.ai 2021

Microsoft’s huge $22B contract for HoloLens moves it closer to becoming a defense prime.

Military AI: large tech companies scale up military contracts

  • The deal builds upon a $480M prototyping contract in 2018 and a $10B contract in 2019 for cloud services.
  • It sets Microsoft up to deliver 120,000 headsets and associated cloud services as part of an Integrated Visual Augmentation System.
  • Employees pushed back in an open letter arguing “We did not sign up to develop weapons, and we demand a say in how our work is used”, but the CEO defended the project saying “we made a principled decision that we’re not going to withhold technology from institutions that we have elected in democracies to protect the freedoms we enjoy.”

#stateofai | 180

181 of 188

stateof.ai 2021

In the face of slow adoption of AI legislation by the US Senate, legislators included some non-military AI provisions in the National Defense Authorization Act (NDAA), a bill which is all but guaranteed to pass every year.

Military AI: AI provisions are smuggled through military legislation

  • Stanford’s HAI summarized the AI provisions included in the NDAA. Part of these provisions did indeed concern military AI, including “acquiring ethically and responsibly developed artificial intelligence technology” and creating a steering committee tasked to develop a strategy on AI aimed at maintaining the technological superiority of the US.
  • The headline of the non-military AI provisions is the creation of the “National AI Initiative”, which will coordinate AI R&D among civilian agencies, the DoD and the Intelligence Community.
  • A National AI advisory committee will also be created to advise the President on sensitive AI issues like bias and data security.
  • Remarkably, a new task force will write a plan for ownership of a National Research Cloud.
  • Finally, the provisions contain orders to the NSF to increase funding of AI research, with a specific focus on trustworthy AI and societal challenges.

#stateofai | 181

182 of 188

Section 5: Predictions

stateof.ai 2021

#stateofai | 182

183 of 188

stateof.ai 2021

8 predictions for the next 12 months

1. Transformers replace recurrent networks to learn world models with which RL agents surpass human performance in large and rich game environments.

2. ASML’s market cap reaches $500B.

3. Anthropic publishes on the level of GPT, Dota, AlphaGo to establish itself as a third pole of AGI research.

4. A wave of consolidation in AI semiconductors with at least one of Graphcore, Cerebras, SambaNova, Groq, or Mythic being acquired by a large technology company or major semiconductor incumbent.

5. Small transformers + CNN hybrid models match current SOTA on ImageNet top-1 accuracy (CoAtNet-7, 90.88%, 2.44B params) with 10x fewer parameters.

6. DeepMind releases a major research breakthrough in the physical sciences.

7. The JAX framework grows from 1% to 5% of monthly repos created as measured by PapersWithCode.

8. A new AGI-focused research company is formed with significant backing and a roadmap that’s focused on a sector vertical (e.g. developer tools, life science).

#stateofai | 183

184 of 188

Section 6: Conclusion

stateof.ai 2021

#stateofai | 184

185 of 188

Thanks!

Congratulations on making it to the end of the State of AI Report 2021! Thanks for reading.

In this report, we set out to capture a snapshot of the exponential progress in the field of artificial intelligence, with a focus on developments since last year’s issue that was published on 1st October 2020. We believe that AI will be a force multiplier on technological progress in our world, and that wider understanding of the field is critical if we are to navigate such a huge transition.

We set out to compile a snapshot of all the things that caught our attention in the last year across the range of AI research, talent, industry and the emerging politics of AI.

We would appreciate any and all feedback on how we could improve this Report further, as well as contribution suggestions for next year’s edition.

Thanks again for reading!

Nathan Benaich (@nathanbenaich) and Ian Hogarth (@soundboy)

stateof.ai 2021

#stateofai | 185

186 of 188

The authors declare a number of conflicts of interest as a result of being investors and/or advisors, personally or via funds, in a number of private and public companies whose work is cited in this report.

Ian is an investor in: Anthropic, ClipDrop, Faculty AI, LabGenius.

Conflicts of interest

stateof.ai 2021

#stateofai | 186

187 of 188

About the authors

Nathan is the General Partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. He founded RAAIS and London.AI (AI community for industry and research), the RAAIS Foundation (funding open-source AI projects), and Spinout.fyi (improving university spinout creation). He studied biology at Williams College and earned a PhD from Cambridge in cancer research.

Nathan Benaich

Ian Hogarth

Ian is an angel investor in 100+ startups. He is a Visiting Professor at UCL working with Professor Mariana Mazzucato. Ian was co-founder and CEO of Songkick, the concert service. He studied engineering at Cambridge where his Masters project was a computer vision system to classify breast cancer biopsy images. He is the Chair of Phasecraft, a quantum software company.

stateof.ai 2021

#stateofai | 2

188 of 188

State of AI Report

October 12, 2021

#stateofai

stateof.ai

Ian Hogarth

Nathan Benaich