State of AI Report
October 1, 2020
#stateofai
stateof.ai
Ian Hogarth
Nathan Benaich
About the authors
Nathan is the General Partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. He founded RAAIS and London.AI, which connect AI practitioners from large companies, startups and academia, and the RAAIS Foundation that funds open-source AI projects. He studied biology at Williams College and earned a PhD from Cambridge in cancer research.
Nathan Benaich
Ian Hogarth
Ian is an angel investor in 60+ startups. He is a Visiting Professor at UCL working with Professor Mariana Mazzucato. Ian was co-founder and CEO of Songkick, the concert service used by 17M music fans each month. He studied engineering at Cambridge where his Masters project was a computer vision system to classify breast cancer biopsy images. He is the Chair of Phasecraft, a quantum software company.
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its third year. New to the 2020 edition are several invited content contributions from a range of well-known and up-and-coming companies and research groups. Consider this Report a compilation of the most interesting things we’ve seen, with the goal of triggering an informed conversation about the state of AI and its implications for the future.
We consider the following key dimensions in our report:
Collaboratively produced by Ian Hogarth (@soundboy) and Nathan Benaich (@nathanbenaich).
Thank you to our contributors
Thank you to our reviewers
Jack Clark, Jeff Ding, Chip Huyen, Rebecca Kagan, Andrej Karpathy, Moritz Müller-Freitag, Torsten Reil, Charlotte Stix, and Nu (Claire) Wang.
Artificial intelligence (AI): A broad discipline with the goal of creating intelligent machines, as opposed to the natural intelligence that is demonstrated by humans and animals. It has become a somewhat catch all term that nonetheless captures the long term ambition of the field to build machines that emulate and then exceed the full range of human cognition.
Machine learning (ML): A subset of AI that often uses statistical techniques to give machines the ability to "learn" from data without being explicitly given the instructions for how to do so. This process is known as “training” a “model” using a learning “algorithm” that progressively improves model performance on a specific task.
Reinforcement learning (RL): An area of ML concerned with developing software agents that learn goal-oriented behavior by trial and error in an environment that provides rewards or penalties in response to the agent’s actions (chosen according to a “policy”) towards achieving that goal.
Deep learning (DL): An area of ML that attempts to mimic the activity in layers of neurons in the brain to learn how to recognise complex patterns in data. The “deep” in deep learning refers to the large number of layers of neurons in contemporary ML models that help to learn rich representations of data to achieve better performance.
Definitions
Algorithm: An unambiguous specification of how to solve a particular problem.
Model: Once an ML algorithm has been trained on data, the output of the process is known as the model. This can then be used to make predictions.
Supervised learning: A model attempts to learn to transform one kind of data into another kind of data using labelled examples. This is the most common kind of ML algorithm today.
Unsupervised learning: A model attempts to learn a dataset's structure, often seeking to identify latent groupings in the data without any explicit labels. The output of unsupervised learning often makes for good inputs to a supervised learning algorithm at a later point.
Transfer learning: An approach to modelling that uses knowledge gained in one problem to bootstrap a different or related problem, thereby reducing the need for significant additional training data and compute.
Natural language processing (NLP): Enabling machines to analyse, understand and manipulate language.
Computer vision: Enabling machines to analyse, understand and manipulate images and video.
Definitions
Research
Talent
Industry
Politics
Executive Summary
Scorecard: Reviewing our predictions from 2019
Our 2019 Prediction | Grade | Evidence |
New natural language processing companies raise $100M in 12 months. | Yes | Gong.io ($200M), Chorus.ai ($45M), Ironscales ($23M), ComplyAdvantage ($50M), Rasa ($26M), HyperScience ($60M), ASAPP ($185M), Cresta ($21M), Eigen ($37M), K Health ($48M), Signal ($25M), and many more! |
No autonomous driving company drives >15M miles in 2019. | Yes | Waymo (1.45M miles), Cruise (831k miles), Baidu (108k miles). |
Privacy-preserving ML adopted by an F2000 company other than GAFAM (Google, Apple, Facebook, Amazon, Microsoft). | Yes | Machine Learning Ledger Orchestration for Drug Discovery (MELLODDY) research consortium with large pharmaceutical companies and startups including GlaxoSmithKline, Merck and Novartis. |
Unis build de novo undergrad AI degrees. | Yes | CMU graduates first cohort of AI undergrads, Singapore’s SUTD launches undergrad degree in design and AI, NYU launches data science major, Abu Dhabi builds an AI university. |
Google has major quantum breakthrough and 5 new startups focused on quantum ML are formed. | Sort of | Google demonstrated quantum supremacy in October 2019! Many new quantum startups were launched in 2019 but only Cambridge Quantum, Rahko, Xanadu.ai, and QCWare are explicitly working on quantum ML. |
Governance of AI becomes key issue and one major AI company makes substantial governance model change. | No | Nope, business as usual. |
Section 1: Research
Figure: Code availability (share of papers with published code, 0% to 25%) by paper publication date, 2017 to 2020.
Research paper code implementations are important for accountability, reproducibility and driving progress in AI.
The field has made little improvement on this metric since mid-2016. Traditionally, academic groups are more
likely to publish their code than industry groups. Notable organisations that don’t publish all of their code are
OpenAI and DeepMind. For the biggest tech companies, their code is usually intertwined with proprietary scaling
infrastructure that cannot be released. This points to centralization of AI talent and compute as a huge problem.
AI research is less open than you think: Only 15% of papers publish their code
Hosting 3,000 state-of-the-art leaderboards, 750+ ML components, and 25,000+ research papers along with code.
Papers With Code tracks openly-published code and benchmarks model performance
Figure: % PyTorch papers of total TensorFlow/PyTorch papers (y-axis: % of total framework mentions, 0% to 100%).
Of the 20-35% of conference papers that mention the framework they use, 75% cite the use of PyTorch but not
TensorFlow. Of 161 authors who published more TensorFlow papers than PyTorch papers in 2018, 55% of them
have switched to PyTorch. The opposite happened in 15% of cases. Meanwhile, we observe that TensorFlow,
Caffe and Caffe2 are still the workhorses for production AI.
Facebook’s PyTorch is fast outpacing Google’s TensorFlow in research papers, which tends to be a leading indicator of production use down the line
47% of these implementations are based on PyTorch vs. 18% for TensorFlow. PyTorch offers greater flexibility
and a dynamic computational graph that makes experimentation easier. JAX is a Google framework that is more
math friendly and favored for work outside of convolutional models and transformers.
PyTorch is also more popular than TensorFlow in paper implementations on GitHub
Figure: Share of implementations (0% to 100%) by repository creation date, 2017 to 2020.
Huge models, large companies and massive training costs dominate the hottest area of AI today, NLP.
Language models: Welcome to the Billion Parameter club
Figure: Language model sizes, 2018 through 2019 and 2020 onwards, with parameter counts ranging from 66M to 175B.
Note: The number of parameters indicates how many different coefficients the algorithm optimizes during the training process.
Empirical scaling laws of neural language models show smooth power-law relationships, which means that as
model performance increases, the model size and amount of computation have to increase more rapidly.
Bigger models, datasets and compute budgets clearly drive performance
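To make the power-law relationship concrete, here is a minimal sketch. It assumes loss depends on model size N as L(N) = (N_c / N)^alpha; the function name and the exponent value are illustrative assumptions, not figures taken from this report.

```python
def size_multiplier_for_loss_ratio(loss_ratio: float, alpha: float) -> float:
    """Factor by which model size must grow so loss falls to `loss_ratio` of its current value.

    Assumes L(N) = (N_c / N) ** alpha, so L2 / L1 = (N1 / N2) ** alpha
    and therefore N2 / N1 = loss_ratio ** (-1 / alpha).
    """
    if not 0.0 < loss_ratio <= 1.0:
        raise ValueError("loss_ratio must be in (0, 1]")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return loss_ratio ** (-1.0 / alpha)


ALPHA = 0.076  # hypothetical exponent, chosen only for illustration

for ratio in (0.95, 0.90, 0.80):
    multiplier = size_multiplier_for_loss_ratio(ratio, ALPHA)
    print(f"loss x{ratio:.2f} needs model size x{multiplier:,.1f}")
```

With a small exponent, each further cut in loss demands a disproportionately larger model, which is the "more rapidly" in the statement above.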
Tuning billions of model parameters costs millions of dollars
Based on variables released by Google et al., you’re paying circa $1 per 1,000 parameters. This means OpenAI’s
175B parameter GPT-3 could have cost tens of millions to train. Experts suggest the likely budget was $10M.
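The rule of thumb above is simple proportional arithmetic, sketched below. The function name and default rate are illustrative assumptions. Applied literally to 175B parameters, the heuristic gives roughly $175M, well above the $10M expert estimate, so it is best read as a loose guide rather than a budget.

```python
def heuristic_training_cost_usd(num_parameters: float,
                                usd_per_1000_parameters: float = 1.0) -> float:
    """Linear cost heuristic: a fixed dollar amount per 1,000 model parameters."""
    return num_parameters / 1000 * usd_per_1000_parameters


GPT3_PARAMETERS = 175e9  # 175B parameters, as quoted in the text

cost = heuristic_training_cost_usd(GPT3_PARAMETERS)
print(f"Heuristic estimate: ${cost:,.0f}")
```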
This transformer-based model with conditional computation for machine translation has 600B parameters.
To achieve the needed quality improvements in machine translation, Google’s final model trained for the equivalent of 22 TPU v3 core years or ~4 days with 2,048 cores non-stop
Without major new research breakthroughs, dropping the ImageNet error rate from 11.5% to 1% would require
over one hundred billion billion dollars! Many practitioners feel that progress in mature areas of ML is stagnant.
We’re rapidly approaching outrageous computational, economic, and environmental costs to gain incrementally smaller improvements in model performance
This has implications for problems where training data samples are expensive to generate, which likely confers
an advantage to large companies entering new domains with supervised learning-based models.
A larger model needs less data than a smaller peer to achieve the same performance
Google made use of their large language models to deliver higher quality translations for languages with limited
amounts of training data, for example Hausa and Uzbek. This highlights the benefits of transfer learning.
Low resource languages with limited training data are a beneficiary of large models
Since 2012 the amount of compute needed to train a neural network to the same performance on ImageNet
classification has been decreasing by a factor of 2 every 16 months.
Even as deep learning consumes more data, it continues to get more efficient
Figures: Training efficiency factor; two distinct eras of compute in training AI systems.
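A halving of required compute every 16 months compounds quickly. The sketch below turns that rate into a cumulative efficiency factor; the function name is an illustrative assumption and the 16-month period is the figure quoted above.

```python
def efficiency_gain(months_elapsed: float, halving_period_months: float = 16.0) -> float:
    """Factor by which the compute needed for a fixed result shrinks over time.

    If required compute halves every `halving_period_months`, then after
    `months_elapsed` months it has fallen by 2 ** (months_elapsed / period).
    """
    return 2.0 ** (months_elapsed / halving_period_months)


for years in (1, 4, 7):
    print(f"after {years} years: {efficiency_gain(12 * years):.1f}x less compute")
```

Over seven years (84 months) this rate implies roughly a 38x reduction in the compute needed to reach the same ImageNet performance.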
PolyAI, a London-based conversational AI company, open-sourced their ConveRT model (a pre-trained contextual
re-ranker based on transformers). Their model outperforms Google’s BERT model in conversational applications,
especially in low data regimes, suggesting BERT is far from a silver bullet for all NLP tasks.
Yet, for some use cases like dialogue, small, data-efficient models can trump large models
Model | 1-vs-100 Accuracy | Model Size |
ELMo | 20.6% | 372M |
BERT | 24.0% | 1.3G |
USE | 47.7% | 845M |
ConveRT (PolyAI) | 68.2% | 59M |
Figure: F1 score and intent accuracy as a function of the amount of training data (low to high).
A new generation of transformer language models are unlocking new NLP use-cases
GPT-3, T5 and BART are driving a drastic improvement in the performance of transformer models for text-to-text
tasks like translation, summarization, text generation and text-to-code.
Summarization from huggingface.co/models
Code generation and more: gpt3examples.com
An unsupervised machine translation model trained on GitHub projects with 1,000 parallel functions can
translate 90% of these functions from C++ to Java and 57% of Python functions into C++ and successfully pass
unit tests. No expert knowledge required, but no guarantees that the model didn’t memorize the functions either.
Computer, please convert my code into another programming language
Given a broken program and diagnostic feedback (compiler error message), DrRepair localizes an erroneous
line and generates a repaired line.
Computer, can you automatically repair my buggy programs too?
It was only 12 months ago that the human GLUE baseline was beaten by 1 point. Now SuperGLUE is in sight.
NLP benchmarks take a beating: Over a dozen teams outrank the human GLUE baseline
Human baseline = 87
A multi-task language understanding challenge tests for world knowledge and problem solving ability across 57
tasks including maths, US history, law and more. GPT-3’s performance is lopsided with large knowledge gaps.
What’s next after SuperGLUE? More challenging NLP benchmarks zero-in on knowledge
Figure note: “Small” (2.7B parameters), “Medium” (6.7B), “Large” (13B) and “X-Large” (175B).
For example, GPT-2 was trained on text but can be fed images in the form of a sequence of pixels to learn how to
autocomplete images in an unsupervised manner.
The transformer’s ability to generalise is remarkable. It can be thought of as a new layer type that is more powerful than convolutions because it can process sets of inputs and fuse information more globally.
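Feeding an image to a language model as "a sequence of pixels" can be sketched as follows. This is a toy illustration on a tiny grayscale grid; the function names are hypothetical and the preprocessing used in the actual work may differ.

```python
from typing import List, Tuple

Image = List[List[int]]


def image_to_sequence(image: Image) -> List[int]:
    """Flatten a 2D grid of pixel values into a 1D sequence in raster order."""
    return [pixel for row in image for pixel in row]


def next_pixel_pairs(sequence: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: predict each pixel from all earlier pixels."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


def split_for_completion(image: Image, visible_rows: int) -> Tuple[List[int], List[int]]:
    """Return the visible top rows as a prompt and the hidden remainder to be completed."""
    sequence = image_to_sequence(image)
    cut = visible_rows * len(image[0])
    return sequence[:cut], sequence[cut:]


toy_image = [[0, 255], [128, 64]]
prompt, hidden = split_for_completion(toy_image, visible_rows=1)
print(prompt, hidden)
```

No labels are involved: the training signal is simply the next pixel, which is what makes the setup unsupervised.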
Figure: Input, model completions and original images.
Biology is experiencing its “AI moment”: Over 21,000 papers in 2020 alone
Publications involving AI methods (e.g. deep learning, NLP, computer vision, RL) in biology are growing
>50% year-on-year since 2017. Papers published since 2019 account for 25% of all output since 2000.
Figure note: 2020 values are annualized.
From physical object recognition to “cell painting”: Decoding biology through images
>14M labeled images
RxRx.ai image datasets of cells treated with various chemical agents
Large labelled datasets offer huge potential for generating new biological knowledge about health and disease.
Embeddings from experimental data illuminate biological relationships and predict COVID-19 drug successes.
Deep learning on cellular microscopy accelerates biological discovery with drug screens
After diagnosis of ‘wet’ age-related macular degeneration (exAMD) in one eye, a computer vision system can
predict whether a patient’s second eye will convert from healthy to exAMD within six months. The system uses
3D eye scans and predicted semantic segmentation maps.
Ophthalmology advances as the sandbox for deep learning applied to medical imaging
The AI system, an ensemble of three deep learning models operating on individual lesions, individual breasts
and the full case, was trained to produce a cancer risk score between 0 and 1 for the entire mammography case.
The system outperformed human radiologists and could generalise to US data when trained on UK data only.
AI-based screening mammography reduces false positives and false negatives in two large, clinically-representative datasets from the US and UK
Most ML applications utilise statistical techniques to explore correlations between variables. This requires that
experimental conditions remain the same and that the trained ML system is applied to the same kind of data as
the training data. This ignores a major component of how humans learn: reasoning about cause and effect.
Causal Inference: Taking ML beyond correlation
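A small simulation shows why correlation alone can mislead. In this hypothetical setup a hidden variable z drives both x and y, so they correlate strongly in observational data even though x has no effect on y. Setting x by intervention makes the correlation vanish. All names and noise levels are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, intervene, seed=0):
    """Correlation between x and y when a hidden confounder z drives y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden confounder
        # Observational: x follows z. Interventional: x is set independently of z.
        x = rng.gauss(0, 1) if intervene else z + rng.gauss(0, 0.5)
        y = z + rng.gauss(0, 0.5)  # y depends only on z, never on x
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


observational = simulate(5000, intervene=False)
interventional = simulate(5000, intervene=True)
print(f"observational r = {observational:.2f}, interventional r = {interventional:.2f}")
```

A purely correlational model trained on the observational data would wrongly treat x as predictive of y under changed conditions.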
Existing AI approaches to diagnosis are purely associative, identifying diseases that are strongly correlated with
a patient’s symptoms. The inability to disentangle correlation from causation can result in suboptimal or
dangerous diagnoses.
Causal reasoning is a vital missing ingredient for applying AI to medical diagnosis
A flaw with Shapley values, one current approach to explainability, is that they assume the model’s input
features are uncorrelated. Asymmetric Shapley Values (ASV) are proposed to incorporate this causal information.
Model explainability is an important area of AI safety: A new approach aims to incorporate causal structure between input features into model explanations
Explaining income classifier on Adult Census data set
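The difference between symmetric and asymmetric Shapley values can be sketched on a toy example. Below, Shapley values are computed exactly by averaging marginal contributions over feature orderings; the asymmetric variant averages only over orderings in which a cause precedes its effect. The feature names, the value function and the uniform weighting over allowed orderings are illustrative assumptions, not the setup of the referenced work.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple


def shapley_values(
    features: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    is_allowed: Optional[Callable[[Tuple[str, ...]], bool]] = None,
) -> Dict[str, float]:
    """Average each feature's marginal contribution over the allowed orderings."""
    orderings = [
        order for order in permutations(features)
        if is_allowed is None or is_allowed(order)
    ]
    totals = {feature: 0.0 for feature in features}
    for order in orderings:
        seen: FrozenSet[str] = frozenset()
        for feature in order:
            with_feature = seen | {feature}
            totals[feature] += value(with_feature) - value(seen)
            seen = with_feature
    return {feature: total / len(orderings) for feature, total in totals.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    # Hypothetical: both features carry the same information, so knowing
    # either one already yields the full prediction value of 1.0.
    return 1.0 if coalition else 0.0


def cause_first(order: Tuple[str, ...]) -> bool:
    # Hypothetical causal assumption: education influences occupation.
    return order.index("education") < order.index("occupation")


FEATURES = ["education", "occupation"]
symmetric = shapley_values(FEATURES, toy_value)
asymmetric = shapley_values(FEATURES, toy_value, is_allowed=cause_first)
print(symmetric)
print(asymmetric)
```

The symmetric version splits credit evenly between the two redundant features, while the asymmetric version assigns it to the upstream cause.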
RL agent designs molecules using step-wise transitions defined by chemical reaction templates.
Reinforcement learning helps ensure that molecules you discover in silico can actually be synthesized in the lab. This helps chemists avoid dead ends during drug discovery.
Have your desired molecule? ML will generate a synthesis plan faster than you can
Repurposing the transformer architecture by treating chemistry as a machine translation problem unlocks
efficient chemical synthesis planning to accelerate drug discovery workflows.
Test set accuracy for chemical synthesis plans (%)
Most deep learning methods focus on learning from grid-like input data such as images (i.e. Euclidean space). Graph neural networks
(GNNs) are an emerging family of methods that are designed to process graph-structured data such as 3D molecular structures (i.e. non-Euclidean space).
Graph neural networks: Solving problems by making use of 3D input data
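The core operation of a graph neural network is message passing between connected nodes. The sketch below runs one round with plain mean aggregation on a three-node chain. It is a simplified illustration with hypothetical names; real GNN layers add learned weights and non-linearities.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    node_features: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """One round: each node's new feature is the mean of its own and its neighbours' features."""
    neighbours: Dict[str, List[str]] = {node: [] for node in node_features}
    for a, b in edges:  # edges are treated as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, feature in node_features.items():
        group = [feature] + [node_features[m] for m in neighbours[node]]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# A chain a - b - c where only node "a" starts with a signal.
features = {"a": [1.0], "b": [0.0], "c": [0.0]}
edges = [("a", "b"), ("b", "c")]

after_one = message_passing_step(features, edges)
after_two = message_passing_step(after_one, edges)
print(after_one)
print(after_two)
```

After one round the signal has reached "b" only; after two rounds it has reached "c", showing how stacking rounds lets information travel along the graph structure.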
A graph neural network was trained on empirical data of molecules and their binary antibiotic toxicity. This
model then virtually screened millions of potentially antibiotic compounds to find a structurally different
antibiotic, halicin, with broad-spectrum activities in mice.
Graph networks learn to guide antibiotic drug screening, leading to new drugs in vivo
Principal Neighborhood Aggregation combines different aggregators and scalers to improve graph-based
chemical property prediction.
Enhancing chemical property prediction using graph neural networks
Log of mean-squared error on graph property prediction (lower is better)
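Combining aggregators and scalers can be sketched for a single node with scalar neighbour features. The choice of four aggregators (mean, max, min, standard deviation) and three degree-based scalers (identity, amplification, attenuation) is an assumption based on the commonly described formulation of the method; the function name and the way the normalising constant is supplied are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pna_aggregate(
    neighbour_values: List[float],
    avg_log_degree: float,
) -> Dict[Tuple[str, str], float]:
    """Combine every aggregator with every degree scaler for one node."""
    degree = len(neighbour_values)
    if degree == 0:
        raise ValueError("node needs at least one neighbour")

    mean = sum(neighbour_values) / degree
    variance = sum((v - mean) ** 2 for v in neighbour_values) / degree
    aggregated = {
        "mean": mean,
        "max": max(neighbour_values),
        "min": min(neighbour_values),
        "std": math.sqrt(variance),
    }

    # Degree scalers: how strongly to weight a node given how many neighbours it has,
    # relative to a typical degree (assumed here to be precomputed on training graphs).
    ratio = math.log(degree + 1) / avg_log_degree
    scalers = {"identity": 1.0, "amplification": ratio, "attenuation": 1.0 / ratio}

    return {
        (agg_name, scaler_name): agg_value * scaler_value
        for agg_name, agg_value in aggregated.items()
        for scaler_name, scaler_value in scalers.items()
    }


features = pna_aggregate([1.0, 3.0], avg_log_degree=math.log(3))
print(len(features), features[("mean", "identity")])
```

Each node ends up with 4 x 3 = 12 aggregated features per input feature, giving the downstream network several complementary views of the neighbourhood.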
DELs are composed of millions to billions of small molecules with unique DNA tags attached, which can be seen
as building blocks for larger molecules. By training a GNN on binding affinity between drugs and a target,
researchers can find hits to three drug targets from ∼88 M synthesizable or inexpensive purchasable compounds.
AI sifts through chemical space using DNA-encoded small molecule libraries (DEL)
Proteins are biological molecules that can be described as crystal structures (167k available today) or their
amino acid (AA) sequences (24 million available today). Similar to the process of learning word vectors, this work
shows that AA sequence representations learned by an RNN can predict a variety of structural and functional
properties for diverse proteins.
Language models show promise in learning to predict protein properties from amino acid sequences alone
The COVID Symptom Study app collects and analyzes the health of over 4 million global contributors to discover
new symptoms, predict COVID hotspots and, using AI, eventually predict COVID-19 without a physical test. ZOE is
running the world’s largest clinical study to validate the prediction model.
COVID-19: Analyzing symptoms from over 4 million contributors detects novel disease symptom ahead of public health community and could inform diagnosis without tests
Figure: Odds ratio by symptom (delirium, fever, loss of smell, skipped meals, shortness of breath, abdominal pain, chest pain, hoarse voice, fatigue, persistent cough, diarrhea). Loss of smell is the most predictive symptom of COVID-19.
Figure: ROC predictions (sensitivity vs. specificity) for risk of a positive test in the UK test set (b) and US validation set (c).
Drug discovery goes open source to tackle COVID-19. This is a rare example of where AI is being actively used on a clearly-defined problem that’s part of the COVID-19 response.
An international team of scientists are working pro-bono, with no IP claims, to crowdsource a COVID antiviral.
Learn more: postera.ai/covid
Missed out on strawberries and cream this year? A controllable synthetic video version of Wimbledon tennis matches
Combining a model of player and tennis ball trajectories, pose estimation, and unpaired image-to-image
translation to create a realistic controllable tennis match video between any players you wish!
For more examples, head to cs.stanford.edu/~haotianz/research/vid2player/
sequence, and uses transformers to model pairwise
interactions between the features.
is simpler because it drops multiple hand-designed priors
and its attention decoder helps with interpretability.
A transformer-based object detection model matches the performance of the best object detection models while
removing hand-coded prior knowledge and using half the compute budget.
Attention turns to computer vision tasks like object detection and segmentation
Footprints: A method for estimating the visible and hidden traversable space from a single RGB image.
Computer vision predicts where an agent can walk beyond what is seen
path planning for robots or augmented
reality agents.
where it can walk or roll, beyond the
immediately visible surfaces. This enables
virtual characters to more realistically explore
their environments in AR applications.
Computer vision learns stereo from single images
Training state-of-the-art stereo matching networks on a collection of single images.
Enabling the use of consumer-grade 360° cameras in construction using deep learning
State-of-the-art geometry-guided deep learning method for levelling misaligned 360° images.
Misaligned 360° image resulting in heavy distortions (left) and correctly levelled result (right)
Learning dynamic behaviors through latent imagination
Dreamer is an RL agent that solves long-horizon tasks from images purely through an imagined world.
Predicting how a given driving situation will unfold, ranging from what the driver will do to the behavior of
dynamic agents in the scene, can help an autonomous agent to learn how to drive from videos.
Learning to drive by predicting and reasoning about the future
Visual Question Answering about everyday images
Look, Read, Reason & Answer (LoRRA), a novel model for answering questions based on text in images.
Learning a multi-purpose generative model from a single natural image
SinGAN is an unconditional generative scheme that generates diverse realistic samples beyond textures.
EfficientDet-D7 achieves state-of-the-art on COCO object detection task with 4-9x fewer model parameters than
the best-in-class and can run 2-4x faster on GPUs and 5-11x faster on CPUs than other detectors.
On-device computer vision models that won’t drain your battery
To date, AutoML runs neural architecture search by combining complex, handwritten building blocks. Preliminary
work on a simplified image classification problem shows how to remove this human bias by using evolutionary
methods to automatically find the code for complete ML algorithms.
Evolving entire algorithms from basic mathematical operations alone with AutoML-Zero
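The idea of evolving programs from basic operations can be sketched as a toy evolutionary search. Below, a "program" is a short list of arithmetic instructions applied to an input, and the search uses tournament selection, mutation, and removal of the oldest individual. The instruction set, target function and all hyperparameters are illustrative assumptions, far simpler than the setup described above.

```python
import random
from collections import deque

OPS = {
    "add": lambda v, c: v + c,
    "sub": lambda v, c: v - c,
    "mul": lambda v, c: v * c,
}
CONSTANTS = [1.0, 2.0, 3.0]


def run(program, x):
    """Apply each (op, constant) instruction to x in order."""
    for op, constant in program:
        x = OPS[op](x, constant)
    return x


def fitness(program, data):
    """Negative mean squared error: higher is better, 0 is a perfect fit."""
    return -sum((run(program, x) - y) ** 2 for x, y in data) / len(data)


def random_instruction(rng):
    return (rng.choice(sorted(OPS)), rng.choice(CONSTANTS))


def mutate(program, rng):
    """Copy the parent and replace one randomly chosen instruction."""
    child = list(program)
    child[rng.randrange(len(child))] = random_instruction(rng)
    return child


def evolve(data, rng, population_size=20, tournament_size=5, steps=300, program_length=2):
    population = deque(
        [random_instruction(rng) for _ in range(program_length)]
        for _ in range(population_size)
    )
    best = max(population, key=lambda p: fitness(p, data))
    for _ in range(steps):
        contestants = rng.sample(list(population), tournament_size)
        parent = max(contestants, key=lambda p: fitness(p, data))
        child = mutate(parent, rng)
        population.append(child)
        population.popleft()  # drop the oldest individual, not the worst
        if fitness(child, data) > fitness(best, data):
            best = child
    return best


data = [(float(x), 2.0 * x + 1.0) for x in range(-3, 4)]  # target: f(x) = 2x + 1
best_program = evolve(data, random.Random(0))
print(best_program, fitness(best_program, data))
```

No building block here encodes knowledge of the target; the search only sees how well each candidate program fits the data.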
Almost 5x growth in the number of papers that mention federated learning from 2018 to 2019. More papers
have been published in the first half of 2020 than in all of 2019.
Kicked off by Google in 2016, federated learning research is now booming
Figure: Number of papers mentioning federated learning per year (chart data labels: 57%, 965, 1050, 180, 65, 7).
OpenMined, the leading open-source community for privacy-preserving ML, demonstrates the first open-source federated learning platform for web, mobile, server, and IoT
This enables the training of arbitrary neural models on private data living on a web browser or mobile device.
Figure labels: Android, iOS, JavaScript.
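The general idea behind training on data that never leaves the device can be sketched with federated averaging on a one-parameter model. This is a plain-Python illustration with hypothetical names and toy data; it does not use or represent any specific platform's API.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient steps on y = weight * x using only this client's data."""
    for _ in range(epochs):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * gradient
    return weight


def federated_averaging(client_datasets: List[Dataset], rounds: int = 50,
                        initial_weight: float = 0.0) -> float:
    """The server only ever sees model weights, never the clients' raw data."""
    global_weight = initial_weight
    total_samples = sum(len(data) for data in client_datasets)
    for _ in range(rounds):
        local_weights = [local_update(global_weight, data) for data in client_datasets]
        # Weighted average: clients with more samples contribute more.
        global_weight = sum(
            w * len(data) for w, data in zip(local_weights, client_datasets)
        ) / total_samples
    return global_weight


# Three hypothetical devices, each holding private samples of y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
learned_weight = federated_averaging(clients)
print(f"learned weight: {learned_weight:.4f}")
```

Only the updated weights travel to the server in each round, which is what allows the private samples to stay on the browser or mobile device.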
Prospective testing begins for privacy-preserving AI applied to medical imaging
While the pooling of medical data should lead to improved medical knowledge and clinical care, it is also an area
with strong safeguards around privacy. New techniques enable privacy-preserving innovation.
Gaussian Processes (GPs) Strike Back: Quantified uncertainty and faster training speed
GPs are becoming more accurate and faster to train, whilst retaining their favourable properties, like calibrated
uncertainty, making them more relevant for real-world applications today.
In our 2019 Predictions we predicted: “Google has a major breakthrough in quantum computing hardware,
triggering the formation of at least 5 new startups trying to do quantum machine learning.”
2019 Prediction outcome: Google quantum supremacy
Section 2: Talent
Google, DeepMind, Amazon and Microsoft have hired 52 tenured and tenure-track professors from US universities
between 2004 and 2018. Carnegie Mellon, U. Washington and Berkeley have lost 38 professors during the same
period. Note that no AI professor left in 2004, whereas 41 AI professors left in 2018 alone.
The Great Brain Drain: AI professors depart US universities for technology companies
Figure note: Graphs include AI professors who completely left academia or reported a dual industry and academic affiliation
New professorships may free the ladder for young academic talent to rise. Meanwhile, some companies including
Facebook champion the dual academic/industry affiliation as the solution. Some academics don’t buy it.
Tech companies endow AI professorships in return for poaching, but is this really enough?
4-6 years after the departure of tenured AI professors, graduates are 4% less likely to start an AI company (a).
The loss of AI professors seems to matter: Departures correlate with reduced graduate entrepreneurship across 69 US universities
The Eindhoven Artificial Intelligence Systems Institute in The Netherlands plans to recruit 50 professors (!)
Can €100M buy you 50 professors for a new AI Institute?
11 corporate partners joined, including The Jackson Laboratory (a major non-profit biomedical research center).
A $100M donation from Silver Lake founder to mint the Roux Institute at Northeastern University: New graduate degrees that focus on AI applied to the digital and life sciences
In our 2019 Predictions we predicted: “Institutes of higher education establish purpose-built AI degrees to fill
talent void.” Mohamed bin Zayed University of AI (MBZUAI) is a new research-based institute of higher education.
2019 Prediction outcome: Abu Dhabi opens the “World’s first AI University”
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Chinese-educated researchers make increasingly significant contributions at NeurIPS
stateof.ai 2020
29% of authors with papers accepted at NeurIPS 2019 earned their undergraduate degree in China.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
But after leaving university in China, 54% of graduates who go on to publish at NeurIPS move to the USA
stateof.ai 2020
The US attracts over half of foreign NeurIPS 2019 authors by the time they finish undergrad.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The US is an incredibly strong talent retainer post-PhD
stateof.ai 2020
Almost 90% of Chinese and non-Chinese students who earn an American PhD are retained in the US for work.
Figure: post-PhD retention in the US for Chinese PhD students and international PhD students. Chart labels: 10%, 88%, 15%, 85%.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Foreign national graduates of US AI PhD programs are most likely to end up in large companies whereas American nationals are more likely to end up in startups or academia
stateof.ai 2020
Foreign nationals are 2x more likely to join large companies, in part due to those companies’ H-1B sponsoring power.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The UK and China are the biggest beneficiaries of American-educated AI PhDs who leave the US after graduation
stateof.ai 2020
55% of graduates moving to the UK take private sector jobs; 40% of those who move to China do the same.
Destination countries of AI PhD students who leave the US post-graduation
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The majority of top AI researchers working in the US were not trained in America
stateof.ai 2020
China (27%), Europe (11%), and India (11%) are the largest feeder nations for US institutions.
Country from which an individual earned their undergrad
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Given how dependent America’s AI industry is on immigrants, there has been a strong backlash to Trump’s proclamation to suspend H-1B visas. Eight federal lawsuits and hundreds of universities object.
stateof.ai 2020
President Trump suspended the entry of aliens into the US during COVID-19 and then retreated. Note that 92%
of top international US AI PhD graduates work in the US post-graduation and 80% intend to stay if they can.
Foreign students working in the US post-graduation
Post-graduation intent for foreign AI students at US graduate schools
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
American institutions and corporations continue to dominate NeurIPS 2019 papers
Google, Stanford, CMU, MIT and Microsoft Research own the Top-5.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The top 20 most prolific organisations by ICML 2020 paper acceptances further cemented their position vs.
ICML 2019. The chart below shows their Publication Index position gains vs. ICML 2019.
The same is true at ICML 2020: American organisations cement their leadership position
stateof.ai 2020
Credit: Gleb Chuvpilo
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Stanford now teaches 10x as many students per year as during 1999–2004, and twice as many as during 2012–2014.
Leading Universities continue to expand AI course enrollment
stateof.ai 2020
Stanford NLP class enrollment
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Analysis of Indeed.com US data shows almost 3x more job postings than job views for AI-related roles. Job
postings grew 12x faster than job viewings from late 2016 to late 2018.
Demand outstrips supply for AI talent
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Public job postings on LinkedIn that mention a deep learning framework were ramping up strongly in early 2020 but
have taken a hit from COVID-19 since February 2020.
While hot, the AI talent market is not immune to the COVID-19 pandemic
stateof.ai 2020
Credit: François Chollet
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Section 3: Industry
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The first phase 1 clinical trial of an AI-designed drug begins in Japan to treat OCD patients
stateof.ai 2020
The result of a 12 month collaboration between British Exscientia and Japanese Sumitomo Dainippon Pharma.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Emerging evidence that large pharma is validating AI-first therapeutic discovery outputs
stateof.ai 2020
29 months after signing a €250M deal to evaluate over one thousand combinations of immunological drug
targets for potential synergistic effects with bispecific small molecules, Exscientia discovers a novel,
first-in-class small molecule that Sanofi will now progress.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
AI-first drug discovery startups raising mega rounds and fulfilling their “platform strategies”
Platform technologies give rise to promising drug assets that are spun off into independent entities following
the successful “asset focused” company building approach of the life science sector and its investors.
stateof.ai 2020
$121M Series C, July 2019
$60M Series C, May 2020
$143M Series B, May 2020
$56M Series C, October 2019
$123M Series B, August 2020
Owned spinoff
Spinoff with
$14.5M Series A
November 2019
Endodermal cancers
November 2019
Rare brain cancers
$239M Series D, September 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
2019 Prediction outcome: Large pharma and startups ally around privacy-preserving machine learning for drug discovery
stateof.ai 2020
In our 2019 Predictions we predicted: “Privacy-preserving ML techniques are adopted by a non-GAFAM
Fortune 2000 company”. Project MELLODY is the machine learning ledger orchestration for drug discovery.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Deep learning models interpret protein biology to find new therapeutics
stateof.ai 2020
Combining ML with carefully designed experiments has enabled LabGenius to increase the number of potential
drug candidates by up to 100,000 fold.
Figure: predicted vs. true binding affinity scores. Test set, Spearman r = 0.88
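The Spearman r quoted for the test set is a rank correlation between predicted and true binding affinity scores. A minimal sketch of how such a figure can be computed follows. The scores are hypothetical placeholders, not LabGenius data, and the helper names (`rank`, `pearson`, `spearman`) are illustrative only.

```python
def rank(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical scores for illustration only (assumption, not from the report).
true_scores = [0.55, 0.80, 1.10, 1.40, 1.90]
predicted_scores = [0.70, 0.65, 1.20, 1.50, 1.70]

rho = spearman(true_scores, predicted_scores)
print(round(rho, 2))
```

Because only the ordering matters, a model can score well on this metric even when its predicted values are offset or rescaled relative to the measured ones.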
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Deep learning revamps super-resolution microscopy imaging from acquisition to analysis
stateof.ai 2020
Collapsing hours of human microscope time to minutes using supervised learning and computer vision.
Propagation of acquisition meta-data and annotations through pipeline
Real-time super-resolution processing
Interactive visualisation
Intuitive analysis software
High-throughput quantification
Input
Automatically locates new examples
Minimal user annotations
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Identical twins have very different responses to the same foods. ML predictions of glucose, triglyceride response
two hours after meal consumption correlate 77% of the time with actual measured responses. ZOE’s commercial
AI-driven test kit launched in the US in August 2020.
Using genetic, metabolomic, metagenomic and meal-context information from 1,100 study participants to predict individuals’ metabolic response to food at scale
stateof.ai 2020
Figure: glucose (mmol/L) and triglycerides (TAG, mmol/L) against time post-meal consumption (hours). Humans show lots of variability in their response to the same meal. ZOE’s model predicts responses to new meals from test results.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
In 2019, the FDA acknowledged that the traditional paradigm of medical device regulation was not designed for AI-first software which improves over time.
stateof.ai 2020
Typically, FDA-approved AI-first software as a medical device (SaMD) products are “locked”. The FDA published
a new proposal to embrace the highly iterative and adaptive nature of AI systems in what they call a “total
product lifecycle” (TPLC) regulatory approach built on good machine learning practices.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
AI-based medical imaging studies have a major problem. A review of 20,000 recent studies in the field found that
less than 1% of these studies had sufficiently high-quality design and reporting. Studies suffer from the lack of
external validation by independent research groups, generalizability to new datasets, and dubious data quality.
New international guidelines are drafted for clinical trial protocols (SPIRIT-AI) and reports (CONSORT-AI) that involve AI systems in a bid to improve both quality and transparency
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Viz.ai was granted a New Technology Add on Payment of up to $1,040 per use in patients with suspected strokes.
The AI system scans computed tomography scans of the brain and pings the results to a specialist who can treat
the patient before they suffer damage that leads to long-term disability. Several exclusion factors apply...
The first reimbursement approval for a deep learning-based medical imaging product has been granted by the Centers for Medicare and Medicaid Services (CMS) in the USA
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
US states continue to legislate autonomous vehicles policies
stateof.ai 2020
Over half of all US states have enacted legislation related to autonomous vehicles.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Even so, driverless cars are still not so driverless: Since 2018, only 3 of 66 companies with AV testing permits in California have been allowed to test without safety drivers
stateof.ai 2020
To qualify for a driverless testing permit, companies must show proof of insurance or a bond equal to $5 million,
prove that their cars can operate without a driver, and meet the Federal Motor Vehicle Safety Standards or otherwise
have an exemption from the National Highway Traffic Safety Administration.
30 October 2018
7 April 2020
17 July 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Self-driving mileage in California remains microscopic compared to human driving
stateof.ai 2020
Self-driving car companies racked up 42% more AV miles in 2019 than 2018. However, this only equates to
0.000737% of the miles driven by licensed California drivers in 2019.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Despite performing in the bottom 5th percentile of miles/disengagement in 2018, Baidu claims to have driven
18,050 miles/dis., which puts it at the top of the leaderboard ahead of Waymo with 13,219 miles/dis.
(vs. 11,154 miles/dis. in 2018). A year-on-year improvement of 8,679% sounds too good to be true...
Sketchy metrics: Tracking AV progress is complicated by the industry’s focus on miles per disengagement, which is hard to benchmark and is not reported across all US States.
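The year-on-year figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes "improvement" means (current - previous) / previous. The implied 2018 Baidu value is derived from the quoted 8,679% and is not stated in the source.

```python
def yoy_improvement(current, previous):
    """Fractional year-on-year change in miles per disengagement."""
    return (current - previous) / previous


def implied_previous(current, improvement):
    """Back out the prior-year value from a fractional improvement."""
    return current / (1 + improvement)


# Waymo: 13,219 miles/dis. in 2019 vs. 11,154 miles/dis. in 2018 (from the slide).
waymo_improvement = yoy_improvement(13_219, 11_154)

# Baidu: 18,050 miles/dis. in 2019 and a claimed 8,679% improvement (from the slide).
baidu_2018_implied = implied_previous(18_050, 86.79)

print(f"Waymo YoY improvement: {waymo_improvement:.1%}")
print(f"Implied Baidu 2018 miles/dis.: {baidu_2018_implied:.1f}")
```

Under these assumptions, Waymo's improvement is under 20% while Baidu's 2018 baseline works out to roughly 200 miles per disengagement, which is why the jump stands out.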
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Consolidation of industry players begins as Zoox, the company reinventing an AV-first car, was acquired by
Amazon for a reported $1.3B in cash. The company raised at least $955M since 2015 with its last reported
post-money valuation of $3.2B.
Self-driving: When even a billion dollars isn’t enough
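A quick calculation puts the Zoox numbers in proportion. The inputs are the reported figures from this slide. "At least $955M" is treated as a lower bound, so the second ratio is an upper bound.

```python
acquisition_price = 1.3e9   # reported cash price paid by Amazon
capital_raised = 955e6      # "at least" this much raised since 2015 (lower bound)
last_valuation = 3.2e9      # last reported post-money valuation

# Share of the last reported valuation realised in the acquisition.
price_vs_valuation = acquisition_price / last_valuation

# Acquisition price as a multiple of the capital raised.
price_vs_raised = acquisition_price / capital_raised

print(f"Price as share of last valuation: {price_vs_valuation:.0%}")
print(f"Price as multiple of capital raised: {price_vs_raised:.2f}x")
```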
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The main self-driving contenders raised almost $7B in private rounds since July 2019
stateof.ai 2020
$2.6B led by VW Group, July 2019
$31M led by Franklin Templeton, Sept 2019
$100M led by Dongfeng Motors, Sept 2019
$462M led by Toyota, Feb 2020
$41M led by Trustbridge, March 2020
$500M led by SoftBank Vision Fund, May 2020
$3B led by Silver Lake, March 2020
$20M led by Lead Ventures, March 2020
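The "almost $7B" headline can be checked by summing the rounds listed on this slide. The company names were logos and are not recoverable from the text, so this sketch keys each round by its lead investor.

```python
# (lead investor, date, amount in $M) as listed on the slide.
rounds = [
    ("VW Group", "July 2019", 2600),
    ("Franklin Templeton", "Sept 2019", 31),
    ("Dongfeng Motors", "Sept 2019", 100),
    ("Toyota", "Feb 2020", 462),
    ("Trustbridge", "March 2020", 41),
    ("SoftBank Vision Fund", "May 2020", 500),
    ("Silver Lake", "March 2020", 3000),
    ("Lead Ventures", "March 2020", 20),
]

total_musd = sum(amount for _, _, amount in rounds)

# Subtotal per calendar year, taken from the last token of the date string.
by_year = {}
for _, when, amount in rounds:
    year = when.split()[-1]
    by_year[year] = by_year.get(year, 0) + amount

print(f"Total: ${total_musd / 1000:.2f}B")
print(by_year)
```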
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Another self-driving group, DiDi, spins off from its parent and raises $500M
stateof.ai 2020
DiDi’s self-driving unit raised >$500M from SoftBank Vision Fund, grew its team from 200 to >400 since last year
and launched its ride hailing service to consumers in Shanghai from late July 2020. The service runs on public
roads that are fitted with additional sensors that feed into a control room staffed by safety operators.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Capital is used to vertically-integrate and deepen technology moats, e.g. in-house LiDAR
stateof.ai 2020
Waymo, Aurora, and GM Cruise have acquired LiDAR companies or built sensors in-house to hit a 300m range
and to each own a key technology component of their value chain.
Waymo’s in-house Honeycomb released in 2019
Aurora acquired Blackmore in 2019
GM Cruise acquired Strobe in 2017
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Meanwhile, LiDAR incumbent Velodyne and challenger Luminar both go public on the Nasdaq via reverse mergers (SPAC) to compete with hardware and ADAS software
stateof.ai 2020
Velodyne will list at a valuation just shy of $2B on $106M of net revenues in 2019, and Luminar at $3.4B. Velodyne
guides to upcoming software for autopilot and collision avoidance that makes use of a future Vela LiDAR.
Luminar points to an agreement with Volvo that sees its integration into their vehicle platform in 2022.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Starsky Robotics, which was the first company to run an autonomous unmanned truck drive on a public highway,
closed its doors in Q1 2020. It openly cited the challenges of scaling supervised learning.
Supervised learning and the cost of edge cases: New technology approaches are needed
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Leading companies crowdsource ideas from open source using data they’ve generated
stateof.ai 2020
11 major datasets and 2 updates since 2019, many of which include cameras, video, LiDAR and motion traces.
Boxy dataset, April ‘19: 1.99M vehicle bounding boxes
Unsupervised Lamas, May ‘19: >100k images for lane markings
Argoverse, June ‘19: 3D vehicle tracking for 113 scenes
SemanticKITTI, July ‘19: LiDAR-based semantic segmentation
Lyft Perception, July ‘19: 350+ 1 hr drives, LiDAR, camera
Waymo Open Dataset, August ‘19: 1k 20 sec drives, LiDAR, camera
Street-Level Sequences, April ‘20: >1.6M images, 30 cities, 6 continents over 9 yrs
Adverse Driving Conditions, Feb ‘20: 75 scenes in bad weather, LiDAR, camera
Autonomous Driving Dataset, April ‘20: 40k frames w/semantic segm., LiDAR, camera
PandaSet, June ‘20: 100+ 8 sec drives, LiDAR and camera
Lyft Prediction, July ‘20: 1k+ hrs of traffic agents, LiDAR, radar, camera
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Use of ML in self-driving is still mostly limited to perception, with large parts of the stack hand-engineered.
The Next Step: New models and a shift in focus from perception to motion prediction
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The new frontier of self-driving development is machine learning for planning
stateof.ai 2020
New algorithms that work akin to AlphaGo and are trained on large amounts of human driving
demonstrations are being developed.
New datasets can change the power balance of existing leading players.
new kinds of systems. Companies that can leverage the
scale of their human driver fleets can build an advantage
in the data race that will power new model innovation.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The consumer-first approach to self-driving: Tesla has hundreds of thousands of Autopilot-enabled cars in the wild and consumers help inch it towards “Full Self-Driving”
stateof.ai 2020
Tesla is a rare breed in monetising its Autopilot system, which costs $8,000 today, to the tune of $100Ms so far.
Given the importance of edge-case reliability, their “driver-in-the-loop” engineering approach could pay off.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Graphcore released their Mk2 IPU processor, which packs 59.4 billion transistors on an 823 mm² die using a
7 nm process. This is the most complex processor ever made.
AI problems like self-driving thrive on compute: New providers of specialized AI compute platforms are already onto their generation 2+ products
stateof.ai 2020
Figure note: Comparing 8x C2 PCIe IPU-Processors with IPU-LINK vs. 8x M2000 IPU-Machine with IPU-FABRIC
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The reported 16x faster training time for the image classification model EfficientNet-B4 on the M2000 vs.
NVIDIA DGX-A100 translates to a 12x cost advantage. This does not factor in the cost of migrating to a new
development platform and mastering its tooling.
Graphcore M2000 offers faster training time to drop the cost of state-of-the-art models
stateof.ai 2020
Figure: 8x IPU-M2000 ($259,600) = 16x DGX-A100 ($3,000,000)
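The 12x cost advantage can be reproduced from the two system prices shown on this slide. The sketch assumes, as the "=" in the figure suggests, that 8x IPU-M2000 and 16x DGX-A100 deliver comparable training throughput. The per-unit prices are derived by simple division and are not stated in the source.

```python
ipu_units, ipu_system_cost = 8, 259_600      # 8x IPU-M2000, price from the slide
dgx_units, dgx_system_cost = 16, 3_000_000   # 16x DGX-A100, price from the slide

# Cost ratio between the two systems, assuming equal throughput.
cost_ratio = dgx_system_cost / ipu_system_cost

# Derived per-unit prices (assumption: system price splits evenly across units).
ipu_unit_cost = ipu_system_cost / ipu_units
dgx_unit_cost = dgx_system_cost / dgx_units

print(f"Cost ratio: {cost_ratio:.1f}x")
print(f"Per IPU-M2000: ${ipu_unit_cost:,.0f}; per DGX-A100: ${dgx_unit_cost:,.0f}")
```

The ratio comes out slightly under 12, so the headline figure appears to be rounded up.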
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The TPU v4 packs 2x the matrix multiplication TFLOPs of the TPU v3, greater memory bandwidth and improved
interconnect technology. The supercomputer used for MLPerf v0.7 submissions is 4x the size of the TPU v3 Pod
used in MLPerf v0.6.
Google’s new TPU v4 delivers up to a 3.7x training speedup over their TPU v3
stateof.ai 2020
Figure note: All comparisons at 64-chip scale. Gains are due to hardware innovations and software improvements.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The A100 GPU is NVIDIA’s first processor based on their new Ampere architecture. The company produced
4x performance gains on MLPerf in 1.5 years.
NVIDIA will not rest either: Up to 2.5x training speedups with the new A100 GPU vs V100
stateof.ai 2020
Figure note: Per chip performance based on comparing performance at same scale and normalizing it to a single chip.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
The rise of MLOps (DevOps for ML) signals an industry shift from technology R&D
(how to build models) to operations (how to run models)
stateof.ai 2020
25% of the top-20 fastest growing GitHub projects in Q2 2020 concern ML infrastructure, tooling and operations.
Google Search traffic for “MLOps” is now on an uptick for the first time.
Figure note: Left graph reproduces analysis from Runa Capital and right graph reproduces data from Google Search Trends for “MLOps”
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Sample (Hong Kong Monetary Authority*)
As AI adoption grows, regulators give developers more to think about
External monitoring is transitioning from a focus on business metrics down to low-level model metrics.
This creates challenges for AI application vendors including slower deployments, IP sharing, and more:
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Enterprises report that AI drives revenue in sales and marketing while reducing costs in supply chain management and manufacturing functions
stateof.ai 2020
Results from a poll of 1,872 enterprises worldwide: Cost decreases and revenue increases from AI by function.
Average cost decrease
Average revenue increase
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
RPA and computer vision are the most commonly deployed techniques in the enterprise. Speech, natural language generation and physical robots are the least common
stateof.ai 2020
3% of respondents, the “high performers”, report 11 live AI use cases vs. 3 for the average enterprise. Retail
businesses reported the largest YoY use case expansion. AI tends to be applied in areas of core competency:
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Robotic process automation continues to tear through the enterprise
With over 7,000 enterprise customers, UiPath’s annual revenue growth is emblematic of the demand for
operational automation. By mid-2020 the business passed $400M in annual recurring revenue.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
PolyAI has rolled out its voice assistant for hospitality in the UK. The system is actively answering reservation
calls and assisting diners with special dietary requirements and providing COVID-19 guidance.
AI dialogue assistants are live and handling calls from UK customers today
Figure: abandon rate (chart labels: 40%, 0%) and AI success rate (chart labels: 90%, 50%), each over 2 months
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Computer vision unlocks faster accident and disaster recovery intervention
Tractable’s AI captures and processes imagery of the damage to automatically predict its repair costs.
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
The API powers Tinyclues’ marketing decisioning suite with capabilities such as targeting, campaign prioritization and the ability to predict efficient marketing topics (pictured above). It powers more than 100,000 marketing campaigns, delivering an average revenue uplift of 40% against legacy approaches.
No-code ML automation: A universal prediction API for 360 customer data
Despite their diversity and lack of normalization, first-party 360 customer datasets share structural commonality.
Tinyclues leverages this commonality to run a no-code prediction API.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
| ESG pillar | Company W | Company X | Company Y | Company Z |
| Sustainability | 91 | 84 | 93 | 56 |
| Pollution | 76 | 28 | 87 | 54 |
| Corporate Governance | 56 | 56 | 83 | 24 |
| Community Impact | 88 | 69 | 74 | 60 |
| Labour Practices | 72 | 71 | 69 | 68 |
NLP is used to automate quantification of a company’s Environmental, Social and Governance (ESG) perception using the world’s news
NLP can derive ESG perception scores by assessing the relationships and sentiments of products and
companies with respect to client-specific ESG reputation pillars (e.g., environment, diversity, and more).
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Machine learning protects humans from email spear phishing attacks
During COVID-19, Tessian observed a 30x increase in email phishing attacks that specifically exploited
uncertainty around the pandemic.
COVID-19 related phishing attacks are on the rise
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
With more identity documents digitally captured, Onfido’s AI system learns to detect fake documents that run
rampant online.
stateof.ai 2020
Computer vision detects subtle evidence of tampered identity documents
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Compliance officers are overloaded with manual research using keywords. ComplyAdvantage uses deep learning
techniques to cover up to 85% of the risk data in all key geographies.
stateof.ai 2020
AI is the key to Web-scale content analysis for money laundering and terrorist financing
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Machine translation unlocks financial crime classification globally
Machine translation is used to generate multilingual training data for financial crime classification. This approach
significantly reduced lead time from 20 weeks for English to less than 2 weeks per European language while
maintaining more than 80% of the recall and precision.
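One way to read "maintaining more than 80% of the recall and precision" is as a ratio against the English baseline. The sketch below computes precision and recall for a binary classifier and the fraction retained by a model trained on machine-translated data. The labels and predictions are hypothetical placeholders, and the function names are illustrative.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = financial crime, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def retained(baseline, other):
    """Fraction of a baseline metric that another model retains."""
    return other / baseline if baseline else 0.0


# Hypothetical data for illustration only (assumption, not from the report).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
english_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]      # model trained on English annotations
translated_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]   # model trained on translated data

en_p, en_r = precision_recall(y_true, english_pred)
tr_p, tr_r = precision_recall(y_true, translated_pred)

precision_retained = retained(en_p, tr_p)
recall_retained = retained(en_r, tr_r)
print(f"Precision retained: {precision_retained:.0%}")
print(f"Recall retained: {recall_retained:.0%}")
```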
stateof.ai 2020
Figure (left): precision vs. recall for French, German, Spanish, Dutch, Italian and English, with English AUC as reference. Multilingual financial crime classification performance from English is maintained.
Figure (right): days to collect annotated training data for adverse media classification, machine translated multilingual data vs. English annotation.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
BERT language model goes mainstream: Upgrading Google and Microsoft’s Bing search query understanding
From open source publication to processing search queries in large-scale production within 12 months (assuming
the paper’s publication was not purposefully held back).
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Berkshire Grey robotic installations are achieving millions of robotic picks per month
Supply chain operators realise a 70% reduction in direct labour as a result.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
CNC machines produce over $168B worth of parts per year for manufacturing, carving blocks of metal into
useful shapes. CloudNC is automating the programming of these machines.
Manufacturing: CNC Machine programming starts to be automated
stateof.ai 2020
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Open source model and dataset sharing is driving NLP’s Cambrian explosion
1,000+ companies are using Hugging Face’s Transformers library in production: 5M pip installs, 2,500+
community transformer models trained in over 164 languages by 430 contributors.
Figure: number of weekly unique user instantiations of the top 10 models from huggingface.co, Sept-19 to Jul-20 (y-axis 0 to 45k)
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Rasa’s libraries and tools have clocked >2 million downloads and have 400+ open source contributors.
Open source conversational AI expands its footprint across industry
Healthcare
Insurance
Banking
Telecommunications
Manufacturing
Technology
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
2020 is likely to hit $25B+ in total volume and 350+ deals. Rounds >$100M have consistently accounted for ~10% of all
funding rounds since 2018. This signals the increasing maturity of the field.
stateof.ai 2020
Private >$15M funding rounds for AI-first companies remain strong in spite of COVID-19
Figure note: Data retrieved from Pitchbook on 13 August 2020. Asterisk indicates annualized figures for 2020, shown in light blue and orange.
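The figure note mentions annualized 2020 figures without stating the method. One common approach, assumed here, is linear extrapolation from the year-to-date total on the retrieval date. The input value below is a hypothetical placeholder, not a number from the report.

```python
from datetime import date


def annualize(ytd_value, as_of):
    """Linearly scale a year-to-date total to a full calendar year.

    Assumption: activity is spread evenly over the year; the report does not
    state which annualization method it used.
    """
    start = date(as_of.year, 1, 1)
    end = date(as_of.year + 1, 1, 1)
    elapsed_days = (as_of - start).days + 1   # count the as-of day itself
    total_days = (end - start).days           # 366 in a leap year such as 2020
    return ytd_value * total_days / elapsed_days


retrieved_on = date(2020, 8, 13)   # retrieval date from the figure note
ytd_example = 100.0                # hypothetical year-to-date value

full_year_estimate = annualize(ytd_example, retrieved_on)
print(f"Scale factor: {full_year_estimate / ytd_example:.3f}")
```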
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Section 4: Politics
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Ethical risks: A group of researchers have spent years helping to frame the ethical risks of deploying ML in certain sensitive contexts. This year those issues went mainstream.
Examples include policing, the judiciary and the military. A few trailblazing researchers include:
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
Facial recognition is remarkably common around the world
stateof.ai 2020
50% of the world currently allows the use of facial recognition. Only 3 countries (Belgium, Luxembourg,
Morocco) have partial bans on the technology that only allow it in specific cases.
Actively in use
Partially-banned or no evidence of use
Considering
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Facial recognition: From potential risks to wrongful arrests
Two (known) examples of wrongful arrests due to erroneous use of facial recognition algorithms emerge.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Facial recognition: Facebook settles class action lawsuit for $650M
Facebook’s automatic photo-tagging was in violation of Illinois’ 2008 Biometric Privacy Act.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
A New York Times investigation revealed that Clearview scraped billions of images and then licensed their
“search engine for faces” to over 600 law enforcement agencies.
Clearview exposes what is now technically possible with facial recognition
Figure: number of searchable photos per database
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Facial recognition: More thoughtful approaches gather steam
Large technology companies are taking a more careful path.
recognition tool Rekognition to give “congress enough time to put in place
appropriate rules”.
Introduction | Research | Talent | Industry | Politics | Predictions
#stateofai
stateof.ai 2020
Facial recognition: More thoughtful approaches gather steam
The creators of ImageNet produced an update that takes first steps towards reducing bias.
Facial recognition: A new legal precedent in the UK emphasizes that facial recognition tools cannot “move fast and break things”
Shifting legal framework for law enforcement.
Facial recognition: Washington State passes new law with active support from Microsoft
The new law requires government agencies to obtain a warrant to run facial recognition scans.
China’s Ministry of Education has pledged to “curb and regulate” the use of facial recognition in schools.
Facial recognition: The first legal challenge in China
Professor Guo Bing of Zhejiang Sci-Tech University sued a local safari park for “violating consumer protection law” after it made facial recognition registration mandatory for visitor entry.
Lawmakers scramble to legislate against the use of deepfakes
Increased awareness of deepfakes causes a rush of activity led by China and California.
Algorithmic decision making: Regulatory pressure builds
Multiple countries and states start to wrestle with how to regulate the use of ML in decision making.
GPT-3, like GPT-2, still outputs biased predictions when prompted with topics of religion
Example from the GPT-3 (left) and GPT-2 (right) with prompts and the model’s predictions, which contain
clear bias. Models trained on large volumes of language on the internet will reflect the bias in those datasets
unless their developers make efforts to fix this. See our coverage in State of AI Report 2019 of how Google
adapted their translation model to remove gender bias.
From DeepMind to U.S. Army Research Lab, AI research agendas start to overlap
Three months after DeepMind’s StarCraft II breakthrough, the US Army publishes interesting StarCraft results.
The U.S. continues to make major investments to implement military AI systems
As machine learning techniques continue to industrialise they are increasingly explored by militaries.
However, the degree of real-world impact is not yet clear.
Startups at the intersection of AI and defense raise large financing rounds
As defense roadmaps include more ML-enabled components, startups are winning lucrative government contracts and raising large venture rounds.
$200M Series C, July 2019
$100M Series C, July 2020
$62M Series A, 2019
Is US-China competition weakening the Missile Technology Control Regime?
The US State Department loosens restrictions on drone exports, shifting from a blanket denial to a more discretionary basis. Uninhabited aircraft no longer count as missiles.
After AlphaGo and AlphaStar...AlphaDogfight
DARPA organised a virtual dogfighting tournament in which AI systems competed against each other and against a human fighter pilot from the US military.
The US Secretary of Defense targets 2024 for real-life AI vs human dogfight
The US continues to emphasize the importance of AI leadership to its military
Many actors attempt to define principles for responsible use of AI
The US Department of Defense, the US Intelligence Community, China, and the OECD all develop or adopt their own AI policy documents.
Two of the leading AI conferences adopt new ethics codes
NeurIPS and ICLR both propose new ethical principles and expectations of researchers, but neither makes code and data sharing mandatory. As NeurIPS is the largest conference in the field, its proposals should be high impact.
Google is leaning into fairness, interpretability, privacy and security of AI models
Proliferating educational content and tools through the TensorFlow community.
White House extends its ban on Chinese companies with ties to surveillance in Xinjiang
The Department of Commerce adds 24 Chinese companies and institutions to a sanctions list for “supporting the procurement of items for military end-use in China”.
Huawei is an increasingly dominant player in smartphones and is investing heavily in machine learning technology
For the first time in 9 years a company other than Apple or Samsung led the market. However, under US sanctions, Huawei’s supply of chips is expected to run out by mid-September 2020.
Semiconductors amplify the geopolitical significance of Taiwan and particularly TSMC
TSMC grows in importance to the US semiconductor strategy.
Taiwan’s TSMC remains dominant in R&D expenditure and semiconductor manufacturing
TSMC’s R&D expenses match SMIC’s revenues. TSMC is the only fabricator with a 5nm manufacturing process (N5) and it is now working on 3nm (N3) for 2x more power efficiency and 33% more performance than N7. In response, SMIC said it will increase capital expenditure to $6.7B in 2020 (up from its original target of $3.1B).
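As a quick arithmetic check on the SMIC capital expenditure figures quoted on this slide, the short Python sketch below computes how large the revision is. It uses only the two numbers given here ($3.1B original target, $6.7B revised target); the variable names are illustrative and not taken from any source.

```python
# Figures quoted on the slide, in billions of US dollars.
original_capex_target = 3.1
revised_capex_target = 6.7

# Ratio and percentage increase of the revised target over the original.
ratio = revised_capex_target / original_capex_target
pct_increase = (ratio - 1) * 100

print(f"Revised target is {ratio:.2f}x the original ({pct_increase:.0f}% increase)")
```

In other words, the revised target is a little more than double the original one.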
Chinese government sets up an additional $29B state-backed fund to reduce its dependency on American semiconductor technology
China-listed chipmakers see their public market valuations soar in 2020. Cambricon goes public raising $370M.
China hires over 100 TSMC engineers in push to close gap in semiconductor capabilities
TSMC employees are offered as much as 2.5x their annual salary and bonuses to leave. Overall, Taiwan has lost 3,000 semiconductor engineers in recent times (circa 10% of its national supply).
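The "circa 10%" figure implies a rough size for Taiwan's total pool of semiconductor engineers. The Python sketch below backs that number out from the two values on this slide; since the share is stated only approximately, the result is an order-of-magnitude estimate rather than a reported statistic, and the variable names are illustrative.

```python
# Values quoted on the slide.
engineers_lost = 3_000
share_of_supply_pct = 10  # "circa 10%", so treat the result as approximate

# Implied total pool: lost / (share / 100), rearranged to keep the arithmetic exact.
implied_total_pool = engineers_lost * 100 / share_of_supply_pct

print(f"Implied national pool: roughly {implied_total_pool:,.0f} engineers")
```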
US Senate proposes the CHIPS for America Act
Although over half the world’s advanced chips are designed in America, only 12% are manufactured there.
Given mounting concerns over chips, cross border M&A remains highly politicised
The vast majority of acquisitions have been blocked.
December 2016: US and Germany block $723M bid by Fujian Grand Chip Investment Fund (China) for Aixtron.
September 2017: US blocks $1.3B bid by Canyon Bridge Capital Partners (China) for Lattice.
March 2018: US blocks $117B bid by Broadcom (previously headquartered in Singapore) for Qualcomm (USA).
July 2018: China blocks Qualcomm’s $44B bid for NXP (Netherlands).
April 2020: UK and US effectively block a complete takeover of Imagination Technologies (UK) by Canyon Bridge (China).
April 2020: China allows Nvidia’s (USA) $6.9B acquisition of Mellanox (Israel).
July 2020: Siemens (Germany) makes a bid for Avatar (USA).
The reported potential acquisition of Arm (UK) by Nvidia (USA) will be a major test of where things stand.
AI Nationalism: Governments increasingly plan to scrutinise acquisitions of AI companies
The State of AI Report and AI Nationalism essay predicted that political leaders would start to question whether acquisitions of key AI startups should be blocked. New legislation suggests this is now happening.
The likely sale of Arm to NVIDIA is questioned by many, including its founder
Hermann Hauser, a leading founder and investor, argues it would be bad for the UK if Arm is acquired by NVIDIA.
AI Nationalism in the US: AI budgets continue to expand
AI continues to be emphasized as the most important investment area in science and technology.
Chart: federal budget for non-defense AI R&D.
AI Nationalism in the US: A major new bi-partisan act is proposed
The proposed ‘Endless Frontier’ act explicitly frames AI as a race between superpowers.
AI Nationalism in China: Decentralising policy experimentation to cities
China moves to create “national new generation AI innovation and development pilot zones”.
AI Nationalism in the UK: China hawks in the UK become more active
Pressure on the UK to choose between the US and China.
Another wave of countries declare national AI strategies
Evidence suggests that the US tax code incentivises replacing humans with robots
Acemoglu, Manera and Restrepo’s paper demonstrates that tax reforms from 2000 to 2017 have caused the gap
between effective tax rates on labour and robots to dramatically widen.
Effective tax rates for U.S. companies, by type of expenditure
Jobs at risk of automation in the EU 19 countries
Executives from 1,872 enterprises worldwide report that the largest AI-induced workforce contractions in the last year were in automotive and assembly and in telecoms. Looking forward, the consumer packaged goods (CPG), transport, utilities, retail and financial services sectors are expected to follow.
Bengio, Hassabis, Ng and other AI research leaders unite at NeurIPS 2019 in a call to action for climate change
A position paper and workshop explored various high leverage problems where ML methods can be applied.
Section 5: Predictions
8 predictions for the next 12 months
1. The race to build larger language models continues and we see the first 10 trillion parameter model.
2. Attention-based neural networks move from NLP to computer vision in achieving state of the art results.
3. A major corporate AI lab shuts down as its parent company changes strategy.
4. In response to US DoD activity and investment in US based military AI startups, a wave of Chinese and
European defense-focused AI startups collectively raise over $100M in the next 12 months.
5. One of the leading AI-first drug discovery startups (e.g. Recursion, Exscientia) either IPOs or is acquired for
over $1B.
6. DeepMind makes a major breakthrough in structural biology and drug discovery beyond AlphaFold.
7. Facebook makes a major breakthrough in augmented and virtual reality with 3D computer vision.
8. NVIDIA does not end up completing its acquisition of Arm.
Section 6: Conclusion
Thanks!
Congratulations on making it to the end of the State of AI Report 2020! Thanks for reading.
In this report, we set out to capture a snapshot of the exponential progress in the field of machine learning, with a focus on developments since last year’s issue that was published on 26th June 2019. We believe that AI will be a force multiplier on technological progress in our world, and that wider understanding of the field is critical if we are to navigate such a huge transition.
We set out to compile a snapshot of all the things that caught our attention in the last year across the range of AI research, talent, industry and the emerging politics of AI.
We would appreciate any and all feedback on how we could improve this Report further, as well as contribution suggestions for next year’s edition.
Thanks again for reading!
Nathan Benaich (@nathanbenaich) and Ian Hogarth (@soundboy)
Conflicts of interest
The authors declare a number of conflicts of interest as a result of being investors and/or advisors, personally or via funds, in a number of private and public companies whose work is cited in this report.
Ian is an angel investor in: Chorus.ai, ComplyAdvantage, Disperse, Faculty, LabGenius, and PostEra.
Nathan and Air Street Capital are shareholders of: Graphcore, LabGenius, Niantic, ONI, PolyAI, Secondmind, Tractable, and ZOE.
About the authors
Nathan is the general partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. He founded RAAIS and London.AI, which connect AI practitioners from large companies, startups and academia, and the RAAIS Foundation that funds open-source AI projects. He studied biology at Williams College and earned a PhD from Cambridge in cancer research.
Nathan Benaich
Ian Hogarth
Ian is an angel investor in 60+ startups. He is a Visiting Professor at UCL working with Professor Mariana Mazzucato. Ian was co-founder and CEO of Songkick, the concert service used by 17m music fans each month. He studied engineering at Cambridge where his Masters project was a computer vision system to classify breast cancer biopsy images. He is the Chair of Phasecraft, a quantum software company.
State of AI Report
October 1, 2020
stateof.ai
Ian Hogarth
Nathan Benaich