1
Title
Author(s)
url
Broader Impact Statement
2
Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes
Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi
https://papers.nips.cc/paper/2020/file/1f10c3650a3aa5912dccc5789fd515e8-Paper.pdf
The current paper presents theoretical work without any foreseeable societal consequence. Therefore,
the authors believe that the broader impact discussion is not applicable.
An analysis of this dataset is in our paper Unpacking the Expressed Consequences of AI Research in Broader Impact Statements at AIES 2021. These 300 statements are a random sample from NeurIPS 2020 broader impact statements.
3
Practical Low-Rank Communication Compression in Decentralized Deep Learning
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
https://papers.nips.cc/paper/2020/file/a376802c0811f1b9088828288eb0d3f0-Paper.pdf
We believe that the field of decentralized learning plays a key role in translating the recent successes
in deep learning from large organizations with large centralized datasets to smaller industry players
and individuals. In particular, decentralized and therefore collaborative training on decentralized
data is an important building block towards helping to better align each individual’s data ownership
and privacy with the resulting utility from jointly trained machine learning models. The ability
to train collaboratively on decentralized data may lead to transformative insights in many fields,
especially in applications where data is user-provided and privacy sensitive (Nedic, 2020). In addition
to privacy, efficiency gains in distributed training reduce the environmental impact of training large
machine learning models. The introduction of a practical and reliable communication compression
technique is a small step towards achieving these goals on collaborative privacy-preserving and
efficient decentralized learning.
4
Prediction with Corrupted Expert Advice
Idan Amir, Idan Attias, Tomer Koren, Yishay Mansour, Roi Livni
https://papers.nips.cc/paper/2020/file/a512294422de868f8474d22344636f16-Paper.pdf
There are no foreseen ethical or societal consequences for the research presented herein.
5
Organizing recurrent network dynamics by task-computation to enable continual learning
Lea Duncker, Laura Driscoll, Krishna V. Shenoy, Maneesh Sahani, David Sussillo
https://papers.nips.cc/paper/2020/file/a576eafbce762079f7d1f77fca1c5cc2-Paper.pdf
This work proposes a novel continual learning algorithm which will contribute to the advance of
related methods. Continual learning of dynamic tasks has not been well-explored in machine learning
so far, but will likely be important for fields such as robotics and developing artificial intelligent
agents more generally. Furthermore, we utilize the framework of recurrent networks to test and refine
hypotheses about computation in biological systems. Advances in this area will contribute to the
design of new experiments and aid the analyses of recorded data in the field of neuroscience.
6
A Catalyst Framework for Minimax Optimization
Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He
https://papers.nips.cc/paper/2020/file/3db54f5573cd617a0112d35dd1e6b1ef-Paper.pdf
Our work provides a family of simple and efficient algorithms for some classes of minimax optimization.
We believe our theoretical results advance many applications in ML that require minimax
optimization. Of particular interest are deep learning and fair machine learning.

Deep learning is used in many safety-critical environments, including self-driving cars, biometric
authentication, and so on. There is growing evidence that shows deep neural networks are vulnerable
to adversarial attacks. Since adversarial attacks and defenses are often considered as two-player games,
progress in minimax optimization will definitely empower both. Furthermore, minimax optimization
problems provide insights and understanding into the balance and equilibrium between attacks and
defenses. As a consequence, making good use of those techniques will boost the robustness of deep
learning models and strengthen the security of its applications.

Fairness in machine learning has attracted much attention, because it is directly relevant to policy
design and social welfare. For example, courts use COMPAS for recidivism prediction. Researchers
have shown that bias is introduced into many machine learning systems through skewed data, limited
features, etc. One approach to mitigate this is adding constraints into the system, which naturally
gives rise to minimax problems.
7
Optimal Learning from Verified Training Data
Nicholas Bishop, Long Tran-Thanh, Enrico Gerding
https://papers.nips.cc/paper/2020/file/6c1e55ec7c43dc51a37472ddcbd756fb-Paper.pdf
The manipulation and fairness of algorithms form a significant barrier to practical application of
theoretically effective machine learning algorithms in many real world use cases. With this work,
we have attempted to address the important problem of data manipulation, which has many societal
consequences. Data manipulation is one of many ways in which an individual can “game the system”
in order to secure beneficial outcomes for themselves to the detriment of others. Thus, reducing the
potential benefits of data manipulation is worthy of consideration and focus. Whilst this paper is
primarily of theoretical focus, we hope that our work will form a contributing step towards safe, fair,
and effective application of machine learning algorithms in more practical settings.

8
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie
https://papers.nips.cc/paper/2020/file/27059a11c58ade9b03bde05c2ca7c285-Paper.pdf
Benefits

Our conditional chain model addresses the problems where one input sequence is mapped to multiple
sequences by taking advantage of the intrinsic interaction between the output sequences. There are a
variety of applications that can benefit from the use of the conditional information, such as the text
generation tasks. Another important application is the cocktail party problem in speech processing.
With the parallel mapping models, which are the dominant method at present, the model cannot handle
the variable number of speakers flexibly due to the limitation of the model structure. In such models,
the solution to label permutation problems is to exhaustively compute all the permutations with the
computation cost of N!, which cannot be neglected when the number of speakers is more than 3.
However, using the conditional model can avoid this problem. It also proves the effectiveness of
our model which achieves relatively good performance in both separation and recognition tasks. We
make a further step towards attacking the cocktail party problem. This will improve the communication
quality of human-computer interaction. Our method can also be applied in meeting transcription
systems to provide better performance. We would like to make our code available later to facilitate
the study applied to other tasks.

Drawbacks

There is no doubt that the improvement of artificial intelligence can potentially revolutionise our
societies in many ways. However, it also brings some risks to people’s privacy. By abusing
speech separation and recognition techniques, hackers can easily monitor people’s daily lives, while
a strong NLP system can also be applied to Internet fraud. We think the community should not only
focus on the development of techniques, but also attend to privacy issues. Besides, the wide use of
artificial intelligence techniques may also lead to mass-scale unemployment problems, such as in call
centers.
9
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado, Yi Li, Nuno Vasconcelos
https://papers.nips.cc/paper/2020/file/328e5d4c166bb340b314d457a208dc83-Paper.pdf
Self-supervision reduces the need for human labeling, which is in some sense less affected by human
biases. However, deep learning systems are trained from data. Thus, even self-supervised models
reflect the biases in the collection process. To mitigate collection biases, we searched for 360° videos
using queries translated into multiple languages. Despite these efforts, the adoption of 360° video
cameras is likely not equal across different sectors of society, and thus learned representations may
still reflect such discrepancies.
10
Towards More Practical Adversarial Attacks on Graph Neural Networks
Jiaqi Ma, Shuangrui Ding, Qiaozhu Mei
https://papers.nips.cc/paper/2020/file/32bb90e8976aab5298d5da10fe66f21d-Paper.pdf
For the potential positive impacts, we anticipate that the work may raise the public attention about
the security and accountability issues of graph-based machine learning techniques, especially when
they are applied to real-world social networks. Even without accessing any information about the
model training, the graph structure alone can be exploited to damage a deep learning framework with
a rather executable strategy.

On the potential negative side, as our work demonstrates that there is a chance to attack existing GNN
models effectively without any knowledge but a simple graph structure, this may expose a serious
alert to technology companies who maintain the platforms and operate various applications based
on the graphs. However, we believe making this security concern transparent can help practitioners
detect potential attack in this form and better defend the machine learning driven applications.

11
Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida
https://papers.nips.cc/paper/2020/file/15bb63b28926cd083b15e3b97567bbea-Paper.pdf
This is a theoretical work and does not present any foreseeable societal consequences.

12
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
Saurabh Kumar, Aviral Kumar, Sergey Levine, Chelsea Finn
https://papers.nips.cc/paper/2020/file/5d151d1059a6281335a10732fc49620e-Paper.pdf
Applications and Benefits

Our diversity-driven learning approach for improved robustness can be beneficial for bringing RL to
real-world applications, such as robotics. It is critical that various types of robots, including service
robotics, home robots, and robots used for disaster relief or search-and-rescue are able to handle
varying environment conditions. Otherwise, they may fail to complete the tasks they are supposed to
accomplish, which could have significant consequences in safety-critical situations.

It is conceivable that, during deployment of robotics systems, the system may encounter changes
in its environment that it has not previously dealt with. For example, a robot may be tasked with
picking up a set of objects. At test time, the environment may slightly differ from the training setting,
e.g. some objects may be missing or additional objects may be present. These previously unseen
configurations may confuse the agent’s policy and lead to unpredictable and sub-optimal behavior. If
RL algorithms are to be used to prescribe actions from input observations in a robotics application,
the algorithms must be robust to these perturbations. Our approach of learning multiple diverse
solutions to the task is a step towards achieving the desired robustness.

Risks and Ethical Issues

RL algorithms, in general, face a number of risks. First, they tend to suffer from reward specification
- in particular, the reward may not necessarily be completely aligned with the desired behavior.
Therefore, it can be difficult for a practitioner to predict the behavior of an algorithm when it
is deployed. Since our algorithm learns multiple ways to optimize a task reward, the robustness
and predictability of its behavior is also limited by the alignment of the reward function with the
qualitative task objective. Additionally, even if the reward is well-specified, RL algorithms face a
number of other risks, including (but not limited to) safety and stability. Our diversity-driven learning
paradigm suffers from the same issues, as different latent-conditioned policies may not produce
reliable behavior when executed in real world settings if the underlying RL algorithm is unstable.

13
Greedy inference with structure-exploiting lazy maps
Michael Brennan, Daniele Bigoni, Olivier Zahm, Alessio Spantini, Youssef Marzouk
https://papers.nips.cc/paper/2020/file/5ef20b89bab8fed38253e98a12f26316-Paper.pdf
Who may benefit from this research? We believe users and developers of approximate inference
methods will benefit from our work. Our framework works as an “outer wrapper” that can improve
the effectiveness of any flow-based variational inference method by guiding its structure. We hope to
make expressive flow-based variational inference more tractable, efficient, and broadly applicable,
particularly in high dimensions, by developing automated tests for low-dimensional structure and
flexible ways to exploit it. The trace diagnostic developed in our work rigorously assesses the quality
of transport/flow-based inference, and may be of independent interest.

Who may be put at disadvantage from this research? We don’t believe anyone is put at disadvantage
due to this research.

What are the consequences of failure of the system? We specifically point out that one contribution
of this work is identifying when a poor posterior approximation has occurred. A potential failure
mode of our framework would be inaccurate estimation of the diagnostic matrix H or its spectrum,
suggesting that the approximate posterior is more accurate than it truly is. However, computing
the eigenvalues or trace of a symmetric matrix, even one estimated from samples, is a well studied
problem. And numerical software guards against poor eigenvalue estimation or at least warns if this
occurs. We believe the theoretical underpinnings of this work make it robust to undetected failure.

Does the task/method leverage biases in the data? We don’t believe our method leverages data
bias. As a method for variational inference, our goal is to accurately approximate a posterior
distribution. It is very possible to encode biases for/against a particular result in a Bayesian inference
problem, but that occurs at the level of modeling (choosing the prior, defining the likelihood) and
collecting data, not at the level of approximating the posterior.

14
Second Order PAC-Bayesian Bounds for the Weighted Majority Vote
Andres Masegosa, Stephan Lorenzen, Christian Igel, Yevgeny Seldin
https://papers.nips.cc/paper/2020/file/386854131f58a556343e056f03626e00-Paper.pdf
Ensemble classifiers, in particular random forests, are among the most important tools in machine
learning [Fernández-Delgado et al., 2014, Zhu, 2015], which are very frequently applied in practice
[e.g., Chen and Guestrin, 2016, Hoch, 2015, Puurula et al., 2014, Stallkamp et al., 2012]. Our
study provides generalization guarantees for random forests and a method for tuning the weights
of individual trees within a forest, which can lead to even higher accuracies. The result is of high
practical relevance.

Given that machine learning models are increasingly used to make decisions that have a strong impact
on society, industry, and individuals, it is important that we have a good theoretical understanding
of the employed methods and are able to provide rigorous guarantees for their performance. And
here lies the strongest contribution of the line of research followed in our study, in which we derive
rigorous bounds on the generalization error of random forests and other ensemble methods for
multiclass classification.

15
3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
Benjamin Biggs, David Novotny, Sebastien Ehrhardt, Hanbyul Joo, Ben Graham, Andrea Vedaldi
https://papers.nips.cc/paper/2020/file/ebf99bb5df6533b6dd9180a59034698d-Paper.pdf
Our method improves the ability of machines to understand human body poses in images and videos.
Understanding people automatically may arguably be misused by bad actors. However, importantly,
our method is not a form of biometric as it does not allow the identification of people. Rather,
only their overall body shape and pose is reconstructed, but these details are insufficient for unique
identification. In particular, individual facial features are not reconstructed at all.

Furthermore, our method is an improvement of existing capabilities, but does not introduce a radical
new capability in machine learning. Thus our contribution is unlikely to facilitate misuse of
technology which is already available to anyone.

Finally, any potential negative use of a technology should be balanced against positive uses. Understanding
body poses has many legitimate applications in VR and AR, medical, assistance to the
elderly, assistance to the visually impaired, autonomous driving, human-machine interactions, image
and video categorization, platform integrity, etc.
16
Prophet Attention: Predicting Attention with Future Attention
Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun
https://papers.nips.cc/paper/2020/file/13fe9d84310e77f13a6d184dbf1232f3-Paper.pdf
Our work aims to improve both the captioning and grounding performance of image captioning
systems, promoting the real-world application of image captioning, such as visual retrieval, human-robot
interaction and assistance for visually impaired people. Furthermore, we can also improve the
model interpretability and transparency. However, the training of our framework relies on a large
volume of image-caption pairs, which are not easily obtained in the real world. Therefore, it requires
specific and appropriate treatment by experienced practitioners.

17
Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
Miao Zhang, Huiqi Li, Shirui Pan, Xiaojun Chang, Zongyuan Ge, Steven Su
https://papers.nips.cc/paper/2020/file/9a96a2c73c0d477ff2a6da3bf538f4f4-Paper.pdf
Automatic Machine Learning (AutoML) aims to build a better machine learning model in a data-driven
and automated manner, compensating for the lack of machine learning experts and lowering the
threshold of various areas of machine learning to help all the amateurs to use machine learning without
any hassle. These days, many companies, like Google and Facebook, are using AutoML to build
machine learning models for handling different businesses automatically. They especially leverage the
AutoML to automatically build Deep Neural Networks for solving various tasks, including computer
vision, natural language processing, autonomous driving, and so on. AutoML is an up-and-coming
tool to take advantage of the extracted data to find the solutions automatically.

This paper focuses on the Neural Architecture Search (NAS) of AutoML, and it is the first attempt
to enhance the intelligent exploration of differentiable One-Shot NAS in the latent space. The
experimental results demonstrate the importance of introducing uncertainty into neural architecture
search, and point out a promising research direction in the NAS community.

It is worth noting that NAS is in its infancy, and it is still very challenging to use it to complete
automation of a specific business function like marketing analytics, customer behavior, or other
customer analytics.

18
Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora
https://papers.nips.cc/paper/2020/file/0740bb92e583cd2b88ec7c59f985cb41-Paper.pdf
This does not present any foreseeable societal consequence.
19
Learning Dynamic Belief Graphs to Generalize on Text-Based Games
Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, Will Hamilton
https://papers.nips.cc/paper/2020/file/1fc30b9d4319760b04fab735fbfed9a9-Paper.pdf
Our work’s immediate aim—improved performance on text-based games—might have limited
consequences for society; however, taking a broader view of our work and where we’d like to take
it forces us to consider several social and ethical concerns. We use text-based games as a proxy
to model and study the interaction of machines with the human world, through language. Any
system that interacts with the human world impacts it. As mentioned previously, an example of
language-mediated, human-machine interaction is online customer service systems.

• In these systems, especially in products related to critical needs like healthcare, providing
inaccurate information could result in serious harm to users. Likewise, failing to communicate
clearly, sensibly, or convincingly might also cause harm. It could waste users’ precious
time and diminish their trust.

• The responses generated by such systems must be inclusive and free of bias. They must not
cause harm by the act of communication itself, nor by making decisions that disenfranchise
certain user groups. Unfortunately, many data-driven, free-form language generation systems
currently exhibit bias and/or produce problematic outputs.

• Users’ privacy is also a concern in this setting. Mechanisms must be put in place to protect
it. Agents that interact with humans almost invariably train on human data; their function
requires that they solicit, store, and act upon sensitive user information (especially in the
healthcare scenario envisioned above). Therefore, privacy protections must be implemented
throughout the agent development cycle, including data collection, training, and deployment.

• Tasks that require human interaction through language are currently performed by people.
As a result, advances in language-based agents may eventually displace or disrupt human
jobs. This is a clear negative impact.

Even more broadly, any systems that generate convincing natural language could be used to spread
misinformation.

Our work is immediately aimed at improving the performance of RL agents in text-based games, in
which agents must understand and act in the world through language. Our hope is that this work, by
introducing graph-structured representations, endows language-based agents with greater accuracy
and clarity, and the ability to make better decisions. Similarly, we expect that graph-structured
representations could be used to constrain agent decisions and outputs, for improved safety. Finally,
we believe that structured representations can improve neural agents’ interpretability to researchers
and users. This is an important future direction that can contribute to accountability and transparency
in AI. As we have outlined, however, this and future work must be undertaken with awareness of its
hazards.
20
Demixed shared component analysis of neural population data from multiple brain areas
Yu Takagi, Steven Kennerley, Jun-ichiro Hirayama, Laurence Hunt
https://papers.nips.cc/paper/2020/file/44ece762ae7e41e3a0b1301488907eaa-Paper.pdf
Although several studies have investigated communication between populations of neurons, task-related
communication has been ignored. This is of fundamental importance in neuroscience, and we
show that it can be achieved simply by extending the previous method. We believe our methods will
be beneficial to neuroscientists who investigate interactions among multiple brain areas in
terms of specific task parameters of interest.

21
Neural FFTs for Universal Texture Image Synthesis
Morteza Mardani, Guilin Liu, Aysegul Dundar, Shiqiu Liu, Andrew Tao, Bryan Catanzaro
https://papers.nips.cc/paper/2020/file/a23156abfd4a114c35b930b836064e8b-Paper.pdf
Our AI research offers a powerful tool to synthesize a diverse range of textures with high fidelity and
in a real-time manner. Our unique perspective of combining FFT, from signal processing tools, with
deep learning for hallucinating images can be a great asset for other generation and style transfer
tasks in graphics and vision. From the application standpoint, several applications in graphics and
vision directly benefit from our tools to replace their tedious and manual synthesis platforms.

In particular, it helps rapidly create natural scenes for computer game developers, interior designers,
and artists. In addition, our AI-based tool can discover the generation process behind the real-world
scenes, which can help the professionals to better prototype ideas and create new textures.

In order to increase the positive impacts and reduce the downsides, we encourage further work to
bring the users in the AI loop for additional guidance. This can allow artists to freely incorporate
their creativity into the synthesis pipeline.

We also recommend the researchers and industries to investigate methods for further squeezing the
CNN architecture, and efficiently implement them on the processing hardware. This would help not
only make our tools faster for edge computing applications, but also reduce the high computational
power consumed for training neural networks, that positively impacts the environment.
22
Improving robustness against common corruptions by covariate shift adaptation
Steffen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, Matthias Bethge
https://papers.nips.cc/paper/2020/file/85690f81aadc1749175c187784afc9ee-Paper.pdf
The primary goal of this paper is to increase the robustness of machine vision models against common
corruptions and to spur further progress in this area. Increasing the robustness of machine vision systems can enhance their reliability and safety, which can potentially contribute to a large range of use cases including autonomous driving, manufacturing automation, surveillance systems, health care
and others. Each of these uses may have a broad range of societal implications: autonomous driving
can increase mobility of the elderly and enhance safety, but could also enable more autonomous
weapon systems. Manufacturing automation can increase resource efficiency and reduce costs for
goods, but may also increase societal tension through job losses or increase consumption and thus
waste. Of particular concern (besides surveillance) is the use of generative vision models for spreading
misinformation or for creating an information environment of uncertainty and mistrust.

We encourage further work to understand the limitations of machine vision models in out-of-distribution
generalization settings. More robust models carry the potential risk of automation
bias, i.e., an undue trust in vision models. However, even if models are robust to common corruptions,
they might still quickly fail on slightly different perturbations like surface reflections. Understanding
under what conditions model decisions can be deemed reliable or not is still an open research question
that deserves further attention.

23
The Smoothed Possibility of Social Choice
Lirong Xia
https://papers.nips.cc/paper/2020/file/7e05d6f828574fbc975a896b25bb011e-Paper.pdf
In this paper we aim to provide smoothed possibilities of social choice, which is an important problem in the society. Therefore, success of the research will benefit general public beyond the CS
research community because better solutions are now available for a wide range of group decision-making scenarios.
24
End-to-End Learning and Intervention in Games
Jiayang Li, Jing Yu, Yu Nie, Zhaoran Wang
https://papers.nips.cc/paper/2020/file/c21f4ce780c5c9d774f79841b81fdc6d-Paper.pdf
Our work helps understand and resolve social dilemmas resulting from pervasive conflict between
self- and collective interest in human societies. The potential applications of the proposed modeling
framework range from addressing externality in economic systems to guiding large-scale infrastructure
investment. Planners, regulators, policy makers of various human systems could benefit from the
decision making tools derived from this work.
25
Self-Supervised Graph Transformer on Large-Scale Molecular Data
Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying WEI, Wenbing Huang, Junzhou Huang
https://papers.nips.cc/paper/2020/file/94aef38441efa3380a3bed3faf1f9d5d-Paper.pdf
In this paper, we have developed a self-supervised pre-trained GNN model, GROVER, to extract the
useful implicit information from massive unlabelled molecules, and the downstream tasks can largely
benefit from this pre-trained GNN model. Below is the broader impact of our research:

- For machine learning community: This work demonstrates the success of pre-training
approach on Graph Neural Networks. It is expected that our research will open up a new
avenue for in-depth exploration of pre-trained GNNs for broader potential applications,
such as social networks and knowledge graphs.

- For the drug discovery community: Researchers from drug discovery can benefit from
GROVER from two aspects. First, GROVER has encoded rich structural information of
molecules through the designing of self-supervision tasks. It can also produce feature vectors
of atoms and molecule fingerprints, which can directly serve as inputs of downstream tasks.
Second, GROVER is designed based on Graph Neural Networks and all the parameters are
fully differentiable. So it is easy to fine-tune GROVER in conjunction with specific drug
discovery tasks, in order to achieve better performance. We hope that GROVER can help
with boosting the performance of various drug discovery applications, such as molecular
property prediction and virtual screening.
26
Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding
Victor Veitch, Anisha Zaveri
https://papers.nips.cc/paper/2020/file/7d265aa7147bd3913fb84c7963a209d1-Paper.pdf
This paper addressed sensitivity analysis for causal inference. We have extended Imbens’ approach
to allow the use of arbitrary machine-learning methods for the data modeling. Austen plots provide
an entirely post-hoc and blackbox manner of conducting sensitivity analysis. In particular, they make
it substantially simpler to perform sensitivity analysis. This is because the initial analysis can be
performed without having a sensitivity analysis already in mind, and because producing the sensitivity
plots only requires predictions from models that the practitioner has fit anyways.
The ideal positive consequence is that routine use of Austen plots will improve the credibility of
machine-learning based causal inferences from observational data. Austen plots allow us to both
use state-of-the-art models for the observed part of the data, and to reason coherently about the
causal effects of potential unobserved confounders. The availability of such a tool may speed the
adoption of machine-learning based causal inference for important real-world applications (where,
so far, adoption has been slow).
On the negative side, an accelerated adoption of machine-learning methods into causal practice may
be undesirable. This is simply because the standards of evidence and evaluation used in common
machine-learning practice do not fully reflect the needs of causal practice. Austen plots partially
bridge this gap, but they are just one of the elements required to establish credibility.
27
On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems
Kaiqing Zhang, Bin Hu, Tamer Basar
https://papers.nips.cc/paper/2020/file/fb2e203234df6dee15934e448ee88971-Paper.pdf
We believe that researchers of reinforcement learning (RL), especially those who are interested in the
theoretical foundations of robust RL, would benefit from this work, through the new insights and
angles we have provided regarding robust adversarial RL (RARL) in linear quadratic (LQ) setups,
from a rigorous robust control perspective. In particular, considering the impact of RARL [2] in
RL with prominent empirical performance, and the ubiquity and fundamentality of LQ setups in
continuous control, our results help pave the way for applying the RARL idea in control tasks.

More importantly, building upon the concepts from robust control, we have laid emphasis on the
robust stability of RARL algorithms when applied to control systems, which has been overlooked
in the RL literature, and is significant in continuous control, as a destabilized system can lead to
catastrophic consequences. Such emphasis may encourage the development of more robust, and
more importantly, safe on-the-fly, RARL algorithms, and push forward the development of RL for
safety-critical systems as a whole. It also opens up the possibility to integrate more tools from the
classic (robust) control theory, to improve the stability and robustness of popular RL algorithms
practically used.

We do not believe that our research will cause any ethical issue, or put anyone at any disadvantage.

28
Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication TimeJerry Li, Guanghao Yehttps://papers.nips.cc/paper/2020/file/9529fbba677729d3206b3b9073d1e9ca-Paper.pdfMoving forward, it is imperative that machine learning systems cannot be gamed by malicious entities.
This work builds upon a growing literature of principled algorithms for robust statistics, which are
methods for defending against data poisoning attacks, where a training set may be tampered with by
an adversary who wishes to change the behavior of the algorithm. For instance, such defenses are
important in settings where the training data is crowdsourced, such as in federated learning, where we cannot
fully trust the training data. In such settings, if the defense fails, attackers can completely invalidate
the output of the model. That is why we believe it is critical to develop principled defenses, with
provable worst-case guarantees, as we do here. With such defenses, we know that this worst-case
behavior cannot happen.

The algorithms developed here are also useful for exploratory data analysis, as demonstrated
in [DKK+17]. Most real-world high-dimensional datasets are inherently very noisy, and this noise
can disguise interesting patterns from data analysts. These methods can be used in exploratory data
analysis to remove this noise, and to recover these phenomena.

We do not believe that this method leverages any biases in the data. Our generative model, as stated
in the introduction, is very simple, and does not introduce any biases in this problem.

29
PLLay: Efficient Topological Layer based on Persistent LandscapesKwangho Kim, Jisu Kim, Manzil Zaheer, Joon Kim, Frederic Chazal, Larry Wassermanhttps://papers.nips.cc/paper/2020/file/b803a9254688e259cde2ec0361c8abe4-Paper.pdfThis paper proposes a novel method of adapting tools in applied mathematics to enhance the
learnability of deep learning models. Even though our methodology is generally applicable to any complex
modern data, it is not tuned to a specific application that might improperly incur direct societal/ethical
consequences. So the broader impact discussion is not needed for our work.
30
Robust Meta-learning for Mixed Linear Regression with Small BatchesWeihao Kong, Raghav Somani, Sham Kakade, Sewoong Ohhttps://papers.nips.cc/paper/2020/file/3214a6d842cc69597f9edf26df552e43-Paper.pdf
One of the main contributions of this paper is to protect meta-learning approaches against data
poisoning attacks. Such robustness encourages participation from data contributors, as they can
collaborate without necessarily trusting the other data contributors. This facilitates participation of
minor contributors who suffer from data scarcity. This fosters democratization of machine learning
by allowing minor contributors to enjoy the benefit of big data through collaboration. Such an ecosystem
will also encourage data sharing, thus improving transparency.

The adaptive guarantee we provide in Theorem 1 is fair, in the sense that a group that provides low
noise data will receive a model with better accuracy. However, one potential risk in fairness is that
meta-learning might result in varying accuracy across the groups. This can be problematic as an
under-represented group in training data could suffer from inaccurate prediction for that population.
This is an active area of research in the fairness community, but there is no strong experimental
evidence that this can be mitigated with algorithmic innovations that do not involve collecting more
data from the under-represented population.

Another concern in meta-learning with data sharing is privacy. Without a proper system to regulate the
usage of shared data, sensitive information could be leaked or protected features could be inferred.
One silver lining is that robust methods are naturally private, as the trained model is by definition not
sensitive to any one particular data point. On the other hand, if the system relies on the participation of
various individuals, then either a technological solution needs to be implemented with cryptographic
or privacy preserving primitives, or a proper regulation must be enforced.
31
Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment EffectsZijun Gao, Yanjun Hanhttps://papers.nips.cc/paper/2020/file/f75b757d3459c3e93e98ddab7b903938-Paper.pdfThis work mainly provides theoretical tools and bounds for the HTE estimation in causal inference, as
well as potentially useful practical insights such as the two-stage nearest-neighbors and throwing away
observations with poor covariate matching quality. This special form of nonparametric estimation
problems could be a useful addition to the literature in nonparametric statistics, and theorists and
practitioners working on causal inference may potentially benefit from this work.
32
Provably Robust Metric LearningLu Wang, Xuanqing Liu, Jinfeng Yi, Yuan Jiang, Cho-Jui Hsiehhttps://papers.nips.cc/paper/2020/file/e038453073d221a4f32d0bab94ca7cee-Paper.pdfIn this work, we study the problem of adversarial robustness of metric learning. Adversarial robustness, especially robustness verification, is very important when deploying machine learning models
into real-world systems. A potential risk lies in the research on adversarial attacks, although understanding
adversarial attacks is a necessary step towards developing provably robust models. In general, this
work does not involve specific applications and ethical issues.
33
Preference learning along multiple criteria: A game-theoretic perspectiveKush Bhatia, Ashwin Pananjady, Peter Bartlett, Anca Dragan, Martin J. Wainwrighthttps://papers.nips.cc/paper/2020/file/52f4691a4de70b3c441bca6c546979d9-Paper.pdf
An important step towards deploying AI systems in the real world involves aligning their objectives
with human values. Examples of such objectives include safety for autonomous vehicles, fairness
for recommender systems, and effectiveness of assistive medical devices. Our paper takes a step
towards accomplishing this goal by providing a framework to aggregate human preferences along
such subjective criteria, which are often hard to encode mathematically. While our framework is
quite expressive and allows for non-linear aggregation across criteria, it leaves the choice of the target
set in the hands of the designer. As a possible negative consequence, getting this choice wrong could
lead to incorrect inferences and unexpected behavior in the real world.

34
Minibatch Stochastic Approximate Proximal Point MethodsHilal Asi, Karan Chadha, Gary Cheng, John C. Duchihttps://papers.nips.cc/paper/2020/file/fa2246fa0fdf0d3e270c86767b77ba1b-Paper.pdfData centers draw increasing amounts of the total energy we consume, and increasing applications
of machine learning mean that model-fitting and parameter exploration require a larger and larger
proportion of their energy expenditures [1, 14, 30]. Indeed, as Asi and Duchi [1] note, the energy to
train and tune some models is roughly on the scale of driving thousands of cars from San Francisco
to Los Angeles, while training a modern transformer network (with architecture search) generates
roughly six times the total CO2 of an average car’s lifetime [30]. It is thus centrally important to
build more efficient and robust methods, which allow us to avoid wasteful hyperparameter search and
simply work.

A major challenge in building better algorithms is that fundamental physical limits have forced CPU
speed and energy to essentially plateau; only by parallelization can we both harness increasing speed
and reduce the energy to fit models [14]. In this context, our methods take a step toward reducing the
energy and overhead to perform machine learning.

Taking a step farther back, we believe optimization and model-fitting research in machine learning
should refocus its attention: rather than developing algorithms that, with appropriate hyperparameter
tuning, achieve state-of-the-art accuracy for a given dataset, we should evaluate algorithms by
whether they robustly work. This would allow a more careful consideration of an algorithm’s costs
and benefits: is it worth 2× faster training, for appropriate hyperparameters, if one has to spend
25× as much time to find the appropriate algorithmic hyperparameters? Even more, as Strubell et al.
[30] point out, the extraordinary costs of hyperparameter tuning for fitting large-scale models price
many researchers out of making progress on certain frontiers; to the extent that we can mitigate these
challenges, we will allow more equity in who can help machine learning progress.
35
Fine-Grained Dynamic Head for Object DetectionLin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zhenghttps://papers.nips.cc/paper/2020/file/7f6caf1f0ba788cd7953d817724c2b6e-Paper.pdf
Object detection is a fundamental task in the computer vision domain, which has already been applied
to a wide range of practical applications. For instance, face recognition, robotics and autonomous
driving heavily rely on object detection. Our method provides a new dimension for object detection
by utilizing the fine-grained dynamic routing mechanism to improve performance and maintain low
computational cost. Compared with hand-crafted or searched methods, ours does not need much time
for manual design or machine search. Besides, the design philosophy of our fine-grained dynamic
head could be further extended to many other computer vision tasks, e.g., segmentation and video
analysis.

36
A Decentralized Parallel Algorithm for Training Generative Adversarial NetsMingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jarret Ross, Tianbao Yang, Payel Dashttps://papers.nips.cc/paper/2020/file/7e0a0209b929d097bd3e8ef30567a5c1-Paper.pdfIn this paper, researchers introduce a decentralized parallel algorithm for training Generative
Adversarial Nets (GANs). The proposed scheme can be proved to have a non-asymptotic convergence to
first-order stationary points in theory, and outperforms its centralized counterpart in practice.

Our proposed decentralized algorithm is foundational research, since the algorithm design
and analysis are proposed for a general class of nonconvex-nonconcave min-max problems and are not
necessarily restricted to training GANs. Both the algorithm design and the proof techniques are
novel, and they may inspire future research along this direction.

Our decentralized algorithm has broader impacts in a variety of machine learning tasks beyond GAN
training. For example, our algorithm is promising in other machine learning problems whose objective
function has a min-max structure, such as adversarial training [92], robust machine learning [93], etc.

Our decentralized algorithm can be applied in several real-world applications such as image-to-image
generation [94], text-to-image generation [95], face aging [96], photo inpainting [97], dialogue
systems [98], etc. In all these applications, GAN training is an indispensable backbone. Training
GANs in these applications usually requires leveraging centralized large-batch distributed training
which could suffer from inefficiency in terms of run-time, and our algorithm is able to address this
issue by drastically reducing the running time in the whole training process.

These real-world applications have broad societal implications. First, they can greatly benefit people’s
daily lives. For example, many companies provide online service, where an AI chatbot is usually
utilized to answer customers’ questions. However, existing chatbots may not be able to fully
understand a customer’s question, and their responses are usually not good enough. One can adopt our
decentralized algorithms to efficiently train a generative adversarial network based on the
human-to-human chatting history, and the learned model is expected to answer customers’ questions in a better
manner. This system can help customers and significantly enhance users’ satisfaction. Second, they can
help protect users’ privacy. One benefit of decentralized algorithms is that they do not need the central
node to collect all users’ information and every node only communicates with its trusted neighbors.
In this case, our proposed decentralized algorithms naturally preserve users’ privacy.

We encourage researchers to further investigate the merits and shortcomings of our proposed approach.
In particular, we recommend that researchers design new algorithms for training GANs with faster
convergence guarantees.
37
Synthesize, Execute and Debug: Learning to Repair for Neural Program SynthesisKavi Gupta, Peter Ebert Christensen, Xinyun Chen, Dawn Songhttps://papers.nips.cc/paper/2020/file/cd0f74b5955dc87fd0605745c4b49ee8-Paper.pdfProgram synthesis has many potential real-world applications. One significant challenge of program
synthesis is that the generated program needs to be precisely correct. SED mitigates this challenge
by not requiring the solution to be generated in one shot, and instead allowing partial solutions to be
corrected via an iterative improvement process, achieving an overall improvement in performance
as a result. We thus believe SED-like frameworks could be applicable for a broad range of program
synthesis tasks.
38
Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterizationChong You, Zhihui Zhu, Qing Qu, Yi Mahttps://papers.nips.cc/paper/2020/file/cd42c963390a9cd025d007dacfa99351-Paper.pdf
Robust learning of structured signals from high-dimensional data has a wide range of applications,
including image processing, computer vision, recommender systems, generative models and many
more. In this work, we presented a new type of practical method and provided an improved
understanding of solving these problems via over-parameterized models. In particular, our method
exploits the implicit bias introduced by the learning algorithm, with the underlying driving force being
the intrinsic structure of the data itself rather than human handcrafting. Such a design methodology
helps to eliminate human bias in the design process, and hence provides the basis for developing truly
fair machine learning systems.

39
Structured Prediction for Conditional Meta-LearningRuohan Wang, Yiannis Demiris, Carlo Cilibertohttps://papers.nips.cc/paper/2020/file/1b69ebedb522700034547abc5652ffac-Paper.pdf
Meta-learning aims to construct learning models capable of learning from experiences. Its intended
users are thus primarily non-experts who require automated machine learning services, which may
occur in a wide range of potential applications such as recommender systems and autoML. The
authors do not expect the work to address or introduce any societal or ethical issues.

40
Training Stronger Baselines for Learning to OptimizeTianlong Chen, Weiyi Zhang, Zhou Jingyang, Shiyu Chang, Sijia Liu, Lisa Amini, Zhangyang Wanghttps://papers.nips.cc/paper/2020/file/51f4efbfb3e18f4ea053c4d3d282c4e2-Paper.pdfThis work mainly contributes to AutoML in the aspect of discovering better learning rules or
optimization algorithms from data. As a fundamental technique, it seems to pose no substantial
societal risk. This paper proposes several improved training techniques to tackle the dilemma of
training instability and poor generalization in learned optimizers. In general, learning to optimize
(L2O) prevents laborious problem-specific optimizer design, and potentially can largely reduce the
cost (including time, energy and expense) of model training or tuning hyperparameters.
41
MeshSDF: Differentiable Iso-Surface ExtractionEdoardo Remelli, Artem Lukoyanov, Stephan Richter, Benoit Guillard, Timur Bagautdinov, Pierre Baque, Pascal Fuahttps://papers.nips.cc/paper/2020/file/fe40fb944ee700392ed51bfe84dd4e3d-Paper.pdfComputational Fluid Dynamics is key to addressing the critical engineering problem of designing
shapes that maximize aerodynamic, hydrodynamic, and heat transfer performance, and much else
besides. The techniques we propose therefore have the potential to have a major impact in the field of
Computer Assisted Design by unleashing the full power of deep learning in an area where it is not
yet fully established.
42
The Cone of Silence: Speech Separation by LocalizationTeerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizermanhttps://papers.nips.cc/paper/2020/file/f056bfa71038e04a2400266027c169f9-Paper.pdfWe believe that our method has the potential to help people hear better in a variety of everyday
scenarios. This work could be integrated with headphones, hearing aids, smart home devices, or
laptops, to facilitate source separation and localization. Our localization output also provides a more
privacy-friendly alternative to camera based detection for applications like robotics or optical tracking.
We note that improved ability to separate speakers in noisy environments comes with potential privacy
concerns. For example, this method could be used to better hear a conversation at a nearby table
in a restaurant. Tracking speakers with microphone input also presents a similar range of privacy
concerns as camera based tracking and recognition in everyday environments.
43
Debiased Contrastive LearningChing-Yao Chuang, Joshua Robinson, Yen-Chen Lin, Antonio Torralba, Stefanie Jegelkahttps://papers.nips.cc/paper/2020/file/63c3ddcc7b23daa1e42dc41f9a44a873-Paper.pdfUnsupervised representation learning can improve learning when only small amounts of labeled data
are available. This is the case in many applications of societal interest, such as medical data analysis
[5, 31], the sciences [22], or drug discovery and repurposing [38]. Improving representation learning,
as we do here, can potentially benefit all these applications.

However, biases in the data can naturally lead to biases in the learned representation [29]. These
biases can, for example, lead to worse performance for smaller classes or groups. For instance, the
majority groups are sampled more frequently than the minority ones [16]. In this respect, our method
may suffer from similar biases as standard contrastive learning, and it is an interesting avenue of
future research to thoroughly test and evaluate this.
44
Long-Horizon Visual Planning with Goal-Conditioned Hierarchical PredictorsKarl Pertsch, Oleh Rybkin, Frederik Ebert, Shenghao Zhou, Dinesh Jayaraman, Chelsea Finn, Sergey Levinehttps://papers.nips.cc/paper/2020/file/c8d3a760ebab631565f8509d84b3b3f1-Paper.pdfWe proposed a method for visual prediction and planning that is able to solve long-horizon tasks
autonomously. This method may have a broader impact on capabilities of robots performing tasks
such as autonomous navigation or object manipulation, and may be applicable in settings such
as navigation of zones dangerous for humans, search and rescue, as well as warehouse robotics
applications. While the method, and in general all planning and reinforcement learning methods,
may be applied to a variety of settings, including those with questionable ethical motivation, we are
optimistic about the general positive impact of future autonomous robotic systems, especially in the areas
described above.

Another ethical consideration is that, since the model is able to produce long videos targeted to a
particular goal, it might be used to produce fake videos of people performing a certain action, and
provides a degree of control over that action through the specification of the goal image. This might
enable forging fake videos targeted at specific persons. However, recent research has shown that
most current methods for generating fake videos are easily detectable, both by people and automatic
detection methods [18, 1, 38].

45
Expert-Supervised Reinforcement Learning for Offline Policy Learning and EvaluationAaron Sonabend, Junwei Lu, Leo Anthony Celi, Tianxi Cai, Peter Szolovitshttps://papers.nips.cc/paper/2020/file/daf642455364613e2120c636b5a1f9c7-Paper.pdfWe believe ESRL is a tool that can help bring RL closer to real-world applications. In particular
this will be useful in the clinical setting to find optimal dynamic treatment regimes for complex
diseases, or at least assist in treatment decision making. This is because ESRL’s framework lends
itself to being questioned by users (physicians) and sheds light on potential biases introduced by the
data sampling mechanism used to generate the observed data set. Additionally, using hypothesis
testing and accommodating different levels of risk aversion makes the method sensible to offline
settings and different real-world applications. It is important, when using ESRL or any RL method,
to question the validity of the policy’s decisions, the quality of the data, and the method that was used
to derive these.

46
A polynomial-time algorithm for learning nonparametric causal graphsMing Gao, Yi Ding, Bryon Aragamhttps://papers.nips.cc/paper/2020/file/85c9f9efab89cee90a95cb98f15feacd-Paper.pdfCausality and interpretability are crucial aspects of modern machine learning systems. Graphical
models in particular are a promising tool at the intersection of causality and interpretability, and
our work provides an intuitive approach to balance these issues against modeling flexibility with
nonparametric models. That being said, as this work is primarily theoretical, the broader impacts and
ethical implications of our work are most likely to be felt downstream in applications. For example,
while DAGs can provide causal insights under certain assumptions, these models can potentially be
used to provide a false sense of security when they are not applied and deployed carefully. Along
these lines, our work attempts to provide a rigorous sense of when flexible nonparametric causal
models can be learned from data, by developing both theory and algorithms to justify these models
from both mathematical and empirical perspectives.

47
A Unifying View of Optimism in Episodic Reinforcement LearningGergely Neu, Ciara Pike-Burkehttps://papers.nips.cc/paper/2020/file/0f0e13216262f4a201bec128044dd30f-Paper.pdfThe results presented in this paper are largely theoretical. We define a class of algorithms which
are theoretically well understood, but also benefit from a computationally efficient implementation.
The framework provided in this paper is very general so, in principle, any algorithm which fits
into the framework could be applied to any reinforcement learning problem in a tabular or factored
linear MDP. Consequently, as for any reinforcement learning algorithm, there is the potential for
algorithms developed using the ideas presented in this paper to be applied in settings which have
negative societal impacts, or in settings where the reward function is not well specified leading to
undesirable behaviors.

48
Sample Efficient Reinforcement Learning via Low-Rank Matrix EstimationDevavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yanghttps://papers.nips.cc/paper/2020/file/8d2355364e9a2ba1f82f975414937b43-Paper.pdfAs reinforcement learning becomes increasingly popular in practice and the problem dimension
grows, there is a soaring demand for data-efficient learning algorithms. Through the lens of low-rank
representation of the so-called Q-function, this work proposes a theoretical framework to devise efficient
RL algorithms. The resulting “low-rank” algorithm, which utilizes a novel matrix estimation method,
offers both strong theoretical guarantees and appealing empirical performance.

In particular, the novel “low-rank” perspective about RL provides an effective tool to tackle RL
problems with both state and action spaces continuous, which have received much less attention
despite their practical significance. We believe that this work serves as an important step towards
provably efficient RL for continuous problems. The theoretical insights in this work can motivate
further research in both efficient RL and matrix estimation (ME), while the empirical results should be beneficial more
broadly for practitioners working in continuous control.

49
FleXOR: Trainable Fractional QuantizationDongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yunhttps://papers.nips.cc/paper/2020/file/0e230b1a582d76526b7ad7fc62ae937d-Paper.pdfDue to rapid advances in developing neural networks of higher model accuracy and increasingly
complicated tasks to be supported, the size of DNNs is becoming exponentially larger. Our work
facilitates the deployment of large DNN applications in various forms including mobile devices
because of the powerful model compression ratio. On the positive side, a large amount
of the energy consumed in running model inference can therefore be saved by our proposed quantization and
encryption techniques. Also, a lot of computing systems that are based on binary neural network
forms can improve model accuracy. We expect that lots of useful DNN models would be available for
devices of low cost. On the other hand, some common concerns on DNNs such as privacy breaching
and heavy surveillance can be worsened by DNN devices that are more available economically by
using our proposed techniques.

50
Efficient active learning of sparse halfspaces with arbitrary bounded noiseChicheng Zhang, Jie Shen, Pranjal Awasthihttps://papers.nips.cc/paper/2020/file/5034a5d62f91942d2a7aeaf527dfe111-Paper.pdfThis paper investigates a fundamental problem in machine learning and statistics. The theory and
algorithms presented in this paper are expected to benefit many broad fields in science and engineering,
such as learning theory, robust statistics, optimization, and applications in biology, climatology, and
seismology, to name a few. Our research belongs to the general paradigm of interactive learning, in
which the learning agent needs to design adaptive sampling schemes to maximize data efficiency. We
are well aware that one needs to be careful in designing such sampling schemes, to avoid unintended
harms such as discrimination.

51
Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset BiasesSenthil Purushwalkam Shiva Prakash, Abhinav Guptahttps://papers.nips.cc/paper/2020/file/22f791da07b0d8a2504c2537c560001c-Paper.pdfThe goal of this work is to analyze existing self-supervised learning methods through diagnostic
experiments. Analysis and understanding of existing approaches help develop better interpretation of
ML algorithms and can be crucial in removing biases. Upon identifying the shortcomings of existing
approaches, we propose a modification to improve the representations learned by these approaches.
Self-supervised learning involves learning representations from a large collection of unlabeled data.
Since there is no human involvement in the data collection pipeline, we anticipate reduction in biases
that can come via human labeling. Furthermore, self-supervised learning is a relatively nascent
research topic with minimal deployability in the real world. Therefore, while in the long run visual
self-supervised learning would be impactful, at this moment there is no immediate impact.

52
Ratio Trace Formulation of Wasserstein Discriminant AnalysisHexuan Liu, Yunfeng Cai, You-Lin Chen, Ping Lihttps://papers.nips.cc/paper/2020/file/c37f9e1283cbd4a6edfd778fc8b1c652-Paper.pdfIn the era of big data, business providers, data scientists, and governments try to explore
opportunities in large-scale, high-dimensional datasets. Nevertheless, several major computational
challenges arise and prevent practitioners from constructing effective algorithms or tools to analyze
their datasets. Dimensionality reduction (DR) plays an essential role in supervised and unsupervised
learning tasks when the datasets are high dimensional. One benefit of reducing the data dimension
before classification or clustering is to save storage and reduce computational cost for the later steps;
however, the DR technique itself can be costly. We study a recently proposed and promising DR
technique, the Wasserstein discriminant analysis, and propose a different formulation that could
achieve comparable or better results with less computational cost. We also analyze the problem
from a different perspective that originated from electronic structure calculations, which could
be of interest to a broader audience in the machine learning community.
53
Winning the Lottery with Continuous SparsificationPedro Savarese, Hugo Silva, Michael Mairehttps://papers.nips.cc/paper/2020/file/83004190b1793d7aa15f8d0d49a13eba-Paper.pdfTraining deep neural networks usually requires significant computational resources. Additional
efforts are often needed to prune trained networks to enable efficient inference – for example, in
mobile applications which may be both power and compute constrained. Our work presents a new
technique via which to sparsify networks, and contributes to the analysis of the recently discovered
scientific phenomenon of re-trainable subnetworks (tickets). These contributions might open new
pathways towards reducing the computational resources required for deep learning, thereby having a
potentially wide-ranging practical impact across the field.

54
Training Normalizing Flows with the Information Bottleneck for Competitive Generative ClassificationLynton Ardizzone, Radek Mackowiak, Carsten Rother, Ullrich Köthehttps://papers.nips.cc/paper/2020/file/593906af0d138e69f49d251d3e7cbed0-Paper.pdfAs our IB-INN is not bound to any particular application, and applies to settings that can in principle
already be solved with existing methods, we foresee no societal advantages or dangers in terms of
direct application. More generally, we think accurate uncertainty quantification plays an important
role in a safe and productive use of AI.
55
Provable Online CP/PARAFAC Decomposition of a Structured Tensor via Dictionary LearningSirisha Rambhatla, Xingguo Li, Jarvis Haupthttps://papers.nips.cc/paper/2020/file/85b42dd8aae56e01379be5736db5b496-Paper.pdfThis work explores the theoretical foundations behind the success of popular alternating
minimization-based techniques for tensor factorization. Specifically, we propose an
algorithm for accurate model recovery for a tensor factorization task which has applications in
clustering and pattern recovery. Since clustering-based algorithms are used for identification
of users for targeted advertising campaigns on social network platforms, potential use cases
may target users based on their activity patterns. Nevertheless, understanding the
theoretical aspects of machine learning algorithms is crucial for ensuring safety and trustworthiness
in critical applications, and can in fact be used to mitigate effects of the very biases that
these algorithms are prone to exacerbate.

56
Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step TreesShali Jiang, Daniel Jiang, Maximilian Balandat, Brian Karrer, Jacob Gardner, Roman Garnetthttps://papers.nips.cc/paper/2020/file/d1d5923fc822531bbfd9d87d4760914b-Paper.pdfThe central concern of this investigation is Bayesian optimization of an expensive-to-evaluate
objective function. As is standard in this body of literature, our proposed algorithms make minimal
assumptions about the objective, effectively treating it as a “black box.” This abstraction is
mathematically convenient but ignores ethical issues related to the chosen objective. Traditionally, Bayesian
optimization has been used for a variety of applications, including materials design and drug discovery
[7], and could have future applications to algorithmic fairness. We anticipate that our methods will
be utilized in these reasonable applications, but there is nothing inherent to this work, or to Bayesian
optimization as a field more broadly, that precludes the possibility of optimizing a nefarious or at least
ethically complicated objective.
57
GANSpace: Discovering Interpretable GAN ControlsErik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, Sylvain Parishttps://papers.nips.cc/paper/2020/file/6fe43269967adbb64ec6149852b5cc3e-Paper.pdfAs our method is an image synthesis tool, it shares with other image synthesis tools the same potential
benefits (e.g., [2]) and dangers that have been discussed extensively elsewhere, e.g., see [18] for one
such discussion.

Our method does not perform any training on images; it takes an existing GAN as input. As discussed
in Section 3.2, our method inherits the biases of the input GAN, e.g., limited ability to place makeup
on male-presenting faces. Conversely, this method provides a tool for discovering biases that would
otherwise be hard to identify.

58
Certifying Strategyproof Auction NetworksMichael Curry, Ping-yeh Chiang, Tom Goldstein, John Dickersonhttps://papers.nips.cc/paper/2020/file/3465ab6e0c21086020e382f09a482ced-Paper.pdfThe immediate social impact of this work will likely be limited. Learned auction mechanisms are
of interest to people who care about auction theory, and may eventually be used as part of the
design of auctions that will be deployed in practice, but this has not yet happened to our knowledge.
We note, however, that the design of strategyproof mechanisms is often desirable from a social
good standpoint. Making the right move under a non-strategyproof mechanism may be difficult for
real-world participants who are not theoretical agents with unbounded computational resources. The mechanism may impose a real burden on them: the cost of figuring out the correct move. By contrast,
a strategyproof mechanism simply requires truthful reports—no burden at all.

Moreover, the knowledge and ability to behave strategically may not be evenly distributed, with the
result that under non-strategyproof mechanisms, the most sophisticated participants may game the
system to their own benefit. This has happened in practice: in Boston, some parents were able to game
the school choice assignment system by misreporting their preferences, while others were observed
not to do this; on grounds of fairness, the system was replaced with a redesigned strategyproof
mechanism Abdulkadiroglu et al. [2006].

Thus, we believe that in general, the overall project of strategyproof mechanism design is likely to
have a positive social impact, both in terms of making economic mechanisms easier to participate in
and ensuring fair treatment of participants with different resources, and we hope we can make a small
contribution to it.
59
Automatic Perturbation Analysis for Scalable Certified Robustness and BeyondKaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, Cho-Jui Hsiehhttps://papers.nips.cc/paper/2020/file/0cbc5671ae26f67871cb914d81ef8fc1-Paper.pdfIn this paper, we develop an automatic framework to enable perturbation analysis on any neural
network structure. Our framework can be used in a wide variety of tasks ranging from robustness
verification to certified defense, and potentially many more applications requiring a provable
perturbation analysis. It can also serve as an important building block for several safety-critical ML
applications, such as transportation, engineering, and healthcare. We expect that our framework
will significantly improve the robustness and reliability of real-world ML systems with theoretical
guarantees.

An important product of this paper is an open-source LiRPA library with over 10,000 lines of code,
which provides automatic and differentiable perturbation analysis. This library can tremendously
facilitate the use of LiRPA for the research community as well as industrial applications, such as
verifiable plant control [50]. Our library of LiRPA on general computational graphs can also inspire
further improved implementations on automatic outer bounds calculations with provable guarantees.

Although our focus in this paper has been on exploring known perturbations and providing guarantees
in such clairvoyant scenarios, in the real world an adversary (or nature) may not adhere to our assumptions.
Thus, we may additionally want to understand the implications of these unknown scenarios for the system
performance. This is a relatively unexplored area in robust machine learning, and we encourage
researchers to understand and mitigate the risks arising from unknown perturbations in these contexts.
60
Dense Correspondences between Human Bodies via Learning Transformation Synchronization on GraphsXiangru Huang, Haitao Yang, Etienne Vouga, Qixing Huanghttps://papers.nips.cc/paper/2020/file/ca7be8306ecc3f5fa30ff2c41e64fa7b-Paper.pdfComputing dense correspondences between a partial scan and a template, or between two partial
scans, is a fundamental task for analyzing and understanding 3D data captured from the real world.
Our work is foundational, improving the accuracy and robustness of this important task, and will
benefit downstream applications that rely on the ability to find accurate dense correspondences.

One such application area is human subject tracking, where the correspondences between the partial
scan and the complete template model can be used to deform the template to obtain a complete
deformed shape that corresponds to each partial scan. Our research will allow reconstruction of
higher-fidelity animation sequences that better capture nuanced motion from large-scale, real-world
data. Applications that will benefit from this improved tracking include imitation learning, where a
system can learn from motion of each observed subject, especially of fine motor skills not able to be
tracked before; the movie/game industry, where one can insert the reconstructed motion of an actor into
a virtual environment, with unprecedented expressiveness of the reconstructed actor; and sports, where
one can reconstruct and analyze the athletes’ motions to make recommendations both for improving
athletic performance as well as enhancing athlete safety.

Another application area is full body reconstruction from a few scans. In this setting, the template
mesh serves as an intermediate object to establish dense correspondences between partial scans. Our
research represents an important step towards allowing ordinary users to scan themselves with high
accuracy at home using commodity hardware. Access to a high-quality digital avatar facilitates many
applications such as virtual fitting for on-line shopping, improved telepresence and telemedicine, and
new forms of entertainment and social media where users can place and animate themselves in a 3D
environment.

Potential abuses and negative impacts of improved tracking and reconstruction include the ability to
identify people without their consent, based on body shape or motion characteristics, in settings where
traditional facial recognition algorithms fail. 3D avatars of a person reconstructed from surreptitious
partial scans might also be used to create “deep fakes” or to otherwise infringe on the privacy rights
of the subject.

From a technical perspective, the problem falls into the category of structure prediction that combines
point-wise predictions and priors on correlations among multiple points. Unlike the standard MRF
formulation, this paper explores a new data representation, which turns structure prediction into a
continuous optimization problem. This methodology can inspire future research on relevant problems,
where the problem space lies in a continuous domain. Moreover, there is growing interest in turning
optimization problems into neural networks with hyper-parameters trained end-to-end. Our approach
contributes to this effort, and we hope the insights we used to design the resulting neural network for
training (including our analysis of robust recovery conditions for the transformation synchronization
problem) can be applied to and stimulate future research on similar problems.

Finally, like any algorithm for computing dense correspondences, our approach is not guaranteed to
generate correct correspondences in all the settings. Additional checks and verification (by humans
using interactive tools, for instance) should be used to validate and rectify the outputs, especially if
the results are used in safety- or health-critical applications such as personalized medicine.
61
Hierarchical Neural Architecture Search for Deep Stereo MatchingXuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Hongdong Li, Tom Drummond, Zongyuan Gehttps://papers.nips.cc/paper/2020/file/fc146be0b230d7e0a92e66a6114b840d-Paper.pdf
The task of stereo matching has been studied for over half a century and numerous methods have
been proposed. From traditional methods to deep learning based methods, people keep setting a
new state-of-the-art through these years. Nowadays, deep learning based methods have become more
popular than traditional methods since deep methods are more accurate and faster. However, finding
a better architecture for stereo matching networks remains an active topic. Rather than designing
a handcrafted architecture with trial and error, we propose to allow the network to learn a good
architecture by itself in an end-to-end manner. Our method reduces search time by more than 2/3
compared with the previous method [25] and has much better performance, thus saving considerable energy
and benefiting the planet by reducing the carbon footprint.

In addition, our proposed search framework is relatively general and not limited to the specific task of
stereo matching. It can be well extended to other dense matching tasks such as optical flow estimation
and multi-view stereo.

62
A graph similarity for deep learningSeongmin Okhttps://papers.nips.cc/paper/2020/file/0004d0b59e19461ff126e3a08a814c33-Paper.pdf
This article mainly discusses two topics: how to measure similarity between graphs, and how to learn
from graphs. One of the most important subjects in both fields is the molecular graph. A chemically
meaningful similarity between molecules helps find new drugs and invent new materials of great
value. Many chemical search engines support similarity search based on fingerprints, which indicate
the existence of certain substructures. The fingerprints have been useful to find molecules of interest,
but they are inherently limited to local properties. The proposed graph similarity is simple, fast
and efficient. The proposed graph neural network reports particular strength in molecular property
prediction and molecular graph generation, albeit not studied extensively. It is possible that the
proposed algorithms provide another, global perspective to molecular similarity.

Another task for which the proposed neural network showed strength is the node classification. The
node classification is mostly used to automatically categorize articles, devices, people, and other
entities in interconnected networks at large scale. Some related examples include identifying false
accounts in social network services, classifying a person for a recommendation system based on its
friends’ interest, and detecting malicious edge-devices in Internet of Things or mobile networks. As
with every machine learning application, assessing and understanding the data is crucial in such
cases. Especially in graph-structured data, we believe that the characteristic of data is the most
important factor in deciding which graph learning algorithm to use. It is necessary to understand the
principle and limitation of an algorithm to prevent failure. For example, our method has two caveats.
First, it uses sum to collect information from the neighbors, and is hence more suitable when the counts
indeed matter and not just the distributions. Second, our method decides the similarity between two
graphs using the local information. Hence when the "global" graph properties such as hamiltonicity,
treewidth, and chromatic number are the deciding factor, our algorithm might not be the best choice.

Graph learning in general is being applied to more and more tasks and applications. Some of the
examples include recommendation systems, transportation analysis, and credit assignments. However,
the study of risks regarding graph learning, such as adversarial attacks, privacy protection, ethics and
biases, is still at an early stage. In practice, we should be warned about such risks and devise testing
and monitoring frameworks from the start to avoid undesirable outcomes.

63
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural NetworksWei Hu, Lechao Xiao, Ben Adlam, Jeffrey Penningtonhttps://papers.nips.cc/paper/2020/file/c6dfc6b7c601ac2978357b7a81e2d7ae-Paper.pdf
This work is theoretical and does not present any foreseeable societal consequence.

64
Choice BanditsArpit Agarwal, Nicholas Johnson, Shivani Agarwalhttps://papers.nips.cc/paper/2020/file/d5fcc35c94879a4afad61cacca56192c-Paper.pdf
The purpose of this paper is to understand whether efficient learning is possible in a bandit setting
where one does not receive quantitative feedback for an individual arm but rather relative feedback
in the form of a multiway choice. It is well-known that quantitative judgments of humans can have
biases; our algorithm, which learns from relative multiway choices, can help alleviate these biases.
Moreover, by receiving larger choice sets from our algorithm, humans can have a better sense of the
quality distribution of arms, and can make more informed choices.

Another advantage of our setting is that we do not rely on historic data as our data collection is
online. Hence, one does not need to worry about past biases being reflected in the choice datasets.
However, one has to be cautious about the use of our algorithm in applications where arms represent
individuals/entities such as job applicants, property renters etc. In these applications, the choices
of people can be biased against certain individuals/groups, thereby hurting the chances of these
individuals/groups to be selected by our algorithm. Here, depending on the application, one might
need to consider imposing some form of fairness constraints on the choice sets output by our algorithm
in order to prevent any discrimination against such individuals/groups.

65
Deep Imitation Learning for Bimanual Robotic ManipulationFan Xie, Alexander Chowdhury, M. Clara De Paolis Kaluza, Linfeng Zhao, Lawson Wong, Rose Yuhttps://papers.nips.cc/paper/2020/file/18a010d2a9813e91907ce88cd9143fdf-Paper.pdf
Robotics systems that utilize fully automated policies for different tasks have already been applied
to many manufacturing, assembly-line, and warehouse processes. Our work demonstrates the
potential to take this automation one step further. Our algorithm can automatically learn complex
control policies from expert demonstrations, which could potentially allow robots to augment their
existing control designs and further optimize their workflows. Implementing learned policies in
safety-critical environments such as large-scale assembly lines can be risky as these algorithms do not
have guaranteed precision. Improved theoretical understanding and interpretability of model policies
could potentially mitigate these risks.

66
Model Fusion via Optimal TransportSidak Pal Singh, Martin Jaggihttps://papers.nips.cc/paper/2020/file/fb2697869f56484404c8ceee2985b01d-Paper.pdf
Model fusion is a fundamental building block in machine learning, as a way of direct knowledge
transfer between trained neural networks. Beyond theoretical interest it can serve a wide range of
concrete applications. For instance, collaborative learning schemes such as federated learning are
of increasing importance for enabling privacy-preserving training of ML models, as well as a better
alignment of each individual’s data ownership with the resulting utility from jointly trained machine
learning models, especially in applications where data is user-provided and privacy sensitive [29].
Here fusion of several models is a key building block to allow several agents to participate in joint
training and knowledge exchange. We propose that a reliable fusion technique can serve as a step
towards more broadly enabling privacy-preserving and efficient collaborative learning.

67
Minimax Bounds for Generalized Linear ModelsKuan-Yun Lee, Thomas Courtadehttps://papers.nips.cc/paper/2020/file/6a508a60aa3bf9510ea6acb021c94b48-Paper.pdfThe generalized linear model (GLM) is a broad class of statistical models that have extensive
applications in machine learning, electrical engineering, finance, biology, and many areas not stated
here. Many algorithms have been proposed for inference, prediction and classification tasks under
the umbrella of the GLM, such as the Lasso algorithm, the EM algorithm, Dantzig selectors, etc.,
but often it is hard to confidently assess optimality. Lower bounds for minimax and Bayes risks play
a key role here by providing theoretical benchmarks with which one can evaluate the performance
of algorithms. While many previous approaches have focused on the Gaussian linear model, in this
paper we establish minimax and Bayes risk lower bounds that hold uniformly over all statistical
models within the GLM. Our arguments demonstrate a set of information-theoretic techniques that
are general and applicable to setups other than the GLM. As a result, many applications stand to
potentially benefit from our work.
68
Confidence sequences for sampling without replacementIan Waudby-Smith, Aaditya Ramdashttps://papers.nips.cc/paper/2020/file/e96c7de8f6390b1e6c71556e4e0a4959-Paper.pdfThe main type of broader impact caused by our work is the reduction of time, money and energy due
to the ability to continuously monitor data and hence make critical decisions early. In Appendix A,
we provide four prototypical examples of situations where our methods may prove useful. In Example A,
every single phone call requires time to collect the opinions, thus using up money as well, and
if we can accurately quantify uncertainty then we can stop collecting data sooner. In Example B,
randomization tests such as those involving permutations are a common way to quantify statistical
significance, but they are computationally intensive and thus take up a lot of time. Knowing when
to stop, based on the test being clearly statistically significant (or clearly far from it), can save on
energy costs. In Example D, when an educational intervention is unrolled one school at a time, there
are two possibilities again: if it is clearly beneficial, we would like to recognize it quickly so that
every student can avail of the benefits, while if it is for some reason harmful (e.g. causing stress
without measurable benefit), then it would be equally important to end the program quickly. Once
more, accurately quantifying uncertainty as the process unfolds underpins the ability to make these
decisions early to disseminate benefits rapidly or mitigate harms quickly.
As a side remark, though we have not demonstrated it in this paper, our techniques are also applicable
to auditing elections (checking whether the results are as announced by a manual random recount).
‘Risk-limiting audits’ [22] constitute a full-fledged application area that we intend to pursue in future work; there are many variants depending on how voters express their preferences (choose one, or
rank all, or score all) and the aggregation mechanism used to decide on one or multiple winners.
Audits are not currently required by law in many state/county (or federal) elections due to high
perceived effort among other reasons, so being able to stop these audits early, yet accurately and
confidently, is critical to their broad adoption. In this sense, a longer-term broader impact to trust in
elections is anticipated.

69
Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of DimensionalityNian Si, Jose Blanchet, Soumyadip Ghosh, Mark Squillantehttps://papers.nips.cc/paper/2020/file/f3507289cfdc8c9ae93f4098111a13f9-Paper.pdf
This is a theoretical contribution that, nevertheless, has the potential of impacting a wide range of
application domains in business, engineering and science. In particular, all of those in which the
Wasserstein distance has been extensively used as a statistical inference tool (e.g. image analysis and
computer vision, signal processing, operations research, and so on). Because our paper provides a
step towards breaking the curse of dimensionality in statistical rates of convergence, we believe that
we have the potential of enabling more applications to multiple hypothesis testing (e.g., certifying
Wasserstein GANs). In turn, we plan to improve human resource development by including some of
the main findings in this paper in Ph.D. courses.

70
The Generalized Lasso with Nonlinear Observations and Generative PriorsZhaoqiang Liu, Jonathan Scarletthttps://papers.nips.cc/paper/2020/file/dd45045f8c68db9f54e70c67048d32e8-Paper.pdf
Who may benefit from this research. This is a theory paper primarily targeted at the research
community. The signal recovery techniques studied could potentially be useful for practitioners in
areas such as image processing, audio processing, and medical imaging.

Who may be put at disadvantage from this research. We are not aware of any significant/imminent
risks of placing anyone at a disadvantage.

Consequences of failure of the system. We believe that most failures should be immediately
evident and detectable due to visibly poor reconstruction performance, and any such outputs could be
discarded as needed. However, some more subtle issues could arise, such as the reconstruction missing
important details in the signal due to the generative model not capturing them. As a result, care is
advised in the choice of generative model, particularly in applications for which the reconstruction of
fine details is crucial.

Potential biases. The signal recovery algorithm that we consider takes as input an arbitrary pre-
trained generative model. If such a pre-trained model has inherent biases, they could be transferred to
the signal recovery algorithm.

71
AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep LearningHao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xinghttps://papers.nips.cc/paper/2020/file/0a2298a72858d90d5c4b4fee954b6896-Paper.pdf
The proposed AutoSync alleviates the burden on ML researchers and practitioners in choosing
appropriate synchronization strategy for efficient distributed training, enables substantial speed up of
ML prototyping and training, and reduces the cost of their operational workloads using distributed
computing. Further, AutoSync is transferable to unseen model and cluster settings by the design of
domain-agnostic features. By this, finding a good synchronization strategy for a large-scale ML model
such as BERT [7] and GPT [28] or on a relatively expensive cluster only requires developing runtime
simulators using data collected from a streamlined model on handy clusters, saving substantial
experimental efforts and budgets. We will release and open-source our code and a new dataset to
benefit the research community, to democratize high-performance ML systems, and make them
accessible to non-ML-educated software developers and society at large. Since such needs are
prevalent across many disciplines beyond computing and information science – such as industrial
and manufacturing, healthcare, biology, social science, and finance – our deliverables are expected to
have a catalytic impact.

72
Adaptive Shrinkage Estimation for Streaming GraphsNesreen Ahmed, Nick Duffieldhttps://papers.nips.cc/paper/2020/file/780261c4b9a55cd803080619d0cc3e11-Paper.pdf
There is a burgeoning recent literature of statistical estimation and adaptive data analysis of the
higher-order structural properties of graphs in both the streaming and non-streaming contexts that
reflects the importance and interest of this topic for the graph algorithms and relational learning
research community. On the other hand, shrinkage estimators are an established technique from
more general statistics. This paper is the first to apply shrinkage based methods in the context of
graph approximation. The expected broader impact is as a proof of concept that shows the way for
other researchers in this area to improve estimation quality. Moreover, this work fits under statistical
inference for temporal relational/network data, which would enable statistical analysis and learning
for network data that appear in streaming settings, in particular when exact solutions are not feasible
(similar to the important literature on randomization algorithms for data matrices [1]).
Furthermore, there are many applications where the data has a pronounced temporal, relational, and
spatial structure (e.g., relational data). Examples of Non-IID streams include (i) non-independence
due to temporal clustering in communication graphs on internet, online social networks, physical
contact networks, and social media such as flash crowds and coordinated botnet activity; (ii) non-
identical distributions in activity on these networks due to diurnal and other seasonal variations,
synchronization of user network activity e.g., searches stimulated by hourly news reports. The
proposed framework is suitable for these applications, because it makes no statistical assumptions
concerning the arrival stream and the order of the arriving edges.

73
The Strong Screening Rule for SLOPEJohan Larsson, Malgorzata Bogdan, Jonas Wallinhttps://papers.nips.cc/paper/2020/file/a7d8ae4569120b5bec12e7b6e9648b86-Paper.pdf
The predictor screening rules introduced in this article allow for a substantial improvement of the
speed of SLOPE. This facilitates application of SLOPE to the identification of important predictors in
huge data bases, such as collections of whole genome genotypes in Genome Wide Association Studies.
It also paves the way for the implementation of cross-validation techniques and improved efficiency
of the Adaptive Bayesian version of SLOPE (ABSLOPE [39]), which requires multiple iterations of the
SLOPE algorithm. Adaptive SLOPE bridges the Bayesian and frequentist methodologies and enables
good predictive models with FDR control in the presence of many hyper-parameters or missing data.
Thus it addresses the problem of false discoveries and lack of replicability in a variety of important
problems, including medical and genetic studies.

In general, the improved efficiency resulting from the predictor screening rules will make the SLOPE
family of models (SLOPE [3], grpSLOPE [6], and ABSLOPE) accessible to a broader audience,
enabling researchers and other parties to fit SLOPE models with improved efficiency. The time
required to apply these models will be reduced and, in some cases, data sets that were otherwise too
large to be analyzed without access to dedicated high-performance computing clusters can be tackled
even with modest computational means.

We can think of no way by which these screening rules may put anyone at disadvantage. The methods
we outline here do not in any way affect the model itself (other than boosting its performance) and
can therefore only be of benefit. For the same reason, we do not believe that the strong rules for
SLOPE introduce any ethical issues, biases, or negative societal consequences. In contrast, it is in
fact possible that the reverse is true given that SLOPE serves as an alternative to, for instance, the
lasso, and has superior model selection properties [10, 39] and lower bias [39].

74
Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial ApproachLuofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Mladen Kolar, Zhaoran Wanghttps://papers.nips.cc/paper/2020/file/65a99bb7a3115fdede20da98b08a370f-Paper.pdfIn recent years, the impact of machine learning (ML) on economics is already well underway
[5, 15], and our work serves as a complement to this line of research. On the one hand, machine
learning methods such as random forest, support vector machines and neural networks provide great
flexibility in modeling, while traditional tools in structural estimation that are well versed in the
econometrics community are still primitive, despite recent advances [32, 26, 7, 21]. On the other
hand, to facilitate ML-based decision making, one must be aware of the distinction between prediction
and causal inference. Our method provides an NN-based solution to estimation of generalized
SEMs, which encompass a wide range of econometric and causal inference models. However, we
remark that in order to apply the method to policy and decision problems, one must pay equal
attention to other aspects of the model, such as interpretability, robustness of the estimates, fairness
and nondiscrimination, assumptions required for model identification, and the testability of those
assumptions. Unthoughtful application of ML methods in an attempt to draw causal conclusions must
be avoided for both ML researchers and economists.
75
Meta-NeighborhoodsSiyuan Shan, Yang Li, Junier B. Olivahttps://papers.nips.cc/paper/2020/file/35464c848f410e55a13bb9d78e7fddd0-Paper.pdfAny general discriminative machine learning model runs the risk of making biased and offensive
predictions reflective of training data. Our work is no exception as it aims at improving discriminative
learning performance. To reduce these negative influences to the minimum possible extent, we only
use standard benchmarks in this work, such as CIFAR-10, Tiny-ImageNet, MNIST, and datasets from
the UCI machine learning repository.

Our work does raise some privacy concerns as we are learning a per-instance adjusted model in
this work. Potential applications of the proposed model include precision medicine, personalized
recommendation systems, and personalized driver assistance systems. To keep user data safe, it is
desirable to only deploy our model locally.

The induced neighbors in our work, which are semantically meaningful, can also be regarded as fake
synthetic data. Like DeepFakes, they may also raise a set of challenging policy, technology, and legal
issues. Legislation regarding synthetic data should take effect and the research community needs to
develop effective methods to detect these synthetic data.
76
When Counterpoint Meets Chinese Folk MelodiesNan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhanghttps://papers.nips.cc/paper/2020/file/bae876e53dab654a3d9d9768b1b7b91a-Paper.pdfThe idea of integrating Western counterpoint into Chinese folk music generation is innovative. It
would make positive broader impacts on three aspects: 1) It would facilitate more opportunities and
challenges of music cultural exchanges at a much larger scale through automatic generation. For
example, the inter-cultural style fused music could be used in children’s enlightenment education
to stimulate their interest in both cultures. 2) It would further the idea of collaborative counterpoint
improvisation between two parts (e.g., a human and a machine) to music traditions where such
interaction was less common. 3) The computer-generated music may “reshape the musical idiom”[23],
which may bring more opportunities and possibilities to produce creative music.

The proposed work may also have some potential negative societal impacts: 1) Similar to other
computational creativity research, the generated music has the possibility of plagiarism by copying
short snippets from the training corpus, even though copyright infringement is not a concern as
neither folk melodies nor Bach’s music has copyright. That being said, our online music generation
approach conditions music generation on past human and machine generation, and is less likely to
directly copy snippets than offline approaches do. 2) The proposed innovative music generation
approach may cause disruptions to current music professions, even deprive them of their means of
existence[23]. However, it also opens new areas and creates new needs in this we-media era. Overall,
we believe that the positive impacts significantly outweigh the negative impacts.
77
Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret AlgorithmLin Yang, Mohammad Hajiesmaili, Mohammad Sadegh Talebi, John C. S. Lui, Wing Shing Wonghttps://papers.nips.cc/paper/2020/file/e655c7716a4b3ea67f48c6322fc42ed6-Paper.pdfOur work fits within the broad direction of research concerning safety issues in AI/ML at large. With
the recent radical advances in machine learning, ML-assisted decision making is fast becoming an
intrinsic part of the design of systems and services that billions of people around the world use every
day. And not surprisingly, investigating the vulnerability of existing learning models and robustness
against manipulation attacks are becoming critically important in the light of trustworthy learning
paradigm. Hence, there has been a surge of interest in making learning models that are robust against
adversarial attacks for both applied ML such as supervised learning and deep learning, and theoretical
ML such as reinforcement learning and multi-armed bandits. This is critically important for society,
since the ML algorithms are being adopted more and more in safety-critical domains across sciences,
businesses, and governments that impact people’s daily lives. Last, we see no ethical concerns related
to this paper.
78
The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic NoiseIlias Diakonikolas, Daniel M. Kane, Pasin Manurangsihttps://papers.nips.cc/paper/2020/file/ebd64e2bf193fc8c658af2b91952ce8d-Paper.pdf
Our work aims to advance the algorithmic foundations of adversarially robust machine learning. This
subfield focuses on protecting machine learning models (especially their predictions) against small
perturbations of the input data. This broad goal is a pressing challenge in many real-world scenarios,
where successful adversarial example attacks can have far-reaching implications given the adoption
of machine learning in a wide variety of applications, from self-driving cars to banking.

Since the primary focus of our work is theoretical and addresses a simple concept class, we do not
expect our results to have immediate societal impact. Nonetheless, we believe that our findings
provide interesting insights on the algorithmic possibilities and fundamental computational limitations
of adversarially robust learning. We hope that, in the future, these insights could be useful in the
design of practically relevant adversarially robust classifiers in the presence of noisy data.

79
Swapping Autoencoder for Deep Image ManipulationTaesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei Efros, Richard Zhanghttps://papers.nips.cc/paper/2020/file/50905d7b2216bfeccb5b41016357176b-Paper.pdfFrom the sculptor’s chisel to the painter’s brush, tools for creative expression are an important part
of human culture. The advent of digital photography and professional editing tools, such as Adobe
Photoshop, has allowed artists to push creative boundaries. However, the existing tools are typically
too complicated to be used by the general public. Our work is one of the new generation of visual
content creation methods that aim to democratize the creative process. The goal is to provide intuitive
controls (see Section 4.6) for making a wider range of realistic visual effects available to non-experts.

While the goal of this work is to support artistic and creative applications, the potential misuse of such
technology for purposes of deception – posing generated images as real photographs – is quite concerning.
To partially mitigate this concern, we can use the advances in the field of image forensics [16], as a
way of verifying the authenticity of a given image. In particular, Wang et al. [72] recently showed that a
classifier trained to classify between real photographs and synthetic images generated by ProGAN [42],
was able to detect fakes produced by other generators, among them, StyleGAN [43] and StyleGAN2 [44].
We take a pretrained model of [72] and report the detection rates on several datasets in Appendix ??.
Our swap-generated images can be detected with an average rate greater than 90%, and this indicates
that our method shares enough architectural components with previous methods to be detectable.
However, these detection methods do not work at 100%, and performance can degrade as the images are
degraded in the wild (e.g., compressed, rescanned) or via adversarial attacks. Therefore, the problem of
verifying image provenance remains a significant challenge to society that requires multiple layers of
solutions, from technical (such as learning-based detection systems or authenticity certification chains),
to social, such as efforts to increase public awareness of the problem, to regulatory and legislative.
80
Group Contextual Encoding for 3D Point CloudsXu Liu, Chengtao Li, Jian Wang, Jingbo Wang, Boxin Shi, Xiaodong Hehttps://papers.nips.cc/paper/2020/file/9b72e31dac81715466cd580a448cf823-Paper.pdf
Our “Group Contextual Encoding” can be directly applied to the 3D point cloud scene understanding
tasks including 3D object detection, voxel labeling, and segmentation. Our research can also support
downstream research and applications such as autonomous driving, robotics, and AR/MR. We will
investigate the generalizability of our method to other tasks and frameworks, e.g., Graph Convolution
network, 3D sparse CNNs, where the global context plays a crucial role in these tasks.

On the other hand, this technology may also endanger the employment of human servants and drivers
because they may be replaced by autonomous robots and vehicles, which may cause potential
social problems. This issue should be taken seriously and measures should be taken for preparation.

81
A Simple Language Model for Task-Oriented DialogueEhsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socherhttps://papers.nips.cc/paper/2020/file/e946209592563be0f01c844ab2170f0c-Paper.pdf
This work may have implications for the simplification of conversational agents. In the narrow sense,
this work addresses task-oriented dialogue, but similar results might also hold for open-domain
conversational systems. If so, the improvement of these systems and easier deployment would
amplify both the positive and negative aspects of conversational AI. Positively, conversational agents
might play a role in automating predictable communications, thereby increasing efficiency in areas of
society that currently lose time navigating the multitude of APIs, webpages, and telephonic systems
that are used to achieve goals. Negatively, putting conversational agents at the forefront might
dehumanize communication that can be automated and might lead to frustration where human agents
could provide more efficient solutions – for example, when predicted solutions do not apply. These
consequences are not specific to this work, but should be considered by the field of conversational AI
more broadly.

82
Feature Importance Ranking for Deep LearningMaksymilian Wojtas, Ke Chenhttps://papers.nips.cc/paper/2020/file/36ac8e558ac7690b6f44e2cb5ef93322-Paper.pdfThis research does not involve any issues directly regarding ethical aspects and future societal
consequences. In the future, our approach presented in this paper might be applied in different
domains, e.g., medicine and life science, where ethical aspects and societal consequences might have
to be considered.
83
Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human ReconstructionTong He, John Collomosse, Hailin Jin, Stefano Soattohttps://papers.nips.cc/paper/2020/file/690f44c8c2b7ded579d01abe8fdb6110-Paper.pdfWho may benefit from this research? The VR / AR software developers and 3D graphics designers
may benefit from our research. The proposed technique generates single-view clothed human mesh
reconstructions with improved global topology regularities and local surface details. Our method
can benefit various VR / AR applications that involve reconstructing 3D virtual human avatars for
customized user experience, such as conference systems and role-playing games. Moreover, being
able to efficiently reconstruct 3D meshes from single-view images is useful for graphics rendering
and 3D designs.

Who may be put at disadvantage from this research? In the long run, some entry-level graphics
artists and designers might be affected. Generally speaking, the 3D gaming and graphics design
industries are moving towards automatic content generation techniques. These techniques are not
meant to replace highly skilled human workers, but to help improve their productivity at work.

What are the consequences of failure of the system? Failed human mesh reconstructions might
bring unpleasant user experience. Typical failure cases as well as possible solutions have also been
discussed in the main paper.

Whether the task/method leverages biases in the data? There might be some biases on human
poses and clothes due to long-tail cases. However, our dataset is already 10× larger than the one used
in the competing methods. More importantly, our mesh collection procedures can be easily expanded
to other domain-specific scenarios to obtain more human meshes with different shapes, poses and
clothes to compensate for long-tail cases.
84
The Origins and Prevalence of Texture Bias in Convolutional Neural NetworksKatherine Hermann, Ting Chen, Simon Kornblithhttps://papers.nips.cc/paper/2020/file/db5f9f42a7157abe65bb145000b5871a-Paper.pdf
People who build and interact with tools for computer vision, especially those without extensive
training in machine learning, often have a mental model of computer vision models as similar to
human vision. Our findings contribute to a body of work showing that this view is actually far from
correct, especially for ImageNet, one of the datasets most commonly used to train and evaluate
models. Divergences between human and machine vision of the kind we study could cause users to
make significant errors in anticipating and reasoning about the behavior of computer vision systems.
Our findings contribute to a body of work delineating divergences between human and machine
vision, and suggesting avenues for bringing the two systems closer together. Allowing people from a
wide range of backgrounds to make safe, predictable, and equitable models requires vision systems
to perform at least roughly in accordance with their expectations. Making computer vision models
that share the same inductive biases as humans is an important step towards this goal. At the same
time, we recognize the possible negative consequences of blindly constraining models’ judgments
to agree with people’s: human visual judgments display forms of bias that should be kept out of
computer models. More broadly, we believe that work like ours can have a beneficial impact on the
internal sociology of the machine learning community. By identifying connections to developmental
psychology and neuroscience, we hope to enhance interdisciplinary connections across fields, and to
encourage people with a broader range of training and backgrounds to participate in machine learning
research.
85
Belief-Dependent Macro-Action Discovery in POMDPs using the Value of InformationGenevieve Flaspohler, Nicholas A. Roy, John W. Fisher IIIhttps://papers.nips.cc/paper/2020/file/7f2be1b45d278ac18804b79207a24c53-Paper.pdfDecision-making problems are ubiquitous, arising in applications such as tracking an oil spill using a
marine robot, selecting an effective drug schedule in personalized medicine, or allocating irrigation
resources based on seasonal weather forecasts. In each of these important application areas, system
dynamics are represented by complex and potentially learned models and the decision-making agent
can only observe the state through limited sensors. Many current planning and reinforcement learning
algorithms focus on fully-observable domains and generate learned policies without performance
guarantees. However, uncertainty and formal guarantees must play a role in robust decision-making
for high-stakes domains. VoI macro-action generation contributes to fundamental research in robust
and efficient model-based planning under uncertainty. As with all formal results, however, the
bounds we derive only hold under the assumptions that we describe in the text. When performing
decision-making in high-stakes applications, understanding these conditions, the extent to which they
hold, and how algorithm performance degrades as assumptions are violated is critical.
86
Hierarchical Gaussian Process Priors for Bayesian Neural Network WeightsTheofanis Karaletsos, Thang D. Buihttps://papers.nips.cc/paper/2020/file/c70341de2c112a6b3496aec1f631dddd-Paper.pdfOur work targets studying priors of neural networks with respect to two specific aspects: first, we
aim at obtaining weights which are sharp close to the training data and uncertain away from training
data, in order to calibrate the model’s confidence. This is essential for many applications where
predictions of neural networks are consumed to drive decisions, which may incur a cost. In case our
model produces "I don’t know" predictions as we showed it is capable of in OOD data, ML-systems
can either probe an expert or utilize a fallback plan for decisions. Such cases occur across industrial
applications of algorithmic decision making and impact economics and fairness, but are even more
critical in fields such as healthcare or autonomy where wrong but overconfident predictions may lead
to catastrophic decisions.

The second area of impact centers around the ability of a practitioner to express specific types of
prior knowledge for the functions learned by a neural network via auxiliary kernels. This can help
practitioners utilize neural networks as less of a black box and ultimately may lead to the ability
to train networks with rich weight-based function spaces with little data. These types of network
regularization are application-dependent, but ultimately we hope structures such as the ones we
propose may be able to aid with generalization outside the training data by encoding prior knowledge
into networks, an ability that would potentially help in a variety of real world scenarios where data
paucity exists but prior knowledge can be used to fill the gaps.
87
Online Matrix Completion with Side InformationMark Herbster, Stephen Pasteris, Lisa Tsehttps://papers.nips.cc/paper/2020/file/eb06b9db06012a7a4179b8f3cb5384d3-Paper.pdfIn general this work does not present any foreseeable specific societal consequence in the authors’
joint opinion.

This is foundational research in regret-bounded online learning. As such it is not targeted towards
any particular application area. Although this research may have societal impact for good or for ill in
the future, we cannot foresee the shape and the extent.
88
Certifiably Adversarially Robust Detection of Out-of-Distribution DataJulian Bitterwolf, Alexander Meinke, Matthias Heinhttps://papers.nips.cc/paper/2020/file/b90c46963248e6d7aab1e0f429743ca0-Paper.pdfIn order to use machine learning in safety-critical systems it is required that the machine learning
system correctly flags its uncertainty. As neural networks have been shown to be overconfident far
away from the training data, this work aims at overcoming this issue by not only enforcing low
confidence on out-distribution images but even guaranteeing low confidence in a neighborhood around
it. As a neural network should not flag that it knows when it does not know, this paper contributes to
a safer use of deep learning classifiers.

89
Neural Networks Fail to Learn Periodic Functions and How to Fix ItLiu Ziyin, Tilman Hartwig, Masahito Uedahttps://papers.nips.cc/paper/2020/file/1160453108d3e537255e9f7b931f4e90-Paper.pdfIn the field of deep learning, we hope that this work will attract more attention to the study of how
neural networks extrapolate, since how a neural network extrapolates beyond the region it observes
data determines how a network generalizes. In terms of applications, this work may have broad
practical importance because many processes in nature and in society are periodic in nature. Being
able to model periodic functions can have important impact on many fields, including but not limited
to physics, economics, biology, and medicine.
90
BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled ImagesThu H. Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, Niloy Mitrahttps://papers.nips.cc/paper/2020/file/4b29fa4efe4fb7bc667c7b301b74d52d-Paper.pdfBlockGAN is an image generative model that learns an object-oriented 3D scene representation directly
from unlabelled 2D images. Our approach is a new machine learning technique that makes it possible
to generate unseen images from a noise vector, with unprecedented control over the identity and pose
of multiple independent objects as well as the background. In the long term, our approach could enable
powerful tools for digital artists that facilitate artistic control over realistic procedurally generated
digital content. However, any tool can in principle be abused, for example by adding new, manipulating
or removing existing objects or people from images.

At training time, our network performs a task somewhat akin to scene understanding, as our approach
learns to disentangle between multiple objects and individual object properties (specifically their pose
and identity). At test time, our approach enables sampling new images with control over pose and
identity for each object in the scene, but does not directly take any image input. However, it is possible
to embed images into the latent space of generative models [1]. A highly realistic generative image
model and a good image fit would then make it possible to approximate the input image and, more
importantly, to edit the individual objects in a pictured scene. Similar to existing image editing software,
this enables the creation of image manipulations that could be used for ill-intended misinformation
(fake news), but also for a wide range of creative and other positive applications. We expect the benefits
of positive applications to clearly outweigh the potential downsides of malicious applications.
91
High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian OptimizationQing Feng, Ben Letham, Hongzi Mao, Eytan Bakshyhttps://papers.nips.cc/paper/2020/file/faff959d885ec0ecf70741a846c34d1d-Paper.pdfThe methods introduced in this paper expand the scope of problems to which contextual Bayesian
optimization can be applied, and are especially important for settings where policies are evaluated
with A/B tests. We expect this work to be directly beneficial in this setting, for instance for improving
services at Internet companies as in the ABR example that we described in the paper. We are including
our complete code for all of the models introduced in this paper, so the work will be immediately
useful. As shown in the paper, contextualization improves not only the top-line performance of
policies, but also improves the fairness of policies by improving outcomes specifically for small
populations that do not achieve good performance under an existing non-contextual policy. This work
will directly benefit these currently under-served populations.
92
RNNPool: Efficient Non-linear Pooling for RAM Constrained InferenceOindrila Saha, Aditya Kusupati, Harsha Vardhan Simhadri, Manik Varma, Prateek Jainhttps://papers.nips.cc/paper/2020/file/ebd9629fc3ae5e9f6611e2ee05a31cef-Paper.pdf
Pros: ML models are compute-intensive and are typically served on power-intensive cloud hardware
with a large resource footprint that adds to the global energy footprint. Our models can help reduce
this footprint by (a) allowing low power edge sensors with small memory to analyze images and
admit only interesting images for cloud inference, and (b) reducing the inference complexity of the
cloud models themselves. Further, edge-first inference enabled by our work can reduce reliance on
networks and also help provide privacy guarantees to end users. Furthermore, vision models on tiny
edge devices enable accessible technologies, e.g., Seeing AI [33] for people with visual impairment.

Cons: While our intentions are to enable socially valuable use cases, this technology can enable
cheap, low-latency and low-power tracking systems that could enable intrusive surveillance by
malicious actors. Similarly, abuse of technology in certain wearables is also possible.

Again, we emphasize that it depends on the user to see the adaptation to either of these scenarios.

93
Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of DistributionsYi Hao, Alon Orlitskyhttps://papers.nips.cc/paper/2020/file/4dbf29d90d5780cab50897fb955e4373-Paper.pdf
Classical information theory states that an i.i.d. sample contains $H(X^n \sim p) = nH(p)$ information,
which provides little insight for statistical applications. We present a different view by decomposing
the sample information into three parts: the labeling of the profile elements, ordering of them, and
profile entropy. With no bias towards any symbols, the profile entropy rises as a fundamental measure
unifying the concepts of estimation, inference, and compression. We believe this view could help
researchers in information theory, statistical learning theory, and computer science communities
better understand the information composition of i.i.d. samples over discrete domains.

The results established in this work are general and fundamental, and have numerous applications in
privacy, economics, data storage, supervised learning, etc. A potential downside is that the theoretical
guarantees of the associated algorithms rely on the correctness of the assumptions, e.g., the domain should
be discrete and the sampling process should be i.i.d. In other words, it will be better if users can
confirm these assumptions by prior knowledge, experiences, or statistical testing procedures. Taking
a different perspective, we think a potential research direction following this work is to extend these
results to Markovian models, making them more robust to model misspecification.

94
Directional Pruning of Deep Neural NetworksShih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Chenghttps://papers.nips.cc/paper/2020/file/a09e75c5c86a7bf6582d2b4d75aad615-Paper.pdf
Our paper belongs to the cluster of works focusing on efficient and resource-aware deep learning.
There are numerous positive impacts of these works, including the reduction of memory footprint
and computational time, so that deep neural networks can be deployed on devices equipped with less
capable computing units, e.g. the microcontroller units. In addition, we help facilitate on-device deep
learning, which could replace traditional cloud computation and foster the protection of privacy.

Popularization of deep learning, which our research helps facilitate, may result in some negative
societal consequences. For example, unemployment may increase due to the increased automation
enabled by deep learning.

95
Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test ExamplesShafi Goldwasser, Adam Tauman Kalai, Yael Kalai, Omar Montasserhttps://papers.nips.cc/paper/2020/file/b6c8cf4c587f2ead0c08955ee6e2502b-Paper.pdf
In adversarial learning, this work can benefit users when adversarial examples are correctly identified.
It can harm users by misidentifying such examples, and the misidentifications of examples as
suspicious could have negative consequences just like misclassifications. This work ideally could
benefit groups who are underrepresented in training data, by abstaining rather than performing
harmful incorrect classification. However, it could also harm such groups: (a) by providing system
designers an alternative to collecting fully representative data if possible; (b) by harmfully abstaining
at different rates for different groups; (c) when those labels would have otherwise been correct but are
instead being withheld; and (d) by identifying them when they would prefer to remain anonymous.

Our experiments on handwriting recognition have few ethical concerns but also have less ecological
validity than real-world experiments on classifying explicit images or medical scans.

A note of caution.
Inequities may be caused by using training data that differs from the test
distribution on which the classifier is used. For instance, in classifying a person’s gender from a
facial image, Buolamwini and Gebru [2018] have demonstrated that commercial classifiers are highly
inaccurate on dark-skinned faces, likely because they were trained on light-skinned faces. In such
cases, it is preferable to collect a more diverse training sample even if it comes at greater expense, or
in some cases to abstain from using machine learning altogether. In such cases, PQ learning should
not be used, as an unbalanced distribution of rejections can also be harmful.

96
Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose trackingAnqi Wu, E. Kelly Buchanan, Matthew Whiteway, Michael Schartner, Guido Meijer, Jean-Paul Noel, Erica Rodriguez, Claire Everett, Amy Norovich, Evan Schaffer, Neeli Mishra, C. Daniel Salzman, Dora Angelaki, Andrés Bendesky, The International Brain Laboratory, John P. Cunningham, Liam Paninskihttps://papers.nips.cc/paper/2020/file/4379cf00e1a95a97a33dac10ce454ca4-Paper.pdf
We propose a new method for animal behavioral tracking. As highlighted in the introduction and
in [10], recent years have seen a rapid increase in the development of methods for animal pose
estimation, which need to operate in a different regime than methods developed for human pose
estimation. Our work significantly improves the state of the art for animal pose estimation, and thus
advances behavioral analysis for animal research, an essential task for scientific discovery in fields
ranging from neuroscience to ecology. Finally, our work represents a compelling fusion of deep
learning methods with probabilistic graphical model approaches to statistical inference, and we hope
to see more fruitful interactions between these rich topic areas in the future.

97
Adapting to Misspecification in Contextual BanditsDylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmerthttps://papers.nips.cc/paper/2020/file/84c230a5b1bc3495046ef916957c7238-Paper.pdfThis paper concerns contextual bandit algorithms that adapt to unknown model misspecification.
Because of their efficiency and ability to adapt to the amount of misspecification contained with no
prior knowledge, our algorithms are robust, and may be suitable for large-scale practical deployment.
On the other hand, our work is at the level of foundational research, and hence its impact on society
is shaped by the applications that stem from it. We will focus our brief discussion on the applications
mentioned in the introduction.

Health services [43] offer an opportunity for potential positive impact. Contextual bandits can be
used to propose medical interventions that lead to better health outcomes. However, care must be
taken to ethically implement the explore-exploit tradeoff in this sensitive setting, and more research
is required. Online advertisements [4, 35] and recommendation systems [8] are another well-known
application. While improved, robust algorithms can lead to increased profits here, it is important to
recognize that this may positively impact society as a whole.

Lastly, we mention that predictive algorithms like contextual bandits become more and more powerful
as more information is gathered about users. This provides a clear incentive toward collecting as much
information as possible. We believe that the net benefit of research on contextual bandit outweighs
the harm, but we welcome regulatory efforts to produce a legal framework that steers the usage of
machine learning algorithms, including in contextual bandits, in a direction which respects the
privacy rights of users.
98
Autofocused oracles for model-based designClara Fannjiang, Jennifer Listgartenhttps://papers.nips.cc/paper/2020/file/972cda1e62b72640cb7ac702714a115f-Paper.pdfIf adopted more broadly, our work could affect how novel proteins, small molecules, materials, and
other entities are engineered. Because predictive models are imperfect, even with the advances pre-
sented herein, care should be taken by practitioners to verify that any proposed design candidates are
indeed safe and ethical for the intended downstream applications. The machine learning approach we
present facilitates obtaining promising design candidates in a cost-effective manner, but practitioners
must follow up on candidates proposed by our approach with conventional laboratory methods, as
appropriate to the application domain.

99
Universal Domain Adaptation through Self SupervisionKuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenkohttps://papers.nips.cc/paper/2020/file/bb7946e7d85c81a9e69fee1cea4a087c-Paper.pdf
Our work is applicable to training deep neural networks with less supervision via knowledge transfer
from auxiliary datasets. Modern deep networks outperform humans on many datasets given a lot of
annotated data, such as in ImageNet. Our proposed method can help reduce the burden of collecting
large-scale supervised data in many applications where large related datasets are available. The
positive impact of our work is to reduce the data gathering effort for data-expensive applications.
This can make the technology more accessible for institutions and individuals that do not have rich
resources. It can also help applications where data is protected by privacy laws and is therefore
difficult to gather, or in sim2real applications where simulated data is easy to create but real data
is difficult to collect. The negative impacts could be to make these systems more accessible to
companies, governments or individuals that attempt to use them for criminal activities such as fraud.
Furthermore, as with all current deep learning systems, ours is susceptible to adversarial attacks and
lack of interpretability. Finally, while we show improved performance relative to state-of-the-art,
negative transfer could still occur, therefore our approach should not be used in mission-critical
applications or to make important decisions without human oversight.
100
Second Order Optimality in Decentralized Non-Convex Optimization via Perturbed Gradient TrackingIsidoros Tziotis, Constantine Caramanis, Aryan Mokhtarihttps://papers.nips.cc/paper/2020/file/f1ea154c843f7cf3677db7ce922a2d17-Paper.pdfOver the last couple of years we have witnessed an unprecedented increase in the amount of data
collected and processed in order to tackle real life problems. Advances in numerous data-driven
systems such as the Internet of Things, health-care, and multi-agent robotics, wherein data are scattered
across the agents (e.g., sensors, clouds, robots), and the sheer volume and spatial/temporal disparity
of data render centralized processing and storage infeasible or inefficient. Compared to the typical
parameter-server type distributed system with a fusion center, decentralized optimization has its
unique advantages in preserving data privacy, enhancing network robustness, and improving the
computation efficiency. Furthermore, in many emerging applications such as collaborative filtering,
federated learning, distributed beamforming and dictionary learning, the data is naturally collected
in a decentralized setting, and it is not possible to transfer the distributed data to a central location.
Therefore, decentralized computation has sparked considerable interest in both academia and industry.
At the same time, convex formulations for training machine learning tasks have been replaced by
nonconvex representations such as neural networks, and a line of significant nonconvex problems
is in the spotlight. Our paper contributes to this line of work and broadens the set of problems
that can be successfully solved without the presence of a central coordinating authority in the
aforementioned framework. The implications for the privacy of the agents are apparent, while rendering
the presence of an authority unnecessary has political and economic extensions. Furthermore,
numerous applications are going to benefit from our result, impacting society in many different ways.