NeurIPS Broader Impact Statement Dataset (Public)

	A	B	C	D	E
1	Title	Author(s)	url	Broader Impact Statement

2	Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes	Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi	https://papers.nips.cc/paper/2020/file/1f10c3650a3aa5912dccc5789fd515e8-Paper.pdf	The current paper presents theoretical work without any foreseeable societal consequence. Therefore, the authors believe that the broader impact discussion is not applicable.	An analysis of this dataset is in our paper Unpacking the Expressed Consequences of AI Research in Broader Impact Statements at AIES 2021. These 300 statements are a random sample from NeurIPS 2020 broader impact statements.
3	Practical Low-Rank Communication Compression in Decentralized Deep Learning	Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi	https://papers.nips.cc/paper/2020/file/a376802c0811f1b9088828288eb0d3f0-Paper.pdf	We believe that the field of decentralized learning plays a key role in translating the recent successes in deep learning from large organizations with large centralized datasets to smaller industry players and individuals. In particular, decentralized and therefore collaborative training on decentralized data is an important building block towards helping to better align each individual’s data ownership and privacy with the resulting utility from jointly trained machine learning models. The ability to train collaboratively on decentralized data may lead to transformative insights in many fields, especially in applications where data is user-provided and privacy sensitive (Nedic, 2020). In addition to privacy, efficiency gains in distributed training reduce the environmental impact of training large machine learning models. The introduction of a practical and reliable communication compression technique is a small step towards achieving these goals on collaborative privacy-preserving and efficient decentralized learning.
4	Prediction with Corrupted Expert Advice	Idan Amir, Idan Attias, Tomer Koren, Yishay Mansour, Roi Livni	https://papers.nips.cc/paper/2020/file/a512294422de868f8474d22344636f16-Paper.pdf	There are no foreseen ethical or societal consequences for the research presented herein.
5	Organizing recurrent network dynamics by task-computation to enable continual learning	Lea Duncker, Laura Driscoll, Krishna V. Shenoy, Maneesh Sahani, David Sussillo	https://papers.nips.cc/paper/2020/file/a576eafbce762079f7d1f77fca1c5cc2-Paper.pdf	This work proposes a novel continual learning algorithm which will contribute to the advance of related methods. Continual learning of dynamic tasks has not been well-explored in machine learning so far, but will likely be important for fields such as robotics and developing artificial intelligent agents more generally. Furthermore, we utilize the framework of recurrent networks to test and refine hypotheses about computation in biological systems. Advances in this area will contribute to the design of new experiments and aid the analyses of recorded data in the field of neuroscience.
6	A Catalyst Framework for Minimax Optimization	Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He	https://papers.nips.cc/paper/2020/file/3db54f5573cd617a0112d35dd1e6b1ef-Paper.pdf	Our work provides a family of simple and efficient algorithms for some classes of minimax optimization. We believe our theoretical results advance many applications in ML which requires minimax optimization. Of particular interests are deep learning and fair machine learning. Deep learning is used in many safety-critical environments, including self-driving car, biometric authentication, and so on. There is growing evidence that shows deep neural networks are vulnerable to adversarial attacks. Since adversarial attacks and defenses are often considered as two-player games, progress in minimax optimization will definitely empower both. Furthermore, minimax optimization problems provide insights and understanding into the balance and equilibrium between attacks and defenses. As a consequence, making good use of those techniques will boost the robustness of deep learning models and strengthen the security of its applications. Fairness in machine learning has attracted much attention, because it is directly relevant to policy design and social welfare. For example, courts use COMPAS for recidivism prediction. Researchers have shown that bias is introduced into many machine learning systems through skewed data, limited features, etc. One approach to mitigate this is adding constraints into the system, which naturally gives rise to minimax problems.
7	Optimal Learning from Verified Training Data	Nicholas Bishop, Long Tran-Thanh, Enrico Gerding	https://papers.nips.cc/paper/2020/file/6c1e55ec7c43dc51a37472ddcbd756fb-Paper.pdf	The manipulation and fairness of algorithms form a significant barrier to practical application of theoretically effective machine learning algorithms in many real world use cases. With this work, we have attempted to address the important problem of data manipulation, which has many societal consequences. Data manipulation is one of many ways in which an individual can “game the system” in order to secure beneficial outcomes for themselves to the detriment of others. Thus, reducing the potential benefits of data manipulation is of worthwhile consideration and focus. Whilst this paper is primarily of theoretical focus, we hope that our work will form a contributing step towards safe, fair, and effective application of machine learning algorithms in more practical settings.
8	Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals	Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie	https://papers.nips.cc/paper/2020/file/27059a11c58ade9b03bde05c2ca7c285-Paper.pdf	Benefits Our conditional chain model addresses the problems where one input sequence is mapped to multiple sequences by taking advantage of the intrinsic interaction between the output sequences. There are a variety of applications that can benefit from the use of the conditional information, such as the text generation tasks. Another important application is the cocktail party problem in speech processing. With the parallel mapping models, which are the dominant method at present, the model cannot handle the variable number of speakers flexibly due to the limitation of the model structure. In such models, the solution to label permutation problems is to exhaustively compute all the permutations with the computation cost of N!, which cannot be neglected when the number of speakers are more than 3. However, using the conditional model can avoid this problem. It also proves the effectiveness of our model which achieves relatively good performance in both separation and recognition tasks. We make a further step towards attacking cocktail party problem. This will improve the communication quality of human-computer interaction. And our method can also be applied in meeting transcription system to provide better performance. We would like to make our code available latter to facilitate the study applied to other tasks. Drawbacks There is no doubt that the improvement of artificial intelligence can potentially revolutionise our societies in many ways. However, it also bring some risks to human’s privacy. With the abusing use of speech separation and recognition techniques, hackers can easily monitor people’s daily life, while a strong NLP system can also be applied to Internet fraud. We think the community should not only focus the development of techniques, but also concern the privacy issue. Besides, the widely use of artificial intelligence techniques may also lead to mass-scale unemployment problems, such as call center.
9	Learning Representations from Audio-Visual Spatial Alignment	Pedro Morgado, Yi Li, Nuno Vasconcelos	https://papers.nips.cc/paper/2020/file/328e5d4c166bb340b314d457a208dc83-Paper.pdf	Self-supervision reduces the need for human labeling, which is in some sense less affected by human biases. However, deep learning systems are trained from data. Thus, even self-supervised models reflect the biases in the collection process. To mitigate collection biases, we searched for 360° videos using queries translated into multiple languages. Despite these efforts, the adoption of 360° video cameras is likely not equal across different sectors of society, and thus learned representations may still reflect such discrepancies.
10	Towards More Practical Adversarial Attacks on Graph Neural Networks	Jiaqi Ma, Shuangrui Ding, Qiaozhu Mei	https://papers.nips.cc/paper/2020/file/32bb90e8976aab5298d5da10fe66f21d-Paper.pdf	For the potential positive impacts, we anticipate that the work may raise the public attention about the security and accountability issues of graph-based machine learning techniques, especially when they are applied to real-world social networks. Even without accessing any information about the model training, the graph structure alone can be exploited to damage a deep learning framework with a rather executable strategy. On the potential negative side, as our work demonstrates that there is a chance to attack existing GNN models effectively without any knowledge but a simple graph structure, this may expose a serious alert to technology companies who maintain the platforms and operate various applications based on the graphs. However, we believe making this security concern transparent can help practitioners detect potential attack in this form and better defend the machine learning driven applications.
11	Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits	Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida	https://papers.nips.cc/paper/2020/file/15bb63b28926cd083b15e3b97567bbea-Paper.pdf	This is a theoretical work and does not present any foreseeable societal consequences.
12	One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL	Saurabh Kumar, Aviral Kumar, Sergey Levine, Chelsea Finn	https://papers.nips.cc/paper/2020/file/5d151d1059a6281335a10732fc49620e-Paper.pdf	Applications and Benefits Our diversity-driven learning approach for improved robustness can be beneficial for bringing RL to real-world applications, such as robotics. It is critical that various types of robots, including service robotics, home robots, and robots used for disaster relief or search-and-rescue are able to handle varying environment conditions. Otherwise, they may fail to complete the tasks they are supposed to accomplish, which could have significant consequences in safety-critical situations. It is conceivable that, during deployment of robotics systems, the system may encounter changes in its environment that it has not previously dealt with. For example, a robot may be tasked with picking up a set of objects. At test time, the environment may slightly differ from the training setting, e.g. some objects may be missing or additional objects may be present. These previously unseen configurations may confuse the agent’s policy and lead to unpredictable and sub-optimal behavior. If RL algorithms are to be used to prescribe actions from input observations in a robotics application, the algorithms must be robust to these perturbations. Our approach of learning multiple diverse solutions to the task is a step towards achieving the desired robustness. Risks and Ethical Issues RL algorithms, in general, face a number of risks. First, they tend to suffer from reward specification - in particular, the reward may not necessarily be completely aligned with the desired behavior. Therefore, it can be difficult for a practitioner to predict the behavior of an algorithm when it is deployed. Since our algorithm learns multiple ways to optimize a task reward, the robustness and predictability of its behavior is also limited by the alignment of the reward function with the qualitative task objective. Additionally, even if the reward is well-specified, RL algorithms face a number of other risks, including (but not limited to) safety and stability. Our diversity-driven learning paradigm suffers from the same issues, as different latent-conditioned policies may not produce reliable behavior when executed in real world settings if the underlying RL algorithm is unstable.
13	Greedy inference with structure-exploiting lazy maps	Michael Brennan, Daniele Bigoni, Olivier Zahm, Alessio Spantini, Youssef Marzouk	https://papers.nips.cc/paper/2020/file/5ef20b89bab8fed38253e98a12f26316-Paper.pdf	Who may benefit from this research? We believe users and developers of approximate inference methods will benefit from our work. Our framework works as an “outer wrapper” that can improve the effectiveness of any flow-based variational inference method by guiding its structure. We hope to make expressive flow-based variational inference more tractable, efficient, and broadly applicable, particularly in high dimensions, by developing automated tests for low-dimensional structure and flexible ways to exploit it. The trace diagnostic developed in our work rigorously assesses the quality of transport/flow-based inference, and may be of independent interest. Who may be put at disadvantage from this research? We don’t believe anyone is put at disadvantage due to this research. What are the consequences of failure of the system? We specifically point out that one contribution of this work is identifying when a poor posterior approximation has occurred. A potential failure mode of our framework would be inaccurate estimation of the diagnostic matrix H or its spectrum, suggesting that the approximate posterior is more accurate than it truly is. However, computing the eigenvalues or trace of a symmetric matrix, even one estimated from samples, is a well studied problem. And numerical software guards against poor eigenvalue estimation or at least warns if this occurs. We believe the theoretical underpinnings of this work make it robust to undetected failure. Does the task/method leverage biases in the data? We don’t believe our method leverages data bias. As a method for variational inference, our goal is to accurately approximate a posterior distribution. It is very possible to encode biases for/against a particular result in a Bayesian inference problem, but that occurs at the level of modeling (choosing the prior, defining the likelihood) and collecting data, not at the level of approximating the posterior.
14	Second Order PAC-Bayesian Bounds for the Weighted Majority Vote	Andres Masegosa, Stephan Lorenzen, Christian Igel, Yevgeny Seldin	https://papers.nips.cc/paper/2020/file/386854131f58a556343e056f03626e00-Paper.pdf	Ensemble classifiers, in particular random forests, are among the most important tools in machine learning [Fernández-Delgado et al., 2014, Zhu, 2015], which are very frequently applied in practice [e.g., Chen and Guestrin, 2016, Hoch, 2015, Puurula et al., 2014, Stallkamp et al., 2012]. Our study provides generalization guarantees for random forests and a method for tuning the weights of individual trees within a forest, which can lead to even higher accuracies. The result is of high practical relevance. Given that machine learning models are increasingly used to make decisions that have a strong impact on society, industry, and individuals, it is important that we have a good theoretical understanding of the employed methods and are able to provide rigorous guarantees for their performance. And here lies the strongest contribution of the line of research followed in our study, in which we derive rigorous bounds on the generalization error of random forests and other ensemble methods for multiclass classification.
15	3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data	Benjamin Biggs, David Novotny, Sebastien Ehrhardt, Hanbyul Joo, Ben Graham, Andrea Vedaldi	https://papers.nips.cc/paper/2020/file/ebf99bb5df6533b6dd9180a59034698d-Paper.pdf	Our method improves the ability of machines to understand human body poses in images and videos. Understanding people automatically may arguably be misused by bad actors. However, importantly, our method is not a form of biometric as it does not allow the identification of people. Rather, only their overall body shape and pose is reconstructed, but these details are insufficient for unique identification. In particular, individual facial features are not reconstructed at all. Furthermore, our method is an improvement of existing capabilities, but does not introduce a radical new capability in machine learning. Thus our contribution is unlikely to facilitate misuse of technology which is already available to anyone. Finally, any potential negative use of a technology should be balanced against positive uses. Understanding body poses has many legitimate applications in VR and AR, medical, assistance to the elderly, assistance to the visual impaired, autonomous driving, human-machine interactions, image and video categorization, platform integrity, etc.
16	Prophet Attention: Predicting Attention with Future Attention	Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun	https://papers.nips.cc/paper/2020/file/13fe9d84310e77f13a6d184dbf1232f3-Paper.pdf	Our work aims to improve both the captioning and grounding performance of image captioning systems, promoting the real-word application of image captioning, such as visual retrieval, human-robot interaction and visually impaired people assistance. Furthermore, we can also improve the model interpretability and transparency. However, the training of our framework relies on large volume of image-caption pairs, which are not easily obtained in the real-world. Therefore, it requires specific and appropriate treatment by experienced practitioners.
17	Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement	Miao Zhang, Huiqi Li, Shirui Pan, Xiaojun Chang, Zongyuan Ge, Steven Su	https://papers.nips.cc/paper/2020/file/9a96a2c73c0d477ff2a6da3bf538f4f4-Paper.pdf	Automatic Machine Learning (AutoML) aims to build a better machine learning model in a data-driven and automated manner, compensating for the lack of machine learning experts and lowering the threshold of various areas of machine learning to help all the amateurs to use machine learning without any hassle. These days, many companies, like Google and Facebook, are using AutoML to build machine learning models for handling different businesses automatically. They especially leverage the AutoML to automatically build Deep Neural Networks for solving various tasks, including computer vision, natural language processing, autonomous driving, and so on. AutoML is an up-and-coming tool to take advantage of the extracted data to find the solutions automatically. This paper focuses on the Neural Architecture Search (NAS) of AutoML, and it is the first attempt to enhance the intelligent exploration of differentiable One-Shot NAS in the latent space. The experimental results demonstrate the importance of introducing uncertainty into neural architecture search, and point out a promising research direction in the NAS community. It is worth notice that NAS is in its infancy, and it is still very challenging to use it to complete automation of a specific business function like marketing analytics, customer behavior, or other customer analytics.
18	Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality	Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora	https://papers.nips.cc/paper/2020/file/0740bb92e583cd2b88ec7c59f985cb41-Paper.pdf	This does not present any foreseeable societal consequence.
19	Learning Dynamic Belief Graphs to Generalize on Text-Based Games	Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, Will Hamilton	https://papers.nips.cc/paper/2020/file/1fc30b9d4319760b04fab735fbfed9a9-Paper.pdf	Our work’s immediate aim—improved performance on text-based games—might have limited consequences for society; however, taking a broader view of our work and where we’d like to take it forces us to consider several social and ethical concerns. We use text-based games as a proxy to model and study the interaction of machines with the human world, through language. Any system that interacts with the human world impacts it. As mentioned previously, an example of language-mediated, human-machine interaction is online customer service systems. • In these systems, especially in products related to critical needs like healthcare, providing inaccurate information could result in serious harm to users. Likewise, failing to communicate clearly, sensibly, or convincingly might also cause harm. It could waste users’ precious time and diminish their trust. • The responses generated by such systems must be inclusive and free of bias. They must not cause harm by the act of communication itself, nor by making decisions that disenfranchise certain user groups. Unfortunately, many data-driven, free-form language generation systems currently exhibit bias and/or produce problematic outputs. • Users’ privacy is also a concern in this setting. Mechanisms must be put in place to protect it. Agents that interact with humans almost invariably train on human data; their function requires that they solicit, store, and act upon sensitive user information (especially in the healthcare scenario envisioned above). Therefore, privacy protections must be implemented throughout the agent development cycle, including data collection, training, and deployment. • Tasks that require human interaction through language are currently performed by people. As a result, advances in language-based agents may eventually displace or disrupt human jobs. This is a clear negative impact. Even more broadly, any systems that generate convincing natural language could be used to spread misinformation. Our work is immediately aimed at improving the performance of RL agents in text-based games, in which agents must understand and act in the world through language. Our hope is that this work, by introducing graph-structured representations, endows language-based agents with greater accuracy and clarity, and the ability to make better decisions. Similarly, we expect that graph-structured representations could be used to constrain agent decisions and outputs, for improved safety. Finally, we believe that structured representations can improve neural agents’ interpretability to researchers and users. This is an important future direction that can contribute to accountability and transparency in AI. As we have outlined, however, this and future work must be undertaken with awareness of its hazards.
20	Demixed shared component analysis of neural population data from multiple brain areas	Yu Takagi, Steven Kennerley, Jun-ichiro Hirayama, Laurence Hunt	https://papers.nips.cc/paper/2020/file/44ece762ae7e41e3a0b1301488907eaa-Paper.pdf	Although several studies have investigated communication between populations of neurons, task-related communication has been ignored. This is of fundamental importance in neuroscience, and we show that it can be achieved simply by extending the previous method. We believe our methods will be beneficial to the neuroscientists who will investigate interaction among multiple brain areas in terms of specific task parameter of interest.
21	Neural FFTs for Universal Texture Image Synthesis	Morteza Mardani, Guilin Liu, Aysegul Dundar, Shiqiu Liu, Andrew Tao, Bryan Catanzaro	https://papers.nips.cc/paper/2020/file/a23156abfd4a114c35b930b836064e8b-Paper.pdf	Our AI research offers a powerful tool to synthesize a diverse range of textures with high fidelity and in a real-time manner. Our unique perspective of combining FFT, from signal processing tools, with deep learning for hallucinating images can be a great asset for other generation and style transfer tasks in graphics and vision. From the application standpoint, several applications in graphics and vision directly benefit from our tools to replace their tedious and manual synthesis platforms. In particular, it helps rapidly create natural scenes for computer game developers, interior designers, and artists. In addition, our AI-based tool can discover the generation process behind the real-world scenes, which can help the professionals to better prototype ideas and create new textures. In order to increase the positive impacts and reduce the downsides, we encourage further work to bring the users in the AI loop for additional guidance. This can allow artists to freely incorporate their creativity into the synthesis pipeline. We also recommend the researchers and industries to investigate methods for further squeezing the CNN architecture, and efficiently implement them on the processing hardware. This would help not only make our tools faster for edge computing applications, but also reduce the high computational power consumed for training neural networks, that positively impacts the environment.
22	Improving robustness against common corruptions by covariate shift adaptation	Steffen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, Matthias Bethge	https://papers.nips.cc/paper/2020/file/85690f81aadc1749175c187784afc9ee-Paper.pdf	The primary goal of this paper is to increase the robustness of machine vision models against common corruptions and to spur further progress in this area. Increasing the robustness of machine vision systems can enhance their reliability and safety, which can potentially contribute to a large range of use cases including autonomous driving, manufacturing automation, surveillance systems, health care and others. Each of these uses may have a broad range of societal implications: autonomous driving can increase mobility of the elderly and enhance safety, but could also enable more autonomous weapon systems. Manufacturing automation can increase resource efficiency and reduce costs for goods, but may also increase societal tension through job losses or increase consumption and thus waste. Of particular concern (besides surveillance) is the use of generative vision models for spreading misinformation or for creating an information environment of uncertainty and mistrust. We encourage further work to understand the limitations of machine vision models in out-of-distribution generalization settings. More robust models carry the potential risk of automation bias, i.e., an undue trust in vision models. However, even if models are robust to common corruptions, they might still quickly fail on slightly different perturbations like surface reflections. Understanding under what conditions model decisions can be deemed reliable or not is still an open research question that deserves further attention.
23	The Smoothed Possibility of Social Choice	Lirong Xia	https://papers.nips.cc/paper/2020/file/7e05d6f828574fbc975a896b25bb011e-Paper.pdf	In this paper we aim to provide smoothed possibilities of social choice, which is an important problem in the society. Therefore, success of the research will benefit general public beyond the CS research community because better solutions are now available for a wide range of group decision-making scenarios.
24	End-to-End Learning and Intervention in Games	Jiayang Li, Jing Yu, Yu Nie, Zhaoran Wang	https://papers.nips.cc/paper/2020/file/c21f4ce780c5c9d774f79841b81fdc6d-Paper.pdf	Our work helps understand and resolve social dilemmas resulting from pervasive conflict between self- and collective interest in human societies. The potential applications of the proposed modeling framework range from addressing externality in economic systems to guiding large-scale infrastructure investment. Planners, regulators, policy makers of various human systems could benefit from the decision making tools derived from this work.
25	Self-Supervised Graph Transformer on Large-Scale Molecular Data	Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying WEI, Wenbing Huang, Junzhou Huang	https://papers.nips.cc/paper/2020/file/94aef38441efa3380a3bed3faf1f9d5d-Paper.pdf	In this paper, we have developed a self-supervised pre-trained GNN model—GROVER to extract the useful implicit information from massive unlabelled molecules and the downstream tasks can largely benefit from this pre-trained GNN models. Below is the broader impact of our research: - For machine learning community: This work demonstrates the success of pre-training approach on Graph Neural Networks. It is expected that our research will open up a new venue on an in-depth exploration of pre-trained GNNs for broader potential applications, such as social networks and knowledge graphs. - For the drug discovery community: Researchers from drug discovery can benefit from GROVER from two aspects. First, GROVER has encoded rich structural information of molecules through the designing of self-supervision tasks. It can also produce feature vectors of atoms and molecule fingerprints, which can directly serve as inputs of downstream tasks. Second, GROVER is designed based on Graph Neural Networks and all the parameters are fully differentiable. So it is easy to fine-tune GROVER in conjunction with specific drug discovery tasks, in order to achieve better performance. We hope that GROVER can help with boosting the performance of various drug discovery applications, such as molecular property prediction and virtual screening.
26	Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding	Victor Veitch, Anisha Zaveri	https://papers.nips.cc/paper/2020/file/7d265aa7147bd3913fb84c7963a209d1-Paper.pdf	This paper addressed sensitivity analysis for causal inference. We have extended Imbens’ approach to allow the use of arbitrary machine-learning methods for the data modeling. Austen plots provide an entirely post-hoc and blackbox manner of conducting sensitivity analysis. In particular, they make it substantially simpler to perform sensitivity analysis. This is because the initial analysis can be performed without have a sensitivty analysis already in mind, and because producing the sensitivity plots only requires predictions from models that the practitioner has fit anyways. The ideal positive consequence is that routine use of Austen plots will improve the credibility of machine-learning based causal inferences from observational data. Austen plots allow us to both use state-of-the-art models for the observed part of the data, and to reason coherently about the causal effects of potential unobserved confounders. The availability of such a tool may speed the adoption of machine-learning based causal inference for important real-world applications (where, so far, adoption has been slow). On the negative side, an accelerated adoption of machine-learning methods into causal practice may be undesirable. This is simply because the standards of evidence and evaluation used in common machine-learning practice do not fully reflect the needs of causal practice. Austen plots partially bridge this gap, but they just one of the elements required to establish credibility
27	On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems	Kaiqing Zhang, Bin Hu, Tamer Basar	https://papers.nips.cc/paper/2020/file/fb2e203234df6dee15934e448ee88971-Paper.pdf	We believe that researchers of reinforcement learning (RL), especially those who are interested in the theoretical foundations of robust RL, would benefit from this work, through the new insights and angles we have provided regarding robust adversarial RL (RARL) in linear quadratic (LQ) setups, from a rigorous robust control perspective. In particular, considering the impact of RARL [2] in RL with prominent empirical performance, and the ubiquity and fundamentality of LQ setups in continuous control, our results help pave the way for applying the RARL idea in control tasks. More importantly, building upon the concepts from robust control, we have laid emphasis on the robust stability of RARL algorithms when applied to control systems, which has been overlooked in the RL literature, and is significant in continuous control, as a destabilized system can lead to catastrophic consequences. Such emphasis may encourage the development of more robust, and more importantly, safe on-the-fly, RARL algorithms, and push forward the development of RL for safety-critical systems as a whole. It also opens up the possibility to integrate more tools from the classic (robust) control theory, to improve the stability and robustness of popular RL algorithms practically used. We do not believe that our research will cause any ethical issue, or put anyone at any disadvantage.
28	Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time	Jerry Li, Guanghao Ye	https://papers.nips.cc/paper/2020/file/9529fbba677729d3206b3b9073d1e9ca-Paper.pdf	Moving forward, it is imperative that machine learning systems cannot be gamed by malicious entities. This work builds upon a growing literature of principled algorithms for robust statistics, which are methods for defending against data poisoning attacks, where a training set may be tampered with by an adversary who wishes to change the behavior of the algorithm. For instance, such defenses are important in where the training data is crowdsourced, such as in federated learning, where we cannot fully trust the training data. In such settings, if the defense fails, attackers can completely invalidate the output of the model. That is why we believe it is critical to develop principled defenses, with provable worst-case guarantees, as we do here. With such defenses, we know that this worst-case behavior cannot happen. The algorithms developed here are also useful for exploratory data analysis, as demonstrated in [DKK+17]. Most real-world high-dimensional datasets are inherently very noisy, and this noise can disguise interesting patterns from data analysts. These methods can be used in exploratory data analysis to remove this noise, and to recover these phenomena. We do not believe that this method leverages any biases in the data. Our generative model, as stated in the introduction, is very simple, and does not introduce any biases in this problem.
29	PLLay: Efficient Topological Layer based on Persistent Landscapes	Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Kim, Frederic Chazal, Larry Wasserman	https://papers.nips.cc/paper/2020/file/b803a9254688e259cde2ec0361c8abe4-Paper.pdf	This paper proposes a novel method of adapting tools in applied mathematics to enhance the learnability of deep learning models. Even though our methodology is generally applicable to any complex modern data, it is not tuned to a specific application that might improperly incur direct societal/ethical consequences. So the broader impact discussion is not needed for our work.
30	Robust Meta-learning for Mixed Linear Regression with Small Batches	Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh	https://papers.nips.cc/paper/2020/file/3214a6d842cc69597f9edf26df552e43-Paper.pdf	One of the main contribution of this paper is to protect meta-learning approaches against data poisoning attacks. Such robustness encourages participation from data contributors, as they can collaborate without necessarily trusting the other data contributors. This facilitates participation of minor contributors who suffer from data scarcity. This fosters democratization of machine learning by allowing minor contributors to enjoy the beneﬁt of big data through collaboration. Such ecosystem will also encourage data sharing, thus improving transparency. The adaptive guarantee we provide in Theorem 1 is fair, in the sense that a group that provides low noise data will receive a model with better accuracy. However, one potential risk in fairness is that meta-learning might result in varying accuracy across the groups. This can be problematic as an under-represented group in training data could suffer from inaccurate prediction for that population. This is an active area of research in the fairness community, but there is no strong experimental evidence that this can be mitigated with algorithmic innovations that do not involve collecting more data from the under-represented population. Another concern in meta-learning with data sharing is privacy. Without proper system to regulate the usage of shared data, sensitive information could be leaked or protected features could be inferred. One silver lining is that robust methods are naturally private, as the trained model is by deﬁnition not sensitive to any one particular data point. On the other hand, if the system relies on the participation of various individuals, then either a technological solution needs to be implemented with cryptographic or privacy preserving primitives, or a proper regulation must be enforced.
31	Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects	Zijun Gao, Yanjun Han	https://papers.nips.cc/paper/2020/file/f75b757d3459c3e93e98ddab7b903938-Paper.pdf	This work mainly provides theoretical tools and bounds for the HTE estimation in causal inference, as well as potentially useful practical insights such as the two-stage nearest-neighbors and throwing away observations with poor covariate matching quality. This special form of nonparametric estimation problems could be a useful addition to the literature in nonparametric statistics, and theorists and practitioners working on causal inference may potentially benefit from this work.
32	Provably Robust Metric Learning	Lu Wang, Xuanqing Liu, Jinfeng Yi, Yuan Jiang, Cho-Jui Hsieh	https://papers.nips.cc/paper/2020/file/e038453073d221a4f32d0bab94ca7cee-Paper.pdf	In this work, we study the problem of adversarial robustness of metric learning. Adversarial robustness, especially robustness verification, is very important when deploying machine learning models into real-world systems. A potential risk is the research on adversarial attack, while understanding adversarial attack is a necessary step towards developing provably robust models. In general, this work does not involve specific applications and ethical issues.
33	Preference learning along multiple criteria: A game-theoretic perspective	Kush Bhatia, Ashwin Pananjady, Peter Bartlett, Anca Dragan, Martin J. Wainwright	https://papers.nips.cc/paper/2020/file/52f4691a4de70b3c441bca6c546979d9-Paper.pdf	An important step towards deploying AI systems in the real world involves aligning their objectives with human values. Examples of such objectives include safety for autonomous vehicles, fairness for recommender systems, and effectiveness of assistive medical devices. Our paper takes a step towards accomplishing this goal by providing a framework to aggregate human preferences along such subjective criteria, which are often hard to encode mathematically. While our framework is quite expressive and allows for non-linear aggregation across criteria, it leaves the choice of the target set in the hands of the designer. As a possible negative consequence, getting this choice wrong could lead to incorrect inferences and unexpected behavior in the real world.
34	Minibatch Stochastic Approximate Proximal Point Methods	Hilal Asi, Karan Chadha, Gary Cheng, John C. Duchi	https://papers.nips.cc/paper/2020/file/fa2246fa0fdf0d3e270c86767b77ba1b-Paper.pdf	Data centers draw increasing amounts of the total energy we consume, and increasing applications of machine learning mean that model-fitting and parameter exploration require a larger and larger proportion of their energy expenditures [1, 14, 30]. Indeed, as Asi and Duchi [1] note, the energy to train and tune some models is roughly on the scale of driving thousands of cars from San Francisco to Los Angeles, while training a modern transformer network (with architecture search) generates roughly six times the total CO2 of an average car’s lifetime [30]. It is thus centrally important to build more efficient and robust methods, which allow us to avoid wasteful hyperparameter search but simply work. A major challenge in building better algorithms is that fundamental physical limits have forced CPU speed and energy to essentially plateau; only by parallelization can we harness both increasing speed and reduce the energy to fit models [14]. In this context, our methods take a step toward reducing the energy and overhead to perform machine learning. Taking a step farther back, we believe optimization and model-fitting research in machine learning should refocus its attention: rather than developing algorithms that, with appropriate hyperparameter tuning, achieve state-of-the-art accuracy for a given dataset, we should evaluate algorithms by whether they robustly work. This would allow a more careful consideration of an algorithms’ costs and benefits: is it worth 2× faster training, for appropriate hyperparameters, if one has to spend 25× as much time to find the appropriate algorithmic hyperparameters? Even more, as Strubell et al. [30] point out, the extraordinary costs of hyperparameter tuning for fitting large-scale models price many researchers out of making progress on certain frontiers; to the extent that we can mitigate these challenges, we will allow more equity in who can help machine learning progress.
35	Fine-Grained Dynamic Head for Object Detection	Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng	https://papers.nips.cc/paper/2020/file/7f6caf1f0ba788cd7953d817724c2b6e-Paper.pdf	Object detection is a fundamental task in the computer vision domain, which has already been applied to a wide range of practical applications. For instance, face recognition, robotics and autonomous driving heavily rely on object detection. Our method provides a new dimension for object detection by utilizing the ﬁne-grained dynamic routing mechanism to improve performance and maintain low computational cost. Compared with hand-crafted or searched methods, ours does not need much time for manual design or machine search. Besides, the design philosophy of our ﬁne-grained dynamic head could be further extended to many other computer vision tasks, e.g., segmentation and video analysis.
36	A Decentralized Parallel Algorithm for Training Generative Adversarial Nets	Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jarret Ross, Tianbao Yang, Payel Das	https://papers.nips.cc/paper/2020/file/7e0a0209b929d097bd3e8ef30567a5c1-Paper.pdf	In this paper, researchers introduce a decentralized parallel algorithm for training Generative Adversarial Nets (GANs). The proposed scheme can be proved to have a non-asymptotic convergence to first-order stationary points in theory, and outperforms centralized counterpart in practice. Our proposed decentralized algorithm is a class of foundational research, since the algorithm design and analysis are proposed for a general class of nonconvex-nonconcave min-max problems and not necessarily restricted for training GANs. Both the algorithm design and the proof techniques are novel, and it may inspire future research along this direction. Our decentralized algorithm has broader impacts in a variety of machine learning tasks beyond GAN training. For example, our algorithm is promising in other machine learning problems whose objective function has a min-max structure, such as adversarial training [92], robust machine learning [93], etc. Our decentralized algorithm can be applied in several real-world applications such as image-to-image generation [94], text-to-image generation [95], face aging [96], photo inpainting [97], dialogue systems [98], etc. In all these applications, GAN training is an indispensable backbone. Training GANs in these applications usually requires to leverage centralized large batch distributed training which could suffer from inefficiency in terms of run-time, and our algorithm is able to address this issue by drastically reducing the running time in the whole training process. These real-world applications have a broad societal implications. First, it can greatly help people’s daily life. For example, many companies provide online service, where an AI chatbot is usually utilized to answer customer’s questions. However, the existing chatbot may not be able to fully understand customer’s question and its response is usually not good enough. One can adopt our decentralized algorithms to efficiently train a generative adversarial network based on the human-to-human chatting history, and the learned model is expected to answer customer’s questions in a better manner. This system can help customers and significantly enhance users’ satisfaction. Second, it can help protect users’ privacy. One benefit of decentralized algorithms is that it does not need the central node to collect all users’ information and every node only communicates with its trusted neighbors. In this case, our proposed decentralized algorithms naturally preserve users’ privacy. We encourage researchers to further investigate the merits and shortcomings of our proposed approach. In particular, we recommend researchers to design new algorithms for training GANs with faster convergence guarantees.
37	Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis	Kavi Gupta, Peter Ebert Christensen, Xinyun Chen, Dawn Song	https://papers.nips.cc/paper/2020/file/cd0f74b5955dc87fd0605745c4b49ee8-Paper.pdf	Program synthesis has many potential real-world applications. One significant challenge of program synthesis is that the generated program needs to be precisely correct. SED mitigates this challenge by not requiring the solution to be generated in one shot, and instead allowing partial solutions to be corrected via an iterative improvement process, achieving an overall improvement in performance as a result. We thus believe SED-like frameworks could be applicable for a broad range of program synthesis tasks.
38	Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization	Chong You, Zhihui Zhu, Qing Qu, Yi Ma	https://papers.nips.cc/paper/2020/file/cd42c963390a9cd025d007dacfa99351-Paper.pdf	Robust learning of structured signals from high-dimensional data has a wide range of applications, including imaging processing, computer vision, recommender systems, generative models and many more. In this work, we presented a new type of practical methods and provided improved understandings of solving these problems via over-parameterized models. In particular, our method exploits the implicit bias introduced by the learning algorithm, with the underlying driving force being the intrinsic structure of the data itself rather than human handcrafting. Such a design methodology helps to eliminate human bias in the design process, hence provides the basis for developing truly fair machine learning systems.
39	Structured Prediction for Conditional Meta-Learning	Ruohan Wang, Yiannis Demiris, Carlo Ciliberto	https://papers.nips.cc/paper/2020/file/1b69ebedb522700034547abc5652ffac-Paper.pdf	Meta-learning aims to construct learning models capable of learning from experiences. Its intended users are thus primarily non-experts who require automated machine learning services, which may occur in a wide range of potential applications such as recommender systems and autoML. The authors do not expect the work to address or introduce any societal or ethical issues.
40	Training Stronger Baselines for Learning to Optimize	Tianlong Chen, Weiyi Zhang, Zhou Jingyang, Shiyu Chang, Sijia Liu, Lisa Amini, Zhangyang Wang	https://papers.nips.cc/paper/2020/file/51f4efbfb3e18f4ea053c4d3d282c4e2-Paper.pdf	This work mainly contributes to AutoML in the aspect of discovering better learning rules or optimization algorithms from data. As a fundamental technique, it seems to pose no substantial societal risk. This paper proposes several improved training techniques to tackle the dilemma of training instability and poor generalization in learned optimizers. In general, learning to optimize (L2O) prevents laborious problem-specific optimizer design, and potentially can largely reduce the cost (including time, energy and expense) of model training or tuning hyperparameters.
41	MeshSDF: Differentiable Iso-Surface Extraction	Edoardo Remelli, Artem Lukoyanov, Stephan Richter, Benoit Guillard, Timur Bagautdinov, Pierre Baque, Pascal Fua	https://papers.nips.cc/paper/2020/file/fe40fb944ee700392ed51bfe84dd4e3d-Paper.pdf	Computational Fluid Dynamics is key to addressing the critical engineering problem of designing shapes that maximize aerodynamic, hydrodynamic, and heat transfer performance, and much else beside. The techniques we propose therefore have the potential to have a major impact in the ﬁeld of Computer Assisted Design by unleashing the full power of deep learning in an area where it is not yet fully established.
42	The Cone of Silence: Speech Separation by Localization	Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman	https://papers.nips.cc/paper/2020/file/f056bfa71038e04a2400266027c169f9-Paper.pdf	We believe that our method has the potential to help people hear better in a variety of everyday scenarios. This work could be integrated with headphones, hearing aids, smart home devices, or laptops, to facilitate source separation and localization. Our localization output also provides a more privacy-friendly alternative to camera based detection for applications like robotics or optical tracking. We note that improved ability to separate speakers in noisy environments comes with potential privacy concerns. For example, this method could be used to better hear a conversation at a nearby table in a restaurant. Tracking speakers with microphone input also presents a similar range of privacy concerns as camera based tracking and recognition in everyday environments.
43	Debiased Contrastive Learning	Ching-Yao Chuang, Joshua Robinson, Yen-Chen Lin, Antonio Torralba, Stefanie Jegelka	https://papers.nips.cc/paper/2020/file/63c3ddcc7b23daa1e42dc41f9a44a873-Paper.pdf	Unsupervised representation learning can improve learning when only small amounts of labeled data are available. This is the case in many applications of societal interest, such as medical data analysis [5, 31], the sciences [22], or drug discovery and repurposing [38]. Improving representation learning, as we do here, can potentially beneﬁt all these applications. However, biases in the data can naturally lead to biases in the learned representation [29]. These biases can, for example, lead to worse performance for smaller classes or groups. For instance, the majority groups are sampled more frequently than the minority ones [16]. In this respect, our method may suffer from similar biases as standard contrastive learning, and it is an interesting avenue of future research to thoroughly test and evaluate this.
44	Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors	Karl Pertsch, Oleh Rybkin, Frederik Ebert, Shenghao Zhou, Dinesh Jayaraman, Chelsea Finn, Sergey Levine	https://papers.nips.cc/paper/2020/file/c8d3a760ebab631565f8509d84b3b3f1-Paper.pdf	We proposed a method for visual prediction and planning that is able to solve long-horizon tasks autonomously. This method may have a broader impact on capabilities of robots performing tasks such as autonomous navigation or object manipulation, and may be applicable in settings such as navigation of zones dangerous for humans, search and rescue, as well as warehouse robotics applications. While the method, and in general all planning and reinforcement learning methods, may be applied to a variety of settings, including those with questionable ethical motivation, we are optimistic of the general positive impact of future autonomous robotic systems, especially in the areas described above. Another ethical consideration is that, since the model is able to produce long videos targeted to a particular goal, it might be used to produce fake videos of people performing a certain action, and provides a degree of control about that action through the speciﬁcation of the goal image. This might enable forging fake videos targeted at speciﬁc persons. However, recent research has shown that most current methods for generating fake videos are easily detectable, both by people and automatic detection methods [18, 1, 38].
45	Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation	Aaron Sonabend, Junwei Lu, Leo Anthony Celi, Tianxi Cai, Peter Szolovits	https://papers.nips.cc/paper/2020/file/daf642455364613e2120c636b5a1f9c7-Paper.pdf	We believe ESRL is a tool that can help bring RL closer to real-world applications. In particular this will be useful in the clinical setting to ﬁnd optimal dynamic treatment regimes for complex diseases, or at least assist in treatment decision making. This is because ESRL’s framework lends itself to be questioned by users (physicians) and sheds light into potential biases introduced by the data sampling mechanism used to generate the observed data set. Additionally, using hypothesis testing and accommodating different levels of risk aversion makes the method sensible to ofﬂine settings and different real-world applications. It is important when using ESRL and any RL method, to question the validity of the policy’s decisions, the quality of the data, and the method that was used to derive these.
46	A polynomial-time algorithm for learning nonparametric causal graphs	Ming Gao, Yi Ding, Bryon Aragam	https://papers.nips.cc/paper/2020/file/85c9f9efab89cee90a95cb98f15feacd-Paper.pdf	Causality and interpretability are crucial aspects of modern machine learning systems. Graphical models in particular are a promising tool at the intersection of causality and interpretability, and our work provides an intuitive approach to balance these issues against modeling flexibility with nonparametric models. That being said, as this work is primarily theoretical, the broader impacts and ethical implications of our work are most likely to be felt downstream in applications. For example, while DAGs can provide causal insights under certain assumptions, these models can potentially be used to provide a false sense of security when they are not applied and deployed carefully. Along these lines, our work attempts to provide a rigorous sense of when flexible nonparametric causal models can be learned from data, by developing both theory and algorithms to justify these models from both mathematical and empirical perspectives.
47	A Unifying View of Optimism in Episodic Reinforcement Learning	Gergely Neu, Ciara Pike-Burke	https://papers.nips.cc/paper/2020/file/0f0e13216262f4a201bec128044dd30f-Paper.pdf	The results presented in this paper are largely theoretical. We deﬁne a class of algorithms which are theoretically well understood, but also beneﬁt from a computationally efﬁcient implementation. The framework provided in this paper is very general so, in principle, any algorithm which ﬁts into the framework could be applied to any reinforcement learning problem in a tabular or factored linear MDP. Consequently, as for any reinforcement learning algorithm, there is the potential for algorithms developed using the ideas presented in this paper to be applied in settings which have negative societal impacts, or in settings where the reward function is not well speciﬁed leading to undesirable behaviors.
48	Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation	Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang	https://papers.nips.cc/paper/2020/file/8d2355364e9a2ba1f82f975414937b43-Paper.pdf	As reinforcement learning becomes increasingly popular in practice and the problem dimension grows, there is a soaring demand for data-efﬁcient learning algorithms. Through the lens of low-rank representation of so-called Q-function, this work proposes a theoretical framework to devise efﬁcient RL algorithms. The resulting “low-rank” algorithm, which utilizes a novel matrix estimation method, offers both strong theoretical guarantees and appealing empirical performance. In particular, the novel “low-rank” perspective about RL provides an effective tool to tackle RL problems with both state and action spaces continuous, which have received much less attention despite their practical signiﬁcance. We believe that this work serves as an important step towards provable efﬁcient RL for continuous problems. The theoretical insights in this work can motivate further research in both efﬁcient RL and ME, while the empirical results should be beneﬁcial more broadly for practitioners working in continuous controls.
49	FleXOR: Trainable Fractional Quantization	Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun	https://papers.nips.cc/paper/2020/file/0e230b1a582d76526b7ad7fc62ae937d-Paper.pdf	Due to rapid advances in developing neural networks of higher model accuracy and increasingly complicated tasks to be supported, the size of DNNs is becoming exponentially larger. Our work facilitates the deployment of large DNN applications in various forms including mobile devices because of the powerful model compression ratio. As for positive perspectives, hence, a huge amount of energy consumption to run model inferences can be saved by our proposed quantization and encryption techniques. Also, a lot of computing systems that are based on binary neural network forms can improve model accuracy. We expect that lots of useful DNN models would be available for devices of low cost. On the other hand, some common concerns on DNNs such as privacy breaching and heavy surveillance can be worsened by DNN devices that are more available economically by using our proposed techniques.
50	Efficient active learning of sparse halfspaces with arbitrary bounded noise	Chicheng Zhang, Jie Shen, Pranjal Awasthi	https://papers.nips.cc/paper/2020/file/5034a5d62f91942d2a7aeaf527dfe111-Paper.pdf	This paper investigates a fundamental problem in machine learning and statistics. The theory and algorithms presented in this paper are expected to benefit many broad fields in science and engineering, such as learning theory, robust statistics, optimization, and applications in biology, climatology, and seismology, to name a few. Our research belongs to the general paradigm of interactive learning, in which the learning agent needs to design adaptive sampling schemes to maximize data efficiency. We are well aware that one needs to be careful in designing such sampling schemes, to avoid unintended harms such as discrimination.
51	Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases	Senthil Purushwalkam Shiva Prakash, Abhinav Gupta	https://papers.nips.cc/paper/2020/file/22f791da07b0d8a2504c2537c560001c-Paper.pdf	The goal of this work is to analyze existing self-supervised learning methods through diagnostic experiments. Analysis and understanding of existing approaches help develop better interpretation of ML algorithms and can be crucial in removing biases. Upon identifying the shortcomings of existing approaches, we propose a modiﬁcation to improve the representations learned by these approaches. Self-supervised learning involves learning representations from a large collection of unlabeled data. Since there is no human involvement in the data collection pipeline, we anticipate reduction in biases that can come via human labeling. Furthermore, self-supervised learning is a relatively nascent research topic with minimal deployability in the real-world. Therefore, while in the long run visual self-supervised learning would be impactful, at this moment there is no immediate impact.
52	Ratio Trace Formulation of Wasserstein Discriminant Analysis	Hexuan Liu, Yunfeng Cai, You-Lin Chen, Ping Li	https://papers.nips.cc/paper/2020/file/c37f9e1283cbd4a6edfd778fc8b1c652-Paper.pdf	In the era of big data, business providers, data scientists, and governments try to explore opportunities in the large scale and high-dimensional datasets. Nevertheless, several major computational challenges arise and prevent practitioners from constructing effective algorithms or tools to analyze their datasets. Dimensionality reduction (DR) plays an essential role in supervised and unsupervised learning tasks when the datasets are high dimensional. One benefit of reducing the data dimension before classification or clustering is to save storage and reduce computational cost for the later steps, however, the DR technique itself can be costly. We study a recently proposed and promising DR technique, the Wasserstein discriminant analysis, and propose a different formulation that could achieve comparable or better results with less computational cost. We also analyze the problem from a different perspective that was originated from electronic structure calculations, which could be of interest to a broader audience in the machine learning community.
53	Winning the Lottery with Continuous Sparsification	Pedro Savarese, Hugo Silva, Michael Maire	https://papers.nips.cc/paper/2020/file/83004190b1793d7aa15f8d0d49a13eba-Paper.pdf	Training deep neural networks usually requires signiﬁcant computational resources. Additional efforts are often needed to prune trained networks to enable efﬁcient inference – for example, in mobile applications which may be both power and compute constrained. Our work presents a new technique via which to sparsify networks, and contributes to the analysis of the recently discovered scientiﬁc phenomenon of re-trainable subnetworks (tickets). These contributions might open new pathways towards reducing the computational resources required for deep learning, thereby having a potentially wide-ranging practical impact across the ﬁeld.
54	Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification	Lynton Ardizzone, Radek Mackowiak, Carsten Rother, Ullrich Köthe	https://papers.nips.cc/paper/2020/file/593906af0d138e69f49d251d3e7cbed0-Paper.pdf	As our IB-INN is not bound to any particular application, and applies to settings that can in principle already be solved with existing methods, we foresee no societal advantages or dangers in terms of direct application. More generally, we think accurate uncertainty quantification plays an important role in a safe and productive use of AI.
55	Provable Online CP/PARAFAC Decomposition of a Structured Tensor via Dictionary Learning	Sirisha Rambhatla, Xingguo Li, Jarvis Haupt	https://papers.nips.cc/paper/2020/file/85b42dd8aae56e01379be5736db5b496-Paper.pdf	This work explores the theoretical foundations behind the success of popular alternating minimization-based techniques for tensor factorization. Specifically, we propose an algorithm for accurate model recovery for a tensor factorization task which has applications in clustering and pattern recovery. Since clustering-based algorithms are used for identification of users for targeted advertising campaigns on social network platforms, potential use cases may target users based on their activity patterns. Nevertheless, understanding the theoretical aspects of machine learning algorithms is crucial for ensuring safety and trustworthiness in critical applications, and can in fact be used to mitigate effects of the very biases that these algorithms are prone to exacerbate.
56	Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees	Shali Jiang, Daniel Jiang, Maximilian Balandat, Brian Karrer, Jacob Gardner, Roman Garnett	https://papers.nips.cc/paper/2020/file/d1d5923fc822531bbfd9d87d4760914b-Paper.pdf	The central concern of this investigation is Bayesian optimization of an expensive-to-evaluate objective function. As is standard in this body of literature, our proposed algorithms make minimal assumptions about the objective, effectively treating it as a “black box.” This abstraction is mathematically convenient but ignores ethical issues related to the chosen objective. Traditionally, Bayesian optimization has been used for a variety of applications, including materials design and drug discovery [7], and could have future applications to algorithmic fairness. We anticipate that our methods will be utilized in these reasonable applications, but there is nothing inherent to this work, and Bayesian optimization as a field more broadly, that preclude the possibility of optimizing a nefarious or at least ethically complicated objective.
57	GANSpace: Discovering Interpretable GAN Controls	Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, Sylvain Paris	https://papers.nips.cc/paper/2020/file/6fe43269967adbb64ec6149852b5cc3e-Paper.pdf	As our method is an image synthesis tool, it shares with other image synthesis tools the same potential beneﬁts (e.g., [2]) and dangers that have been discussed extensively elsewhere, e.g., see [18] for one such discussion. Our method does not perform any training on images; it takes an existing GAN as input. As discussed in Section 3.2, our method inherits the biases of the input GAN, e.g., limited ability to place makeup on male-presenting faces. Conversely, this method provides a tool for discovering biases that would otherwise be hard to identify.
58	Certifying Strategyproof Auction Networks	Michael Curry, Ping-yeh Chiang, Tom Goldstein, John Dickerson	https://papers.nips.cc/paper/2020/file/3465ab6e0c21086020e382f09a482ced-Paper.pdf	The immediate social impact of this work will likely be limited. Learned auction mechanisms are of interest to people who care about auction theory, and may eventually be used as part of the design of auctions that will be deployed in practice, but this has not yet happened to our knowledge. We note, however, that the design of strategyproof mechanisms is often desirable from a social good standpoint. Making the right move under a non-strategyproof mechanism may be difﬁcult for real-world participants who are not theoretical agents with unbounded computational resources. The mechanism may impose a real burden on them: the cost of ﬁguring out the correct move. By contrast, a strategyproof mechanism simply requires truthful reports—no burden at all. Moreover, the knowledge and ability to behave strategically may not be evenly distributed, with the result that under non-strategyproof mechanisms, the most sophisticated participants may game the system to their own beneﬁt. This has happened in practice: in Boston, some parents were able to game the school choice assignment system by misreporting their preferences, while others were observed not to do this; on grounds of fairness, the system was replaced with a redesigned strategyproof mechanism Abdulkadiroglu et al. [2006]. Thus, we believe that in general, the overall project of strategyproof mechanism design is likely to have a positive social impact, both in terms of making economic mechanisms easier to participate in and ensuring fair treatment of participants with different resources, and we hope we can make a small contribution to it.
59	Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond	Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, Cho-Jui Hsieh	https://papers.nips.cc/paper/2020/file/0cbc5671ae26f67871cb914d81ef8fc1-Paper.pdf	In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures. Our framework can be used in a wide variety of tasks ranging from robustness veriﬁcation to certiﬁed defense, and potentially many more applications requiring a provable perturbation analysis. It can also play an important building block for several safety-critical ML applications, such as transportation, engineering, and healthcare, etc. We expect that our framework will signiﬁcantly improve the robustness and reliability of real-world ML systems with theoretical guarantees. An important product of this paper is an open-source LiRPA library with over 10,000 lines of code, which provides automatic and differentiable perturbation analysis. This library can tremendously facilitate the use of LiRPA for the research community as well as industrial applications, such as veriﬁable plant control [50]. Our library of LiRPA on general computational graphs can also inspire further improved implementations on automatic outer bounds calculations with provable guarantees. Although our focus on this paper has been on exploring known perturbations and providing guarantees in such clairvoyant scenarios, in real-world an adversary (or nature) may not adhere to our assumptions. Thus, we may additionally want to understand implication of these unknown scenarios on the system performance. This is a relatively unexplored area in robust machine learning, and we encourage researchers to understand and mitigate the risks arising from unknown perturbations in these contexts.
60	Dense Correspondences between Human Bodies via Learning Transformation Synchronization on Graphs	Xiangru Huang, Haitao Yang, Etienne Vouga, Qixing Huang	https://papers.nips.cc/paper/2020/file/ca7be8306ecc3f5fa30ff2c41e64fa7b-Paper.pdf	Computing dense correspondences between a partial scan and a template, or between two partial scans, is a fundamental task for analyzing and understanding 3D data captured from the real world. Our work is foundational, improving the accuracy and robustness of this important task, and will beneﬁt downstream applications that rely on the ability to ﬁnd accurate dense correspondences. One such application area is human subject tracking, where the correspondences between the partial scan and the complete template model can be used to deform the template to obtain complete deformed shape that corresponds to each partial scan. Our research will allow reconstruction of higher-ﬁdelity animation sequences that better captures nuanced motion from large-scale, real-world data. Applications that will beneﬁt from this improved tracking include imitation learning, where a system can learn from motion of each observed subject, especially of ﬁne motor skills not able to be tracked before; movie/game industry, where one can insert the reconstructed motion of an actor into virtual environment, with unprecedented expressiveness of the reconstructed actor; and sports, where one can reconstruct and analyze the athletes’ motions to make recommendations both for improving athletic performance as well as enhancing athlete safety. Another application area is full body reconstruction from a few scans. In this setting, the template mesh serves as an intermediate object to establish dense correspondences between partial scans. Our research represents an important steps towards allowing ordinary users to scan themselves with high accuracy at home using commodity hardware. Access to a high-quality digital avatar facilitates many applications such as virtual ﬁtting for on-line shopping, improved telepresence and telemedicine, and new forms of entertainment and social media where users can place and animate themselves in a 3D environment. Potential abuses and negative impacts of improved tracking and reconstruction include the ability to identify people without their consent, based on body shape or motion characteristics, in settings where traditional facial recognition algorithms fail. 3D avatars of a person reconstructed from surreptitious partial scans might also be used to create “deep fakes” or to otherwise infringe on the privacy rights of the subject. From a technical perspective, the problem falls into the category of structure prediction that combines point-wise predictions and priors on correlations among multiple points. Unlike the standard MRF formulation, this paper explores a new data representation, which turns structure prediction into a continuous optimization problem. This methodology can inspire future research on relevant problems, where the problem space lies in a continuous domain. Moreover, there is growing interest in turning optimization problems into neural networks with hyper-parameters trained end-to-end. Our approach contributes to this effort, and we hope the insights we used to design the resulting neural network for training (including our analysis of robust recovery conditions for the transformation synchronization problem) can be applied to and stimulate future research on similar problems. Finally, like any algorithm for computing dense correspondences, our approach is not guaranteed to generate correct correspondences in all the settings. Additional checks and veriﬁcation (by humans using interactive tools, for instance) should be used to validate and rectify the outputs, especially if the results are used in safety- or health-critical applications such as personalized medicine.
61	Hierarchical Neural Architecture Search for Deep Stereo Matching	Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Hongdong Li, Tom Drummond, Zongyuan Ge	https://papers.nips.cc/paper/2020/file/fc146be0b230d7e0a92e66a6114b840d-Paper.pdf	The task of stereo matching has been studied for over half a century and numerous methods have been proposed. From traditional methods to deep learning based methods, people keep setting a new state-of-the-art through these years. Nowadays, deep learning based methods become more popular than traditional methods since deep methods are more accurate and faster. However, ﬁnding a better architecture for stereo matching networks remains a hot topic recently. Rather than designing a handcrafted architecture with trial and error, we propose to allow the network to learn a good architecture by itself in an end-to-end manner. Our method reduces more than 2/3 of searching time than previous method [25] and has much better performance, thus saves lots of energy consumption and good for our planet by reducing massive carbon footprints. In addition, our proposed search framework is relatively general and not limited to the speciﬁc task of stereo matching. It can be well extended to other dense matching tasks such as optical ﬂow estimation and multi-view stereo.
62	A graph similarity for deep learning	Seongmin Ok	https://papers.nips.cc/paper/2020/file/0004d0b59e19461ff126e3a08a814c33-Paper.pdf	This article mainly discusses two topics: how to measure similarity between graphs, and how to learn from graphs. One of the most important subjects in both ﬁelds is the molecular graph. A chemically meaningful similarity between molecules helps ﬁnd new drugs and invent new materials of great value. Many chemical search engines support similarity search based on ﬁngerprints, which indicate the existence of certain substructures. The ﬁngerprints have been useful to ﬁnd molecules of interest, but they are inherently limited to local properties. The proposed graph similarity is simple, fast and efﬁcient. The proposed graph neural network reports particular strength in molecular property prediction and molecular graph generation, albeit not studied extensively. It is possible that the proposed algorithms provide another, global perspective to molecular similarity. Another task for which the proposed neural network showed strength is the node classiﬁcation. The node classiﬁcation is mostly used to automatically categorize articles, devices, people, and other entities in interconnected networks at large scale. Some related examples include identifying false accounts in social network services, classifying a person for a recommendation system based on its friends’ interest, and detecting malicious edge-devices in Internet of Things or mobile networks. As with every machine learning applications, assessing and understanding the data is crucial in such cases. Especially in graph-structured data, we believe that the characteristic of data is the most important factor in deciding which graph learning algorithm to use. It is necessary to understand the principle and limitation of an algorithm to prevent failure. For example, our method has two caveats. First, it uses sum to collect information from the neighbors, and hence more suitable when the counts indeed matter and not just the distributions. Second, our method decides the similarity between two graphs using the local information. Hence when the "global" graph properties such as hamiltonicity, treewidth, and chromatic number are the deciding factor, our algorithm might not be the best choice. Graph learning in general are being applied to more and more tasks and applications. Some of the examples include recommendation systems, transportation analysis, and credit assignments. However, the study of risks regarding graph learning, such as adversarial attack, privacy protection, ethics and biases are still at an early stage. In practice, we should be warned about such risks and devise testing and monitoring framework from the start to avoid undesirable outcomes.
63	The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks	Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington	https://papers.nips.cc/paper/2020/file/c6dfc6b7c601ac2978357b7a81e2d7ae-Paper.pdf	This work is theoretical and does not present any foreseeable societal consequence.
64	Choice Bandits	Arpit Agarwal, Nicholas Johnson, Shivani Agarwal	https://papers.nips.cc/paper/2020/file/d5fcc35c94879a4afad61cacca56192c-Paper.pdf	The purpose of this paper is to understand whether efﬁcient learning is possible in a bandit setting where one does not receive quantitative feedback for an individual arm but rather relative feedback in the form of a multiway choice. It is well-known that quantitative judgments of humans can have biases; our algorithm, which learns from relative multiway choices, can help alleviate these biases. Moreover, by receiving larger choice sets from our algorithm, humans can have a better sense of the quality distribution of arms, and can make more informed choices. Another advantage of our setting is that we do not rely on historic data as our data collection is online. Hence, one does not need to worry about past biases being reﬂected in the choice datasets. However, one has to be cautious about the use of our algorithm in applications where arms represent individuals/entities such as job applicants, property renters etc. In these applications, the choices of people can be biased against certain individuals/groups, thereby hurting the chances of these individuals/groups to be selected by our algorithm. Here, depending on the application, one might need to consider imposing some form of fairness constraints on the choice sets output by our algorithm in order to prevent any discrimination against such individuals/groups.
65	Deep Imitation Learning for Bimanual Robotic Manipulation	Fan Xie, Alexander Chowdhury, M. Clara De Paolis Kaluza, Linfeng Zhao, Lawson Wong, Rose Yu	https://papers.nips.cc/paper/2020/file/18a010d2a9813e91907ce88cd9143fdf-Paper.pdf	Robotics systems that utilize fully automated policies for different tasks have already been applied to many manufacturing, assembly lines, and warehouses processes. Our work demonstrates the potential to take this automation one step further. Our algorithm can automatically learn complex control policies from expert demonstrations, which could potentially allow robots to augment their existing control designs and further optimize their workﬂows. Implementing learned policies in safety-critical environments such as large-scale assembly lines can be risky as these algorithms do not have guaranteed precision. Improved theoretical understanding and interpretability of model policies could potentially mitigate these risks.
66	Model Fusion via Optimal Transport	Sidak Pal Singh, Martin Jaggi	https://papers.nips.cc/paper/2020/file/fb2697869f56484404c8ceee2985b01d-Paper.pdf	Model fusion is a fundamental building block in machine learning, as a way of direct knowledge transfer between trained neural networks. Beyond theoretical interest it can serve a wide range of concrete applications. For instance, collaborative learning schemes such as federated learning are of increasing importance for enabling privacy-preserving training of ML models, as well as a better alignment of each individual’s data ownership with the resulting utility from jointly trained machine learning models, especially in applications where data is user-provided and privacy sensitive [29]. Here fusion of several models is a key building block to allow several agents to participate in joint training and knowledge exchange. We propose that a reliable fusion technique can serve as a step towards more broadly enabling privacy-preserving and efﬁcient collaborative learning.
67	Minimax Bounds for Generalized Linear Models	Kuan-Yun Lee, Thomas Courtade	https://papers.nips.cc/paper/2020/file/6a508a60aa3bf9510ea6acb021c94b48-Paper.pdf	The generalized linear model (GLM) is a broad class of statistical models that have extensive applications in machine learning, electrical engineering, ﬁnance, biology, and many areas not stated here. Many algorithms have been proposed for inference, prediction and classiﬁcation tasks under the umbrella of the GLM, such as the Lasso algorithm, the EM algorithm, Dantzig selectors, etc., but often it is hard to conﬁdently assess optimality. Lower bounds for minimax and Bayes risks play a key role here by providing theoretical benchmarks with which one can evaluate the performance of algorithms. While many previous approaches have focused on the Gaussian linear model, in this paper we establish minimax and Bayes risk lower bounds that hold uniformly over all statistical models within the GLM. Our arguments demonstrate a set of information-theoretic techniques that are general and applicable to setups other than the GLM. As a result, many applications stand to potentially beneﬁt from our work.
68	Confidence sequences for sampling without replacement	Ian Waudby-Smith, Aaditya Ramdas	https://papers.nips.cc/paper/2020/file/e96c7de8f6390b1e6c71556e4e0a4959-Paper.pdf	The main type of broader impact caused by our work is the reduction of time, money and energy due to the ability to continuously monitor data and hence make critical decisions early. In Appendix A, we provide four prototypical examples of situations where our methods may prove useful. In Example A, every single phone call requires time to collect the opinions, thus using up money as well, and if we can accurately quantify uncertainty then we can stop collecting data sooner. In Example B, randomization tests such as those involving permutations are a common way to quantify statistical signiﬁcance, but they are computationally intensive and thus take up a lot of time. Knowing when to stop, based on the test being clearly statistically signiﬁcant (or clearly far from it), can save on energy costs. In Example D, when an educational intervention is unrolled one school at a time, there are two possibilities again: if it is clearly beneﬁcial, we would like to recognize it quickly so that every student can avail of the beneﬁts, while if it is for some reason harmful (e.g. causing stress without measurable beneﬁt), then it would be equally important to end the program quickly. Once more, accurately quantifying uncertainty as the process unfolds underpins the ability to make these decisions early to disseminate beneﬁts rapidly or mitigate harms quickly. As a side remark, though we have not demonstrated it in this paper, our techniques are also applicable to auditing elections (checking whether the results are as announced by a manual random recount). ‘Risk-limiting audits’ [22] constitute a full-ﬂedged application area that we intend to pursue in future work; there are many variants depending on how voters express their preferences (choose one, or rank all, or score all) and the aggregation mechanism used to decide on one or multiple winners. Audits are not currently required by law in many state/county (or federal) elections due to high perceived effort among other reasons, so being able to stop these audits early, yet accurately and conﬁdently, is critical to their broad adoption. In this sense, a longer-term broader impact to trust in elections is anticipated.
69	Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality	Nian Si, Jose Blanchet, Soumyadip Ghosh, Mark Squillante	https://papers.nips.cc/paper/2020/file/f3507289cfdc8c9ae93f4098111a13f9-Paper.pdf	This is a theoretical contribution that, nevertheless, has the potential of impacting a wide range of application domains in business, engineering and science. In particular, all of those in which the Wasserstein distance has been extensively used as a statistical inference tool (e.g. image analysis and computer vision, signal processing, operations research, and so on). Because our paper provides a step towards breaking the curse of dimensionality in statistical rates of convergence, we believe that we have the potential of enabling more applications to multiple hypothesis testing (e.g., certifying Wasserstein GANs). In turn, we plan to improve human resource development by including some of the main ﬁndings in this paper in Ph.D. courses.
70	The Generalized Lasso with Nonlinear Observations and Generative Priors	Zhaoqiang Liu, Jonathan Scarlett	https://papers.nips.cc/paper/2020/file/dd45045f8c68db9f54e70c67048d32e8-Paper.pdf	Who may beneﬁt from this research. This is a theory paper primarily targeted at the research community. The signal recovery techniques studied could potentially be useful for practitioners in areas such as image processing, audio processing, and medical imaging. Who may be put at disadvantage from this research. We are not aware of any signiﬁcant/imminent risks of placing anyone at a disadvantage. Consequences of failure of the system. We believe that most failures should be immediately evident and detectable due to visibly poor reconstruction performance, and any such outputs could be discarded as needed. However, some more subtle issues could arise, such as the reconstruction missing important details in the signal due to the generative model not capturing them. As a result, care is advised in the choice of generative model, particularly in applications for which the reconstruction of ﬁne details is crucial. Potential biases. The signal recovery algorithm that we consider takes as input an arbitrary pre-trained generative model. If such a pre-trained model has inherent biases, they could be transferred to the signal recovery algorithm.
71	AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning	Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing	https://papers.nips.cc/paper/2020/file/0a2298a72858d90d5c4b4fee954b6896-Paper.pdf	The proposed AutoSync alleviates the burden on ML researchers and practitioners in choosing appropriate synchronization strategy for efﬁcient distributed training, enables substantial speed up of ML prototyping and training, and reduces the cost of their operational workloads using distributed computing. Further, AutoSync is transferable to unseen model and cluster settings by the design of domain-agnostic features. By this, ﬁnding a good synchronization strategy for a large-scale ML model such as BERT [7] and GPT [28] or on a relatively expensive cluster only requires developing runtime simulators using data collected from a streamlined model on handy clusters, saving substantial experimental efforts and budgets. We will release and open-source our code and a new dataset to beneﬁt the research community, to democratize high-performance ML systems, and make them accessible to non-ML-educated software developers and society at large. Since such needs are prevalent across many disciplines beyond computing and information science – such as industrial and manufacturing, healthcare, biology, social science, and ﬁnance – our deliverables are expected to have a catalytic impact.
72	Adaptive Shrinkage Estimation for Streaming Graphs	Nesreen Ahmed, Nick Duffield	https://papers.nips.cc/paper/2020/file/780261c4b9a55cd803080619d0cc3e11-Paper.pdf	There is a burgeoning recent literature of statistical estimation and adaptive data analysis of the higher-order structural properties of graphs in both the streaming and non streaming context that reﬂect the importance and interest of this topic for the graph algorithms and relational learning research community. On the other hand, shrinkage estimators are an established technique from more general statistics. This paper is the ﬁrst to apply shrinkage based methods in the context of graph approximation. The expected broader impact is as a proof of concept that shows the way for other researchers in this area to improve estimation quality. Moreover, this work ﬁts under statistical inference for temporal relational/network data, which would enable statistical analysis and learning for network data that appear in streaming settings, in particular when exact solutions are not feasible (similar to the important literature on randomization algorithms for data matrices [1]). Furthermore, there are many applications where the data has a pronounced temporal, relational, and spatial structure (e.g., relational data). Examples of Non-IID streams include (i) non-independence due to temporal clustering in communication graphs on internet, online social networks, physical contact networks, and social media such as ﬂash crowds and coordinated botnet activity; (ii) non-identical distributions in activity on these networks due to diurnal and other seasonal variations, synchronization of user network activity e.g., searches stimulated by hourly news reports. The proposed framework is suitable for these applications, because it makes no statistical assumptions concerning the arrival stream and the order of the arriving edges.
73	The Strong Screening Rule for SLOPE	Johan Larsson, Malgorzata Bogdan, Jonas Wallin	https://papers.nips.cc/paper/2020/file/a7d8ae4569120b5bec12e7b6e9648b86-Paper.pdf	The predictor screening rules introduced in this article allow for a substantial improvement of the speed of SLOPE. This facilitates application of SLOPE to the identiﬁcation of important predictors in huge data bases, such as collections of whole genome genotypes in Genome Wide Association Studies. It also paves the way for the implementation of cross-validation techniques and improved efﬁciency of the Adaptive Bayesian version SLOPE (ABSLOPE [39]), which requires multiple iterations of the SLOPE algorithm. Adaptive SLOPE bridges Bayesian and the frequentist methodology and enables good predictive models with FDR control in the presence of many hyper-parameters or missing data. Thus it addresses the problem of false discoveries and lack of replicability in a variety of important problems, including medical and genetic studies. In general, the improved efﬁciency resulting from the predictor screening rules will make the SLOPE family of models (SLOPE [3], grpSLOPE [6], and ABSLOPE) accessible to a broader audience, enabling researchers and other parties to ﬁt SLOPE models with improved efﬁciency. The time required to apply these models will be reduced and, in some cases, data sets that were otherwise too large to be analyzed without access to dedicated high-performance computing clusters can be tackled even with modest computational means. We can think of no way by which these screening rules may put anyone at disadvantage. The methods we outline here do not in any way affect the model itself (other than boosting its performance) and can therefore only be of beneﬁt. For the same reason, we do not believe that the strong rules for SLOPE introduces any ethical issues, biases, or negative societal consequences. In contrast, it is in fact possible that the reverse is true given that SLOPE serves as an alternative to, for instance, the lasso, and has superior model selection properties [10, 39] and lower bias [39].
74	Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach	Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Mladen Kolar, Zhaoran Wang	https://papers.nips.cc/paper/2020/file/65a99bb7a3115fdede20da98b08a370f-Paper.pdf	In recent years, the impact of machine learning (ML) on economics is already well underway [5, 15], and our work serves as a complement to this line of research. On the one hand, machine learning methods such as random forest, support vector machines and neural networks provide great ﬂexibility in modeling, while traditional tools in structural estimation that are well versed in the econometrics community are still primitive, despite recent advances [32, 26, 7, 21]. On the other hand, to facilitate ML-base decision making, one must be aware of the distinction between prediction and causal inference. Our method provides an NN-based solution to estimation of generalized SEMs, which encompass a wide range of econometric and causal inference models. However, we remark that in order to apply the method to policy and decision problems, one must pay equal attention to other aspects of the model, such as interpretability, robustness of the estimates, fairness and nondiscrimination, assumptions required for model identiﬁcation, and the testability of those assumptions. Unthoughtful application of ML methods in an attempt to draw causal conclusions must be avoided for both ML researchers and economists.
75	Meta-Neighborhoods	Siyuan Shan, Yang Li, Junier B. Oliva	https://papers.nips.cc/paper/2020/file/35464c848f410e55a13bb9d78e7fddd0-Paper.pdf	Any general discriminative machine learning model runs the risk of making biased and offensive predictions reﬂective of training data. Our work is no exception as it aims at improving discriminative learning performance. To reduce these negative inﬂuences to the minimum possible extent, we only use standard benchmarks in this work, such as CIFAR-10, Tiny-ImageNet, MNIST, and datasets from the UCI machine learning repository. Our work does impose some privacy concerns as we are learning a per-instance adjusted model in this work. Potential applications of the proposed model include precision medicine, personalized recommendation systems, and personalized driver assistance systems. To keep user data safe, it is desirable to only deploy our model locally. The induced neighbors in our work, which are semantically meaningful, can also be regarded as fake synthetic data. Like DeepFakes, they may also raise a set of challenging policy, technology, and legal issues. Legislation regarding synthetic data should take effect and the research community needs to develop effective methods to detect these synthetic data.
76	When Counterpoint Meets Chinese Folk Melodies	Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang	https://papers.nips.cc/paper/2020/file/bae876e53dab654a3d9d9768b1b7b91a-Paper.pdf	The idea of integrating Western counterpoint into Chinese folk music generation is innovative. It would make positive broader impacts on three aspects: 1) It would facilitate more opportunities and challenges of music cultural exchanges at a much larger scale through automatic generation. For example, the inter-cultural style fused music could be used in Children’s enlightenment education to stimulate their interest in both cultures. 2) It would further the idea of collaborative counterpoint improvisation between two parts (e.g., a human and a machine) to music traditions where such interaction was less common. 3) The computer-generated music may “reshape the musical idiom”[23], which may bring more opportunities and possibilities to produce creative music. The proposed work may also have some potential negative societal impacts: 1) Similar to other computational creativity research, the generated music has the possibility of plagiarism by copying short snippets from the training corpus, even though copyright infringement is not a concern as neither folk melodies nor Bach’s music has copyright. That being said, our online music generation approach conditions music generation on past human and machine generation, and is less likely to directly copy snippets than ofﬂine approaches do. 2) The proposed innovative music generation approach may cause disruptions to current music professions, even deprive them of their means of existence[23]. However, it also opens new areas and creates new needs in this we-media era. Overall, we believe that the positive impacts signiﬁcantly outweigh the negative impacts.
77	Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm	Lin Yang, Mohammad Hajiesmaili, Mohammad Sadegh Talebi, John C. S. Lui, Wing Shing Wong	https://papers.nips.cc/paper/2020/file/e655c7716a4b3ea67f48c6322fc42ed6-Paper.pdf	Our work fits within the broad direction of research concerning safety issues in AI/ML at large. With the recent radical advances in machine learning, ML-assisted decision making is fast becoming an intrinsic part of the design of systems and services that billions of people around the world use every day. And not surprisingly, investigating the vulnerability of existing learning models and robustness against manipulation attacks are becoming critically important in the light of trustworthy learning paradigm. Hence, there has been a surge of interest in making learning models that are robust against adversarial attacks for both applied ML such as supervised learning and deep learning, and theoretical ML such as reinforcement learning and multi-armed bandits. This is critically important for society, since the ML algorithms are being adopted more and more in safety-critical domains across sciences, businesses, and governments that impact people’s daily lives. Last, we see no ethical concerns related to this paper.
78	The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise	Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi	https://papers.nips.cc/paper/2020/file/ebd64e2bf193fc8c658af2b91952ce8d-Paper.pdf	Our work aims to advance the algorithmic foundations of adversarially robust machine learning. This subﬁeld focuses on protecting machine learning models (especially their predictions) against small perturbations of the input data. This broad goal is a pressing challenge in many real-world scenarios, where successful adversarial example attacks can have far-reaching implications given the adoption of machine learning in a wide variety of applications, from self-driving cars to banking. Since the primary focus of our work is theoretical and addresses a simple concept class, we do not expect our results to have immediate societal impact. Nonetheless, we believe that our ﬁndings provide interesting insights on the algorithmic possibilities and fundamental computational limitations of adversarially robust learning. We hope that, in the future, these insights could be useful in the design of practically relevant adversarially robust classiﬁers in the presence of noisy data.
79	Swapping Autoencoder for Deep Image Manipulation	Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei Efros, Richard Zhang	https://papers.nips.cc/paper/2020/file/50905d7b2216bfeccb5b41016357176b-Paper.pdf	From the sculptor’s chisel to the painter’s brush, tools for creative expression are an important part of human culture. The advent of digital photography and professional editing tools, such as Adobe Photoshop, has allowed artists to push creative boundaries. However, the existing tools are typically too complicated to be useful by the general public. Our work is one of the new generation of visual content creation methods that aim to democratize the creative process. The goal is to provide intuitive controls (see Section 4.6) for making a wider range of realistic visual effects available to non-experts. While the goal of this work is to support artistic and creative applications, the potential misuse of such technology for purposes of deception – posing generated images as real photographs – is quite concerning. To partially mitigate this concern, we can use the advances in the field of image forensics [16], as a way of verifying the authenticity of a given image. In particular, Wang et al. [72] recently showed that a classifier trained to classify between real photographs and synthetic images generated by ProGAN [42], was able to detect fakes produced by other generators, among them, StyleGAN [43] and StyleGAN2 [44]. We take a pretrained model of [72] and report the detection rates on several datasets in Appendix ??. Our swap-generated images can be detected with an average rate greater than 90%, and this indicates that our method shares enough architectural components with previous methods to be detectable. However, these detection methods do not work at 100%, and performance can degrade as the images are degraded in the wild (e.g., compressed, rescanned) or via adversarial attacks. Therefore, the problem of verifying image provenance remains a significant challenge to society that requires multiple layers of solutions, from technical (such as learning-based detection systems or authenticity certification chains), to social, such as efforts to increase public awareness of the problem, to regulatory and legislative.
80	Group Contextual Encoding for 3D Point Clouds	Xu Liu, Chengtao Li, Jian Wang, Jingbo Wang, Boxin Shi, Xiaodong He	https://papers.nips.cc/paper/2020/file/9b72e31dac81715466cd580a448cf823-Paper.pdf	Our “Group Contextual Encoding” can be directly applied to the 3D point cloud scene understanding tasks including 3D object detection, voxel labeling, and segmentation. Our research can also support downstream research and applications such as autonomous driving, robotics, and AR/MR. We will investigate the generalizability of our method to other tasks and frameworks, e.g., Graph Convolution network, 3D sparse CNNs, where the global context plays a crucial role in these tasks. On the other hand, this technology may also endanger the employment of human servants and drivers because they may be replaced by autonomous robots and vehicles, which may cause the potential social problems. This issue should be taken seriously and measures should be taken for preparation.
81	A Simple Language Model for Task-Oriented Dialogue	Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher	https://papers.nips.cc/paper/2020/file/e946209592563be0f01c844ab2170f0c-Paper.pdf	This work may have implications for the simpliﬁcation of conversational agents. In the narrow sense, this work addresses task-oriented dialogue, but similar results might also hold for open-domain conversational systems. If so, the improvement of these systems and easier deployment would amplify both the positive and negative aspects of conversational AI. Positively, conversational agents might play a role in automating predictable communications, thereby increasing efﬁciency in areas of society that currently lose time navigating the multitude of APIs, webpages, and telephonic systems that are used to achieve goals. Negatively, putting conversational agents at the forefront might dehumanize communication that can be automated and might lead to frustration where human agents could provide more efﬁcient solutions – for example, when predicted solutions do not apply. These consequences are not speciﬁc to this work, but should be considered by the ﬁeld of conversational AI more broadly.
82	Feature Importance Ranking for Deep Learning	Maksymilian Wojtas, Ke Chen	https://papers.nips.cc/paper/2020/file/36ac8e558ac7690b6f44e2cb5ef93322-Paper.pdf	This research does not involve any issues directly regarding ethical aspects and future societal consequences. In the future, our approach presented in this paper might be applied in different domains, e.g., medicine and life science, where ethical aspects and societal consequences might have to be considered.
83	Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction	Tong He, John Collomosse, Hailin Jin, Stefano Soatto	https://papers.nips.cc/paper/2020/file/690f44c8c2b7ded579d01abe8fdb6110-Paper.pdf	Who may beneﬁt from this research? The VR / AR software developers and 3D graphics designers may beneﬁt from our research. The proposed technique generates single-view clothed human mesh reconstructions with improved global topology regularities and local surface details. Our method can beneﬁt various VR / AR applications that involve reconstructing 3D virtual human avatars for customized user experience, such as conference systems and role-playing games. Moreover, being able to efﬁciently reconstruct 3D meshes from single-view images is useful for graphics rendering and 3D designs. Who may be put at disadvantage from this research? In the long run, some entry-level graphics artists and designers might be affected. Generally speaking, the 3D gaming and graphics design industries are moving towards automatic content generation techniques. These techniques are not meant to replace highly skilled human workers, but to help improve their productivity at work. What are the consequences of failure of the system? Failed human mesh reconstructions might bring unpleasant user experience. Typical failure cases as well as possible solutions have also been discussed in the main paper. Whether the task/method leverages biases in the data? There might be some biases on human poses and clothes due to long-tail cases. However, our dataset is already 10× larger than the one used in the competing methods. More importantly, our mesh collection procedures can be easily expanded to other domain-speciﬁc scenarios to obtain more human meshes with different shapes, poses and clothes to compensate for long-tail cases.
84	The Origins and Prevalence of Texture Bias in Convolutional Neural Networks	Katherine Hermann, Ting Chen, Simon Kornblith	https://papers.nips.cc/paper/2020/file/db5f9f42a7157abe65bb145000b5871a-Paper.pdf	People who build and interact with tools for computer vision, especially those without extensive training in machine learning, often have a mental model of computer vision models as similar to human vision. Our ﬁndings contribute to a body of work showing that this view is actually far from correct, especially for ImageNet, one of the datasets most commonly used to train and evaluate models. Divergences between human and machine vision of the kind we study could cause users to make signiﬁcant errors in anticipating and reasoning about the behavior of computer vision systems. Our ﬁndings contribute to a body of work delineating divergences between human and machine vision, and suggesting avenues for bringing the two systems closer together. Allowing people from a wide range of backgrounds to make safe, predictable, and equitable models requires vision systems to perform at least roughly in accordance with their expectations. Making computer vision models that share the same inductive biases as humans is an important step towards this goal. At the same time, we recognize the possible negative consequences of blindly constraining models’ judgments to agree with people’s: human visual judgments display forms of bias that should be kept out of computer models. More broadly, we believe that work like ours can have a beneﬁcial impact on the internal sociology of the machine learning community. By identifying connections to developmental psychology and neuroscience, we hope to enhance interdisciplinary connections across ﬁelds, and to encourage people with a broader range of training and backgrounds to participate in machine learning research.
85	Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information	Genevieve Flaspohler, Nicholas A. Roy, John W. Fisher III	https://papers.nips.cc/paper/2020/file/7f2be1b45d278ac18804b79207a24c53-Paper.pdf	Decision-making problems are ubiquitous, arising in applications such as tracking an oil spill using a marine robot, selecting an effective drug schedule in personalized medicine, or allocating irrigation resources based on seasonal weather forecasts. In each of these important application areas, system dynamics are represented by complex and potentially learned models and the decision-making agent can only observe the state through limited sensors. Many current planning and reinforcement learning algorithms focus on fully-observable domains and generate learned policies without performance guarantees. However, uncertainty and formal guarantees must play a role in robust decision-making for high-stakes domains. VoI macro-action generation contributes to fundamental research in robust and efﬁcient model-based planning under uncertainty. As with all formal results, however, the bounds we derive only hold under the assumptions that we describe in the text. When performing decision-making in high-stakes applications, understanding these conditions, the extent to which they hold, and how algorithm performance degrades as assumptions are violated is critical.
86	Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights	Theofanis Karaletsos, Thang D. Bui	https://papers.nips.cc/paper/2020/file/c70341de2c112a6b3496aec1f631dddd-Paper.pdf	Our work targets studying priors of neural networks with respect to two specific aspects: first, we aim at obtaining weights which are sharp close to the training data and uncertain away from training data, in order to calibrate the model’s confidence. This is essential for many applications where predictions of neural networks are consumed to drive decisions, which may incur a cost. In case our model produces "I don’t know" predictions as we showed it is capable of in OOD data, ML-systems can either probe an expert or utilize a fallback plan for decisions. Such cases occur across industrial applications of algorithmic decision making and impact economics and fairness, but are even more critical in fields such as healthcare or autonomy where wrong but overconfident predictions may lead to catastrophic decisions. The second area of impact centers around the ability of a practitioner to express specific types of prior knowledge for the functions learned by a neural network via auxiliary kernels. This can help practitioners utilize neural networks as less of a black box and ultimately may lead to the ability to train networks with rich weight-based function spaces with little data. These types of network regularization are application-dependent, but ultimately we hope structures such as the ones we propose may be able to aid with generalization outside the training data by encoding prior knowledge into networks, an ability that would potentially help in a variety of real world scenarios where data paucity exists but prior knowledge can be used to fill the gaps.
87	Online Matrix Completion with Side Information	Mark Herbster, Stephen Pasteris, Lisa Tse	https://papers.nips.cc/paper/2020/file/eb06b9db06012a7a4179b8f3cb5384d3-Paper.pdf	In general this work does not present any foreseeable speciﬁc societal consequence in the authors’ joint opinion. This is foundational research in regret-bounded online learning. As such it is not targeted towards any particular application area. Although this research may have societal impact for good or for ill in the future, we cannot foresee the shape and the extent.
88	Certifiably Adversarially Robust Detection of Out-of-Distribution Data	Julian Bitterwolf, Alexander Meinke, Matthias Hein	https://papers.nips.cc/paper/2020/file/b90c46963248e6d7aab1e0f429743ca0-Paper.pdf	In order to use machine learning in safety-critical systems it is required that the machine learning system correctly ﬂags its uncertainty. As neural networks have been shown to be overconﬁdent far away from the training data, this work aims at overcoming this issue by not only enforcing low conﬁdence on out-distribution images but even guaranteeing low conﬁdence in a neighborhood around it. As a neural network should not ﬂag that it knows when it does not know, this paper contributes to a safer use of deep learning classiﬁers.
89	Neural Networks Fail to Learn Periodic Functions and How to Fix It	Liu Ziyin, Tilman Hartwig, Masahito Ueda	https://papers.nips.cc/paper/2020/file/1160453108d3e537255e9f7b931f4e90-Paper.pdf	In the field of deep learning, we hope that this work will attract more attention to the study of how neural networks extrapolate, since how a neural network extrapolates beyond the region it observes data determines how a network generalizes. In terms of applications, this work may have broad practical importance because many processes in nature and in society are periodic in nature. Being able to model periodic functions can have important impact to many fields, including but not limited to physics, economics, biology, and medicine.
90	BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images	Thu H. Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, Niloy Mitra	https://papers.nips.cc/paper/2020/file/4b29fa4efe4fb7bc667c7b301b74d52d-Paper.pdf	BlockGAN is an image generative model that learns an object-oriented 3D scene representation directly from unlabelled 2D images. Our approach is a new machine learning technique that makes it possible to generate unseen images from a noise vector, with unprecedented control over the identity and pose of multiple independent objects as well as the background. In the long term, our approach could enable powerful tools for digital artists that facilitate artistic control over realistic procedurally generated digital content. However, any tool can in principle be abused, for example by adding new, manipulating or removing existing objects or people from images. At training time, our network performs a task somewhat akin to scene understanding, as our approach learns to disentangle between multiple objects and individual object properties (speciﬁcally their pose and identity). At test time, our approach enables sampling new images with control over pose and identity for each object in the scene, but does not directly take any image input. However, it is possible to embed images into the latent space of generative models [1]. A highly realistic generative image model and a good image ﬁt would then make it possible to approximate the input image and, more importantly, to edit the individual objects in a pictured scene. Similar to existing image editing software, this enables the creation of image manipulations that could be used for ill-intended misinformation (fake news), but also for a wide range of creative and other positive applications. We expect the beneﬁts of positive applications to clearly outweigh the potential downsides of malicious applications.
91	High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization	Qing Feng, Ben Letham, Hongzi Mao, Eytan Bakshy	https://papers.nips.cc/paper/2020/file/faff959d885ec0ecf70741a846c34d1d-Paper.pdf	The methods introduced in this paper expand the scope of problems to which contextual Bayesian optimization can be applied, and are especially important for settings where policies are evaluated with A/B tests. We expect this work to be directly beneficial in this setting, for instance for improving services at Internet companies as in the ABR example that we described in the paper. We are including our complete code for all of the models introduced in this paper, so the work will be immediately useful. As shown in the paper, contextualization improves not only the top-line performance of policies, but also improves the fairness of policies by improving outcomes specifically for small populations that do not achieve good performance under an existing non-contextual policy. This work will directly benefit these currently under-served populations.
92	RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference	Oindrila Saha, Aditya Kusupati, Harsha Vardhan Simhadri, Manik Varma, Prateek Jain	https://papers.nips.cc/paper/2020/file/ebd9629fc3ae5e9f6611e2ee05a31cef-Paper.pdf	Pros: ML models are compute-intensive and are typically served on power-intensive cloud hardware with a large resource footprint that adds to the global energy footprint. Our models can help reduce this footprint by (a) allowing low power edge sensors with small memory to analyze images and admit only interesting images for cloud inference, and (b) reducing the inference complexity of the cloud models themselves. Further, edge-ﬁrst inference enabled by our work can reduce reliance on networks and also help provide privacy guarantees to end-user. Furthermore, vision models on tiny edge devices enables accessible technologies, e.g., Seeing AI [33] for people with visual impairment. Cons: While our intentions are to enable socially valuable use cases, this technology can enable cheap, low-latency and low-power tracking systems that could enable intrusive surveillance by malicious actors. Similarly, abuse of technology in certain wearables is also possible. Again, we emphasize that it depends on the user to see the adaptation to either of these scenarios.
93	Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions	Yi Hao, Alon Orlitsky	https://papers.nips.cc/paper/2020/file/4dbf29d90d5780cab50897fb955e4373-Paper.pdf	Classical information theory states that an i.i.d. sample contains H(X^n ∼ p) = nH(p) information, which provides little insight for statistical applications. We present a different view by decomposing the sample information into three parts: the labeling of the profile elements, ordering of them, and profile entropy. With no bias towards any symbols, the profile entropy rises as a fundamental measure unifying the concepts of estimation, inference, and compression. We believe this view could help researchers in information theory, statistical learning theory, and computer science communities better understand the information composition of i.i.d. samples over discrete domains. The results established in this work are general and fundamental, and have numerous applications in privacy, economics, data storage, supervised learning, etc. A potential downside is that the theoretical guarantees of the associated algorithms rely on the assumption correctness, e.g., the domain should be discrete and the sampling process should be i.i.d. In other words, it will be better if users can confirm these assumptions by prior knowledge, experiences, or statistical testing procedures. Taking a different perspective, we think a potential research direction following this work is to extend these results to Markovian models, making them more robust to model misspecification.
94	Directional Pruning of Deep Neural Networks	Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng	https://papers.nips.cc/paper/2020/file/a09e75c5c86a7bf6582d2b4d75aad615-Paper.pdf	Our paper belongs to the cluster of works focusing on efﬁcient and resource-aware deep learning. There are numerous positive impacts of these works, including the reduction of memory footprint and computational time, so that deep neural networks can be deployed on devices equipped with less capable computing units, e.g. the microcontroller units. In addition, we help facilitate on-device deep learning, which could replace traditional cloud computation and foster the protection of privacy. Popularization of deep learning, which our research helps facilitate, may result in some negative societal consequences. For example, the unemployment may increase due to the increased automation enabled by the deep learning.
95	Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples	Shafi Goldwasser, Adam Tauman Kalai, Yael Kalai, Omar Montasser	https://papers.nips.cc/paper/2020/file/b6c8cf4c587f2ead0c08955ee6e2502b-Paper.pdf	In adversarial learning, this work can benefit users when adversarial examples are correctly identified. It can harm users by misidentifying such examples, and the misidentifications of examples as suspicious could have negative consequences just like misclassifications. This work ideally could benefit groups who are underrepresented in training data, by abstaining rather than performing harmful incorrect classification. However, it could also harm such groups: (a) by providing system designers an alternative to collecting fully representative data if possible; (b) by harmfully abstaining at different rates for different groups; (c) when those labels would have otherwise been correct but are instead being withheld; and (d) by identifying them when they would prefer to remain anonymous. Our experiments on handwriting recognition have few ethical concerns but also have less ecological validity than real-world experiments on classifying explicit images or medical scans. A note of caution. Inequities may be caused by using training data that differs from the test distribution on which the classifier is used. For instance, in classifying a person’s gender from a facial image, Buolamwini and Gebru [2018] have demonstrated that commercial classifiers are highly inaccurate on dark-skinned faces, likely because they were trained on light-skinned faces. In such cases, it is preferable to collect a more diverse training sample even if it comes at greater expense, or in some cases to abstain from using machine learning altogether. In such cases, PQ learning should not be used, as an unbalanced distribution of rejections can also be harmful.
96	Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking	Anqi Wu, E. Kelly Buchanan, Matthew Whiteway, Michael Schartner, Guido Meijer, Jean-Paul Noel, Erica Rodriguez, Claire Everett, Amy Norovich, Evan Schaffer, Neeli Mishra, C. Daniel Salzman, Dora Angelaki, Andrés Bendesky, The International Brain Laboratory, John P. Cunningham, Liam Paninski	https://papers.nips.cc/paper/2020/file/4379cf00e1a95a97a33dac10ce454ca4-Paper.pdf	We propose a new method for animal behavioral tracking. As highlighted in the introduction and in [10], recent years have seen a rapid increase in the development of methods for animal pose estimation, which need to operate in a different regime than methods developed for human pose estimation. Our work significantly improves the state of the art for animal pose estimation, and thus advances behavioral analysis for animal research, an essential task for scientific discovery in fields ranging from neuroscience to ecology. Finally, our work represents a compelling fusion of deep learning methods with probabilistic graphical model approaches to statistical inference, and we hope to see more fruitful interactions between these rich topic areas in the future.
97	Adapting to Misspecification in Contextual Bandits	Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert	https://papers.nips.cc/paper/2020/file/84c230a5b1bc3495046ef916957c7238-Paper.pdf	This paper concerns contextual bandit algorithms that adapt to unknown model misspecification. Because of their efficiency and ability to adapt to the amount of misspecification contained with no prior knowledge, our algorithms are robust, and may be suitable for large-scale practical deployment. On the other hand, our work is at the level of foundational research, and hence its impact on society is shaped by the applications that stem from it. We will focus our brief discussion on the applications mentioned in the introduction. Health services [43] offer an opportunity for potential positive impact. Contextual bandits can be used to propose medical interventions that lead to better health outcomes. However, care must be taken to ethically implement the explore-exploit tradeoff in this sensitive setting, and more research is required. Online advertisements [4, 35] and recommendation systems [8] are another well-known application. While improved, robust algorithms can lead to increased profits here, it is important to recognize that this may positively impact society as a whole. Lastly, we mention that predictive algorithms like contextual bandits become more and more powerful as more information is gathered about users. This provides a clear incentive toward collecting as much information as possible. We believe that the net benefit of research on contextual bandits outweighs the harm, but we welcome regulatory efforts to produce a legal framework that steers the usage of machine learning algorithms, including in contextual bandits, in a direction that respects the privacy rights of users.
98	Autofocused oracles for model-based design	Clara Fannjiang, Jennifer Listgarten	https://papers.nips.cc/paper/2020/file/972cda1e62b72640cb7ac702714a115f-Paper.pdf	If adopted more broadly, our work could affect how novel proteins, small molecules, materials, and other entities are engineered. Because predictive models are imperfect, even with the advances presented herein, care should be taken by practitioners to verify that any proposed design candidates are indeed safe and ethical for the intended downstream applications. The machine learning approach we present facilitates obtaining promising design candidates in a cost-effective manner, but practitioners must follow up on candidates proposed by our approach with conventional laboratory methods, as appropriate to the application domain.
99	Universal Domain Adaptation through Self Supervision	Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko	https://papers.nips.cc/paper/2020/file/bb7946e7d85c81a9e69fee1cea4a087c-Paper.pdf	Our work is applicable to training deep neural networks with less supervision via knowledge transfer from auxiliary datasets. Modern deep networks outperform humans on many datasets given a lot of annotated data, such as in ImageNet. Our proposed method can help reduce the burden of collecting large-scale supervised data in many applications where large related datasets are available. The positive impact of our work is to reduce the data gathering effort for data-expensive applications. This can make the technology more accessible for institutions and individuals that do not have rich resources. It can also help applications where data is protected by privacy laws and is therefore difficult to gather, or in sim2real applications where simulated data is easy to create but real data is difficult to collect. The negative impacts could be to make these systems more accessible to companies, governments or individuals that attempt to use them for criminal activities such as fraud. Furthermore, as with all current deep learning systems, ours is susceptible to adversarial attacks and lack of interpretability. Finally, while we show improved performance relative to state-of-the-art, negative transfer could still occur, therefore our approach should not be used in mission-critical applications or to make important decisions without human oversight.
100	Second Order Optimality in Decentralized Non-Convex Optimization via Perturbed Gradient Tracking	Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari	https://papers.nips.cc/paper/2020/file/f1ea154c843f7cf3677db7ce922a2d17-Paper.pdf	Over the last couple of years we have witnessed an unprecedented increase in the amount of data collected and processed in order to tackle real life problems. Advances in numerous data-driven systems such as the Internet of Things, health-care, multi-agent robotics wherein data are scattered across the agents (e.g., sensors, clouds, robots), and the sheer volume and spatial/temporal disparity of data render centralized processing and storage infeasible or inefficient. Compared to the typical parameter-server type distributed system with a fusion center, decentralized optimization has its unique advantages in preserving data privacy, enhancing network robustness, and improving the computation efficiency. Furthermore, in many emerging applications such as collaborative filtering, federated learning, distributed beamforming and dictionary learning, the data is naturally collected in a decentralized setting, and it is not possible to transfer the distributed data to a central location. Therefore, decentralized computation has sparked considerable interest in both academia and industry. At the same time convex formulations for training machine learning tasks have been replaced by nonconvex representations such as neural networks and a line of significant non convex problems are on the spotlight. Our paper contributes to this line of work and broadens the set of problems that can be successfully solved without the presence of a central coordinating authority in the aforementioned framework. The implications on the privacy of the agents are apparent while rendering the presence of an authority unnecessary has political and economical extensions. Furthermore, numerous applications are going to benefit from our result impacting society in many different ways.