Some AI Governance Research Ideas
June 3, 2021
Compiled by Markus Anderljung & Alexis Carlier
Below are some research ideas from folks at the Centre for the Governance of AI (GovAI), collated by Markus Anderljung and Alexis Carlier. If you are interested in pursuing any of the ideas, feel free to reach out to contact@governance.ai. We may be able to help you find mentorship, advice, or collaborators. You can also reach out if you’re intending to work on the project independently, so that we can help avoid duplication of effort.
See this EA Forum post for additional context.
The ideas:
The Impact of US Nuclear Strategists in the early Cold War
Transformative AI and the Challenge of Inequality
Will there be a California Effect for AI?
History of existential risk concerns around nanotechnology
Broader impact statements: Learning lessons from their introduction and evolution
Structuring access to AI capabilities: lessons from synthetic biology
Lessons from Self-Governance Mechanisms in AI
How does government intervention and corporate self-governance relate?
Summary and analysis of “common memes” about AI, in different communities
A Review of Strategic-Trade Theory
Compute Providers as a Node of AI Governance
China’s access to cutting edge chips
Compute Provider Actor Analysis
Written by Waqar Zaidi
This project explores the impact of US nuclear strategists on nuclear strategy in the early Cold War. What types of experts provided advice on US nuclear strategy? How and in what ways did they affect state policymaking on nuclear weapons from 1945 through to the end of the 1950s (and possibly beyond)? How could they have had a larger impact? This project provides a detailed case study to help us understand how and through what pathways technical experts have been able to shape state policymaking in relation to this critical technology. Knowing whether nuclear strategists had any impact should update us on the extent to which we might be able to have an impact on AI governance (or on the development of other crucial technologies, for that matter).
Sources
This project could be based almost exclusively on published sources (though archival research visits to the US could be helpful). There is a large historical and biographical literature that can be drawn upon. It is somewhat disparate and scattered, however, and would require significant work to connect together. This process of interconnection is likely to be fruitful, providing new insights and perspectives.
The secondary literature may be thought of as consisting of three parts. First there is a longstanding literature on the historical development of nuclear strategy itself. Historically this body of work has not explored the impact of strategists in depth, though that may now be changing.[1] Second, there is a growing literature which is now exploring the impact and work of nuclear strategists, either as individuals (for example Thomas Schelling) or through organizations such as RAND.[2] Third, there are now a number of personal accounts of nuclear strategic development that provide insight into how nuclear strategy was made.[3]
Individuals of interest could include: Bernard Brodie, Herbert Goldhammer, Herman Kahn, William Kaufmann, Nathan Leites, Andrew Marshall, Henry S. Rowen, Thomas C. Schelling, Donald Brennan, Walter Millis, and Albert Wohlstetter. Thinking on nuclear strategy was carried out in a number of government and non-government organizations, including independent think tanks and university research centers. The key organization is RAND, though the project could also look into the Hudson Institute and the Foreign Policy Research Institute, amongst many others.
Written by Anton Korinek
From an economic perspective, among the greatest challenges that transformative AI may pose are increases in inequality: if advanced AI systems can perform work far more efficiently than humans, workers may no longer be able to earn a living wage, while entrepreneurs and the owners of corporations may see their incomes rise significantly.
Inequality has been rising for decades and has been a significant challenge for policymakers. But increases in inequality are not an unavoidable by-product of technological progress. Instead, as long as humans are in control, whether progress leads to greater inequality or greater shared prosperity is our collective choice. (To provide a useful analogy: we have seen in recent years that a concerted effort can successfully re-orient our economy in a "green" direction. It is similarly possible to reorient our economy in a direction that leads to shared prosperity.) Ensuring that transformative AI leads to broadly shared increases in living standards is the most important economic dimension of the AI alignment problem.
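To make that mechanism concrete, here is a minimal stylized sketch (my own illustration, not the model in Klinova and Korinek (2021) cited below): suppose human labour L and machine labour M are perfect substitutes in production and are combined with a fixed factor K (capital, land, or entrepreneurship):

$$
Y = (L + M)^{\alpha} K^{1-\alpha},
\qquad
w = \frac{\partial Y}{\partial L} = \alpha \left( \frac{K}{L + M} \right)^{1-\alpha}.
$$

As machines get cheaper and their effective supply M grows, total output Y rises while the wage w falls, and the labour share $wL/Y = \alpha L/(L+M)$ shrinks, with the remainder accruing to the owners of K and M. Whether this happens for a given technology depends on how substitutable it is for human work, which is one of the things the research questions below would probe for specific applications.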
This line of reasoning gives rise to three sets of research questions:
1) How do different types of advances in AI affect inequality? The first step for ethical AI developers who want to internalize their contributions to greater income inequality (or equality) is to measure the effects of what they are doing.
Research Question: Focusing on a specific AI application (e.g. AVs, a medical AI system, warehouse robots, customer service chatbots, etc.), what are the equilibrium effects on workers? Does a given AI application increase or reduce inequality? Does it improve or worsen worker conditions?
A model for determining these effects is outlined in Klinova and Korinek (2021), "AI and Shared Prosperity," Proceedings of the AIES '21. Useful resources are also provided by the Partnership on AI's Shared Prosperity Initiative.
2) Building on this analysis, how can ethically responsible AI developers ensure that their inventions contribute to shared prosperity, even if those inventions reduce demand for workers?
Research Question: Focusing on a specific AI application that threatens to significantly increase inequality, how could this application be reoriented? Are there ways of actively involving humans in some of the processes? Or ways of ensuring that the gains are distributed more equally (e.g. in the spirit of the windfall clause)? If not, should it be abandoned?
3) If AI developers ignore the discussed effects on inequality, what can policymakers do to address concerns about inequality and the challenges brought about by widespread displacement of labor in the future?
Research Question: Focusing on a specific country, what are the existing safety nets for workers? How much would workers lose if there is widespread job displacement and technological unemployment? How can safety nets be reformed so that labor displacement in the future does not automatically lead to economic misery? And can we put these reforms on auto-pilot, such that benefits automatically increase when, for example, the economy grows or the labor share declines?
Written by Jeffrey Ding
How will AI affect the risk of military accidents? Can past cases of software failures in military systems shed light on this issue? This question's stakes are high. In the past, accidents in technological systems generated many "near-miss" nuclear crises during the Cold War (Sagan 1993). In the present, a naval accident is one of the most likely triggers of a U.S.-China conflagration. In the future, analysis of AI-linked accidents could provide another lens for thinking about the risks associated with artificial general intelligence (AGI).
When we think of AI-related accidents, we often gravitate toward depictions of technical malfunctions (e.g. an autonomous vehicle crashes because it encounters an edge scenario that it hasn’t been trained on). This is akin to a software failure, typically defined as the inability of code “to perform its required function within specified performance requirements” (Foreman et al. 2015, 102). You could even argue that “reward hacking” is a sophisticated version of a software failure. The agent is not performing the intended function. However, this narrow definition of software failure overlooks many cases in which “software complied with its requirements yet directly contributed to or led to an accident” (Foreman et al. 2015, 102). Nothing was hacked. The code worked as intended, but an accident still occurred.
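As a toy illustration of the "reward hacking" point (my own sketch, not drawn from the source): in the code below, everything runs exactly as written and the "hacking" policy earns more of the specified reward than the intended behaviour, yet it fails to perform the intended function of cleaning the floor. The function names, reward values, and action labels are all invented for this example.

```python
# Toy reward hacking sketch: a cleaning agent is rewarded per item of trash it
# picks up. One available action ("tip_bin") spills trash, creating more items
# to pick up, so a reward-maximising policy prefers making messes to cleaning.

def rollout(policy, steps=10):
    trash_on_floor, reward = 5, 0
    for _ in range(steps):
        action = policy(trash_on_floor)
        if action == "pick_up" and trash_on_floor > 0:
            trash_on_floor -= 1
            reward += 1          # reward as specified: +1 per item collected
        elif action == "tip_bin":
            trash_on_floor += 3  # side effect the designer did not intend to reward
    return reward, trash_on_floor

intended = lambda floor: "pick_up"                               # what the designer had in mind
hacker = lambda floor: "pick_up" if floor > 3 else "tip_bin"     # exploits the proxy reward

print(rollout(intended))  # (5, 0): modest reward, clean floor
print(rollout(hacker))    # (8, 3): higher reward, floor still covered in trash
```

The cases discussed next are different again: there, not even the objective specification was at fault; the software met its requirements and the accident arose elsewhere.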
Take, as an example, the 1988 Vincennes incident, in which a U.S. naval ship accidentally shot down an Iran Air civilian airliner, killing all 290 on board, including 66 children. The official investigation revealed that the ship's Aegis system, a highly sophisticated command, control, communications, and intelligence center, performed flawlessly. Rather, the design of the user interface played a critical role: key information and indicators were displayed on smaller screens that had to be called up, and this machine-to-commander link broke down during the high-stress crisis. In fact, there is substantial evidence that human-machine interaction effects are the primary vehicle by which automation software increases the risk of accidents (MacKenzie 1994).
Possible approaches to studying this question: Review how human-machine interactions affected past military accidents like the Vincennes incident, the Patriot fratricides, etc. One possible data source is the FORUM on Risks to the Public in Computers and Related Systems, a community of computer safety researchers. See, for instance, robust discussion on the Vincennes here.
Researchers could also explore how organizations learn from past incidents to reduce the risks of faulty human-machine interactions. For instance, some have pointed out that there has not been another similar incident with Aegis systems in the thirty years since Vincennes (Scharre 2018). How have risky human-machine interactions been dealt with in that time?
Written by Markus Anderljung (Jessica Cussins and Jared Brown likely have much more useful information than I do)
Some have argued (e.g. here and here) that there may be a California Effect in AI; that is, that more stringent Californian AI policies are likely to proliferate to other jurisdictions.
Two points provide evidence in favour of the view:
First, some regulatory domains, such as environmental standards for cars, have seen a California Effect (see e.g. Ch 8 in Vogel’s 1995 Trading Up), whereby more stringent Californian regulation has proliferated across the US. Often, this happens when companies find it cheaper to produce a single California-compliant product and sell it outside California as well (e.g. due to the cost of maintaining two separate production lines). There is a case to be made that similar dynamics will be at play in AI. For an excellent description of the dynamics of the California Effect, I’d recommend reading The Brussels Effect by Anu Bradford, which explores the same phenomenon with regard to EU regulation.
Second, California has been the first major US jurisdiction to put in place a number of AI-relevant policies, such as the 2018 California Consumer Privacy Act (CCPA), which gives consumers a number of GDPR-inspired rights; the 2018 Bot Disclosure Act; and prohibitions on the use of facial recognition via e.g. the Body Camera Accountability Act. California has also endorsed the Asilomar AI Principles.
Question to explore:
Written by Jeffrey Ding
China is investing in AI-enabled decision support systems for detecting nuclear attacks. How do Chinese policymakers and analysts view the stability of their nuclear deterrent? How do these views feed into their decisions around investments in new nuclear capabilities?
According to one scholar, there is a big difference in which risks Chinese and American analysts focus on: "In the United States, military analysts are often preoccupied with the concern that alarms or early warning systems, accidentally or even intentionally triggered, could produce false positives. Chinese analysts, in contrast, are much more concerned with false negatives" (Saalman 2018). Others, by contrast, argue that Chinese forces "prioritize negative control over positive control of nuclear weapons to implement the strict control of the CMC and Politburo over the alerting and use of nuclear weapons" (Cunningham 2019). Here, negative control refers to guarding against the accidental or illegitimate use of nuclear weapons; positive control refers to ensuring that a legitimate nuclear response can always be executed.
Around the world, little is known about the stability of China’s nuclear deterrent. Consider this passage from Schlosser's excellent book Command and Control (p. 475):
In January 2013, a report by the Defense Science Board warned that the (nuclear command and control) system's vulnerability to a large-scale cyber attack had never been fully assessed. Testifying before Congress, the head of the U.S. Strategic Command, General C. Robert Kehler, expressed confidence that no “significant vulnerability” existed. Nevertheless, he said that an “end-to-end comprehensive review” still needed to be done, that “we don’t know what we don’t know,” and that the age of the command-and-control system might inadvertently offer some protection against the latest hacking techniques. Asked whether Russia and China had the ability to prevent a cyber attack from launching one of their nuclear missiles, Kehler replied, "Senator, I don’t know.”
Possible approaches to studying this question: Studying this would require finding and reading Chinese-language sources on this topic. It would also involve comparative analysis of the safety cultures of the U.S. and Chinese nuclear communities. This is a tough but hugely important research area. As a program officer for Stanley Center for Peace and Security told me, this "is a really tough research area. There aren’t many NC3 [Nuclear Command and Control and Communications] experts anymore, China is a hard research topic for NC3, and cross domain issues makes it even more difficult."
Written by Ben Garfinkel
I would be interested in an investigation into the history of existential risk concerns around nanotechnology and the lessons it might hold for the modern AI risk community.
Background: My impression is that it was not uncommon for futurists in the 1980s and 1990s to believe that transformative nanotech might be imminent and might lead to the extinction of humanity if managed poorly. These concerns also seem to have spread into popular culture, to some extent, and to have been at least a peripheral presence in policy discussions (if only as something that many scientists felt the need to actively distance themselves from). My impression is that there is also significant continuity between the present-day AI-focused long-termist community and the futurist community that was previously highly concerned about nanotechnology. For example, my understanding is that some early work on aligned superintelligence (e.g. by the Singularity Institute) was partly motivated by concern about nanotech risk: some feared that transformative nanotech might arrive soon and largely without warning, might result in extinction by default, and might only be safely manageable if aligned superintelligence is developed first.
Questions I’m interested in:
I think one could make significant progress on these questions just by talking to people who were engaged in (or at least aware of) debates around transformative nanotech in the 1980s or 1990s, including Eric Drexler, Christine Peterson, Eliezer Yudkowsky, and Robin Hanson. It would also be useful to read available histories of nanotechnology and to read essays, news coverage, popular fiction, and mailing list discussions from this period.
Written by Toby Shevlane
For the 2020 conference, the NeurIPS committee introduced a requirement that authors include in their papers a section reflecting on the broader impact of their work. The idea was to push researchers to consider the potential negative societal impacts of AI research (see e.g. Prunkl et al 2021, Ashurst et al 2020, Hecht 2020, Abuhamad & Rheault 2020). For 2021, this requirement is being changed: authors instead need to complete a checklist when submitting a paper, which asks whether the paper discusses potential negative impacts (and the authors are free to say no).
These developments could be used as a case study to learn about the pressures that shape institutional change within the AI research community. The project would seek to answer:
The project would involve interviewing NeurIPS organisers from both the 2020 and 2021 committees.
Written by Toby Shevlane
I am currently writing a book chapter on what I’m provisionally referring to as “structured capability access” (SCA) within AI research. In contrast to open source software, SCA refers to AI developers setting up controlled interactions between the user and the underlying software, with the most obvious example being the way that OpenAI hosts GPT-3 on its API service. SCA must address both safety and security: the user must use the system in a safe way, and they must be prevented from unauthorised modification or reverse engineering of the system. The book chapter focuses on SCA for AI models, but the lens of SCA also applies to access to the cloud computing used to train models.
Methods for structuring access to certain capabilities are not unique to AI. One interesting example is the printing of DNA sequences, carried out by certain biology labs. These labs have procedures for screening requests, and there are also proposals for printing hardware to be sold with certain “locks” on what can be printed (Esvelt, 2018).
This research project would explore in detail the systems that exist within synthetic biology, in order to learn lessons for how SCA could be further developed within AI. Important sub-questions would be:
It would be beneficial if the researcher had some existing familiarity with biology, but this might not be necessary.
Written by Markus Anderljung
To what extent does high tech development, including AI, have dynamics similar to those of financial bubbles? Shiller (in e.g. Irrational Exuberance) argues that bubbles in financial markets are driven by investors' beliefs about other investors’ beliefs and by poor feedback loops with reality. Investor behaviour is thus often driven by narratives, which can be undermined suddenly if and when a strong counter-narrative takes hold. It seems plausible that the same dynamics exist in the high tech space. They might even be stronger: factors pointing in that direction include high information asymmetries, many actors with incentives to contribute to hype narratives, and poor feedback loops with reality (research takes a long time to turn into profits). On the other hand, reallocating resources in the high tech space is much harder than moving capital in finance (changing career track, for instance, takes several years).
Specific questions I’m interested in:
Written by Markus Anderljung & Alex Lintz
What can we learn from the self-governance mechanisms put in place by the AI community and AI companies over the last decade? Notable examples to study include: the Asilomar Conference on Beneficial AI; the Facebook Oversight Board (see Klonick 2020); AI ethics boards (interesting examples include DeepMind's ethics board, Google’s defunct AI advisory board, and Microsoft’s AETHER committee); the AI ethics principles put out by a huge number of entities; the Partnership on AI; shifts in publication norms (see e.g. Prunkl et al 2021, Partnership on AI 2021, and Shevlane & Dafoe 2020); companies supporting the AI ethics field (e.g. by sponsoring conferences and research, and setting up internal teams); and OpenAI’s Charter and move to a capped-profit model. Some relevant work on this topic from a broadly longtermist perspective includes forthcoming work by Cihon, Schuett, and Baum; Peter Cihon’s work on AI standards; Caroline Meinhardt’s forthcoming work on corporate AI ethics in China; Jia Yuan Loke’s EA Forum post; Will Hunt’s work on safety-critical AI in aviation; and Jessica Newman’s work evaluating some existing attempts at AI governance.
For each attempt, you might ask questions like the following:
Ultimately, you should aim for this research to be able to inform questions like: If you were in control of e.g. Google, what corporate self-governance mechanisms would you put in place in order to ensure the company behaves in a socially responsible way in the face of radical technological change? What, if any, mechanisms should advocates outside companies push them to adopt?
Written by Alex Lintz
While self-governance is important, its secondary effects could be even more so. In particular, improved self-governance might influence the quality or quantity of regulation. For example, it is not yet clear what the most important impact of something like the Facebook Oversight Board (see e.g. Klonick 2020 for details) will be. Will it hold regulators at bay by satisficing governance needs? Might it increase Facebook’s desire for regulation which forces competitors to act within the constraints Facebook has already subjected itself to? Will it provide an example for regulators to learn from, thus improving future regulation? Understanding the relationship between self-governance and regulation may help us to understand where to target our efforts. For example, should we push hard for responsible corporate self-governance first or would better regulation (or the threat of it) improve self-governance anyway?
One approach which might shed light on these questions is to evaluate past cases of emerging industries and trace the path of their evolution towards more responsible governance (or their failure to become responsible). In those cases, did better governance start among firms and then lead to regulation, involve little in the way of self-governance, involve deceptive self-governance (e.g. the tobacco industry), or otherwise fail to achieve responsible governance? Cases should ideally be selected in part for their similarity to current attempts at regulating AI (for example, those listed in other parts of this document). Jia Yuan Loke provides one potentially useful framework for selecting relevant industries. Another might be to look through a national security lens at what Jeff Ding identifies as strategic technologies, focusing on how companies engage with e.g. threats of export controls being put in place.
For each case, you might ask questions like the following:
Written by Ben Garfinkel
Written by Markus Anderljung
A review paper summarizing the state of the literature on strategic-trade theory and related ideas (e.g. “industrial policy”, “high-development theory”). For example, I would like to have a better sense of the magnitude of the rents that a country can capture from having an industrial champion. How much value does the existence of Airbus provide to Europe? How much of modern wealth comes from strategic industries? One estimate puts the commercial aircraft market at roughly $100bn per year (which is ~0.5% of US or EU GDP). Ultimately, I’m interested in this research informing questions like: Do states promote their national interest by attempting to create national AI champions? How does one country having a national AI champion affect the national interests of other states? These questions can help us understand how strongly incentivised states will be to pursue ambitious AI industrial policy approaches and whether there is a strong national-interest case against doing so.
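As a quick check on that figure (my own arithmetic, taking US GDP to be roughly \$21 trillion and EU GDP roughly \$15 trillion in recent years, both assumptions rather than figures from the source):

$$
\frac{\$100\,\text{bn}}{\$21\,\text{tn}} \approx 0.48\%,
\qquad
\frac{\$100\,\text{bn}}{\$15\,\text{tn}} \approx 0.67\%,
$$

so "roughly half a percent of GDP" is the right order of magnitude.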
Potential sources
Written by Markus Anderljung
Human society is built around the fact that human minds are opaque. What would happen if improvements in machine learning and various sensors such as brain scanning technologies and wearable biometric readers could make people’s thoughts and emotions more transparent? Such transparency could have significant impacts on the world, reshaping diplomacy, surveillance, criminal justice, the economy, politics, and interpersonal relationships.
What capabilities are we likely to see?
To my limited knowledge, there are currently no particularly impressive mind reading technologies. Polygraph tests are notoriously inaccurate, several companies (e.g. Affectiva) are building visual recognition systems for emotion (though they’ve faced skepticism and don’t appear particularly impressive to me), and new products with limited accuracy are starting to be rolled out in workplaces and schools (see e.g. Giattino et al 2019 for a summary). However, progress in high-bandwidth human-computer interfaces by companies like Neuralink and advances in reconstructing mental content from e.g. fMRI data (e.g. Shen et al 2019, Hassabis et al 2014), suggest that more impressive technologies are on the horizon.
To make progress on the question, you can ask:
Forecasting mind reading technologies
The potential impact of mind reading technologies
A note on terminology: Folks in the longtermist / EA community often refer to this as “lie detection technology”. I have a preference for “mind reading technology”, as the transformative impacts may also come from being able to detect emotions or thoughts that are not expressed. The latter term does, however, come with the problem of sounding a bit too futuristic.
Keep in mind that this space could be rife with information hazards, in particular if it turns out that mind reading technologies are feasible and, on net, a bad development.
Written by Markus Anderljung (you might also want to reach out to Miles Brundage, Shahar Avin, or Saif Khan if you’re interested in these topics)
Compute is a very promising node for AI governance. Why? Powerful AI systems in the near term are likely to need massive amounts of compute, especially if the scaling hypothesis proves correct. Furthermore, compute seems more easily governable than other inputs to AI systems (talent, ideas, data), both because it is more easily detectable (it requires energy, takes up physical space, etc.) and because its supply chain is very concentrated, which enables monitoring and governance (see Khan, Mann, Peterson 2021, Avin unpublished, and Brundage forthcoming).
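To give a sense of why large training runs are hard to hide, here is a back-of-envelope sketch (my own; every parameter value is an illustrative assumption, chosen to be roughly in the range reported for GPT-3-scale runs, not a figure from the source):

```python
# Rough estimate of the physical footprint of a large training run.
# All constants below are assumptions for illustration only.

TRAINING_FLOP = 3e23      # assumed total FLOP for a large language model training run
CHIP_FLOP_PER_S = 1e14    # assumed peak throughput per accelerator (~100 TFLOP/s)
UTILISATION = 0.3         # assumed fraction of peak throughput actually sustained
CHIP_POWER_W = 400        # assumed power draw per accelerator, in watts

effective_flop_per_s = CHIP_FLOP_PER_S * UTILISATION
chip_seconds = TRAINING_FLOP / effective_flop_per_s        # accelerator-seconds required
chip_years = chip_seconds / (3600 * 24 * 365)
energy_kwh = chip_seconds * CHIP_POWER_W / 3.6e6           # joules converted to kWh

print(f"~{chip_years:,.0f} accelerator-years of compute")
print(f"~{energy_kwh:,.0f} kWh drawn by the accelerators alone")
```

Under these assumptions the run ties up a few hundred accelerator-years and on the order of a gigawatt-hour of electricity for the chips alone (before cooling and other datacenter overhead), which illustrates the detectability point above: frontier-scale compute use has a physical and energy signature in a way that talent, ideas, and data do not.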
Some have called for governments to create compute funds, through which certain groups (e.g. academics, those working on AI-for-Good applications, or AI safety researchers) are given preferential or exclusive access to compute. This might be implemented via credits researchers can spend with compute providers or via the government setting up its own domestic cloud computing infrastructure. In the US, the National Defense Authorization Act 2021 included a provision that a National AI Research Cloud task force should explore whether to set up an AI research cloud, providing both compute and datasets (summary here), and the National Security Commission on AI recommended the creation of a similar entity (p. 191 here). The EU and the UK are also at various stages of considering similar initiatives.
Should governments set up such funds? Seeing as they are likely to be set up, how should they be designed? Concrete questions to explore:
If compute is a particularly promising node of AI governance, we might expect compute providers to be particularly important. What kinds of governance activities (e.g. related to monitoring and use restrictions) would we like compute providers (e.g. semiconductor companies and cloud compute providers) to engage in? How can we move towards a world where they take these actions?
Concrete questions:
The aspect of compute governance that has been explored in by far the greatest detail (notably by folks at the Center for Security and Emerging Technology, CSET) concerns whether the US and its allies should attempt to (i) reduce the proliferation of the ability to produce cutting edge chips, with a focus on its spread to China, and (ii) put in place export controls on cutting edge chips for certain uses.
My summary of the core claims coming out of CSET’s research (more in Khan, Mann, & Peterson 2021 and Khan 2021):
While these actions come with various risks, they may also provide a number of benefits:
I’m interested in questions like:
(h/t Jade Leung)
Compute providers (cloud compute companies and hardware companies) are likely to be influential actors in the compute governance space. It is important to understand them better, including what beneficial actions they could take.
[1] On nuclear strategy the classic text is: Lawrence Freedman, The Evolution of Nuclear Strategy (Palgrave Macmillan, 1981) (the fourth edition was issued in 2019). There are many more studies, for example: Michio Kaku and Daniel Axelrod, To Win a Nuclear War (Montreal: Black Rose Books, 1987); Francis J. Gavin, Nuclear Statecraft: History and Strategy in America's Atomic Age (Ithaca: Cornell University Press, 2012); Edward N. Luttwak, Strategy and History: Collected Essays volume two (New Brunswick: Transaction Books, 1985); Richard K. Betts, Nuclear Blackmail and Nuclear Balance (Washington, DC: The Brookings Institution, 1987); Edward Kaplan, To Kill Nations: American Strategy in the Air-Atomic Age and the Rise of Mutually Assured Destruction (Ithaca: Cornell University Press, 2015).
[2] There are a few classic studies: Peter Paret, Gordon A. Craig, and Felix Gilbert (eds.), Makers of Modern Strategy from Machiavelli to the Nuclear Age (Oxford: OUP, 1986); Fred Kaplan, The Wizards of Armageddon (Stanford: Stanford University Press, 1983); Gregg Herken, Counsels of War (New York: Knopf, 1985). Other early studies include: Roy E. Licklider, The Private Nuclear Strategists (Ohio State University Press, 1971). Recent studies include: Ron Robin, The Cold World They Made: The Strategic Legacy of Roberta and Albert Wohlstetter (Cambridge: Harvard University Press, 2016); Alex Abella, Soldiers of Reason: The RAND Corporation and the Rise of the American Empire (Orlando: Harcourt, 2008). Individual biographies include: Robert Ayson, Thomas Schelling and the Nuclear Age: Strategy as Social Science (London: Frank Cass, 2004); Robert Dodge, The Strategist: The Life and Times of Thomas Schelling (Hollis Publishing, 2006); Barry H. Steiner, Bernard Brodie and the Foundations of American Nuclear Strategy (Lawrence: University Press of Kansas, 1991); Barry Scott Zellen, Bernard Brodie, The Bomb, and the Birth of the Bipolar World (New York: Continuum, 2012); Sharon Ghamari-Tabrizi, The Worlds of Herman Kahn: The Intuitive Science of Thermonuclear War (Cambridge: Harvard University Press, 2005). One innovative study which looks at “amateur strategists” is: James DeNardo, The Amateur Strategist: Intuitive Deterrence Theories and the Politics of the Nuclear Arms Race (Cambridge: Cambridge University Press, 1995).
[3] Most prominently: Daniel Ellsberg, The Doomsday Machine: Confessions of a Nuclear War Planner (New York: Bloomsbury, 2017).
[4] By “take responsibility for computation done on their systems”, I am mainly referring to their taking actions to avoid certain computations being done on their systems. Whether they are legally or morally responsible for such computations is only relevant, in my mind, insofar as it changes the extent to which such actions are taken.
[5] This route has been explored by Denise Melchin and Shahar Avin in unpublished work.