Industry is attempting to stave off regulation, but large-scale AI needs more scrutiny, not less.

Large-scale General Purpose AI models (such as GPT-4 and its user-facing application ChatGPT) are being promoted by industry as “foundational” and a major turning point for scientific advancement in the field. They are also often associated with slippery definitions of “open source.”

These narratives distract from what we call the “pathologies of scale” that become more entrenched every day: large-scale AI models are still largely controlled by Big Tech firms because of the enormous computing and data resources they require, and also present well-documented concerns around discrimination, privacy and security vulnerabilities, and negative environmental impacts.

Large-scale AI models like Large Language Models (LLMs) have received the most hype, and fear-mongering, over the past year. Both the excitement and anxiety1Future of Life Institute, “Pause Giant AI Experiments: An Open Letter,” accessed March 29, 2023; Yuval Harari, Tristan Harris, and Aza Raskin, “Opinion | You Can Have the Blue Pill or the Red Pill, and We’re Out of Blue Pills,” The New York Times, March 24, 2023, sec. Opinion. around these systems serve to reinforce the notion that these models are “foundational” and a major turning point for advancement in the field, despite manifold examples where these systems fail to provide meaningful responses to prompts.2See Greg Noone, “‘Foundation models’ may be the future of AI. They’re also deeply flawed,” Tech Monitor, November 11, 2021 (updated February 9, 2023); Dan McQuillan, “We Come to Bury ChatGPT, Not to Praise It,” February 6, 2023; Ido Vock, “ChatGPT Proves That AI Still Has a Racism Problem,” New Statesman, December 9, 2022; Janice Gassam Asare, “The Dark Side of ChatGPT,” Forbes, January 28, 2023; and Billy Perrigo, “Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic,” Time, January 18, 2023. But these narratives distract from what we call the “pathologies of scale” that accompany these systems.
The term “foundational,” for example, was introduced by Stanford University when announcing a new center of the same name in 2021,3See the Center for Research on Foundation Models, Stanford University; and Margaret Mitchell (@mmitchell_ai), “Reminder to everyone starting to publish in ML: ‘Foundation models’ is *not* a recognized ML term; was coined by Stanford alongside announcing their center named for it; continues to be pushed by Sford as *the* term for what we’ve all generally (reasonably) called ‘base models’,” Twitter, June 8, 2022, 4:01 p.m., in the wake of the publication of an article listing the many harms associated with LLMs.4Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, March 2021. In notably fortuitous timing, the introduction of these models as “foundational” aimed to equate them (and those espousing them) with unquestionable scientific advancement, a stepping stone on the path to “Artificial General Intelligence”5See Sam Altman, “Planning for AGI and Beyond,” March 2023. (another fuzzy term evoking science-fiction notions of replacing or superseding human intelligence), thereby making their wide-scale adoption inevitable.6See National Artificial Intelligence Research Resource Task Force, “Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem: An Implementation Plan for a National Artificial Intelligence Research Resource,” January 2023; and Special Competitive Studies Project, “Mid-Decade Challenges to National Competitiveness,” September 2022. These discourses have since returned to the foreground following the launch of OpenAI’s newest LLM-based chatbot, ChatGPT.

On the other hand, the term “General Purpose AI” (GPAI) is being used in policy instruments like the EU’s AI Act to underscore that these models have no defined downstream use and can be fine-tuned to apply in specific contexts.7The EU Council’s draft or “general position” on the AI Act text defines General Purpose AI (GPAI) as an AI system “that – irrespective of how it is placed on the market or put into service, including as open source software – is intended by the provider to perform generally applicable functions such as image and speech recognition, audio and video generation, pattern detection, question answering, translation and others; a General Purpose AI system may be used in a plurality of contexts and be integrated in a plurality of other AI systems.” See Council of the European Union, “Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts – General Approach,” November 25, 2022; see also Future of Life Institute and University College London’s proposal to define GPAI as an AI system “that can accomplish or be adapted to accomplish a range of distinct tasks, including some for which it was not intentionally and specifically trained.” Carlos I. Gutierrez, Anthony Aguirre, Risto Uuk, Claire C. Boine, and Matija Franklin, “A Proposal for a Definition of General Purpose Artificial Intelligence Systems,” Future of Life Institute, November 2022. It has been wielded to argue that because these systems lack clear intention or defined objectives, they should be regulated differently or not at all – effectively creating a major loophole in the law (more on this in Section 2 below).8Alex C. Engler, “To Regulate General Purpose AI, Make the Model Move,” Tech Policy Press, November 10, 2022.

Such terms deliberately obscure another fundamental feature of these models: they currently require computational and data resources at a scale that ultimately only the most well-resourced companies can afford to sustain.9See Ben Cottier, “Trends in the Dollar Training Cost of Machine Learning Systems,” Epoch, January 31, 2023; Jeffrey Dastin and Stephen Nellis, “For Tech Giants, AI like Bing and Bard Poses Billion-Dollar Search Problem,” Reuters, February 22, 2023; Jonathan Vanian and Kif Leswing, “ChatGPT and Generative AI Are Booming, but the Costs Can Be Extraordinary,” CNBC, March 13, 2023; Dan Gallagher, “Microsoft and Google Will Both Have to Bear AI’s Costs,” Wall Street Journal, January 18, 2023; Christopher Mims, “The AI Boom That Could Make Google and Microsoft Even More Powerful,” Wall Street Journal, February 11, 2023; and Diane Coyle, “Preempting a Generative AI Monopoly,” Project Syndicate, February 2, 2023. For a sense of the figures, some estimates suggest it costs $3 million a month to run ChatGPT,10See Tom Goldstein (@tomgoldsteincs), “I estimate the cost of running chatGPT is $100K per day, or $3M per month. This is a back-of-the-envelope calculation. I assume nodes are always in use with a batch size of 1. In reality they probably batch during high volume, but have GPUs sitting fallow during low volume,” Twitter, December 6, 2022, 1:34 p.m.; and MetaNews, “Does ChatGPT Really Cost $3M a Day to Run?,” December 21, 2022. and that it cost $20 million in computing to train Pathways Language Model (PaLM), a recent LLM from Google.11Lennart Heim, “Estimating 🌴PaLM’s Training Cost,” April 5, 2022; Peter J. Denning and Ted G. Lewis, “Exponential Laws of Computing Growth,” Communications of the ACM 60, no. 1 (January 2017): 54–65. Currently, only a handful of companies with vast resources are able to build them.
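Goldstein’s figure is a straightforward back-of-envelope multiplication. A minimal sketch of that style of arithmetic is below; every input is an illustrative assumption chosen to reproduce the $100K/day order of magnitude, not a figure disclosed by any provider.

```python
# Back-of-envelope inference-cost estimate, in the spirit of Goldstein's
# public guess. All inputs are assumptions for illustration only.

GPU_HOURLY_COST_USD = 3.0  # assumed cloud price per A100-class GPU-hour
GPUS_PER_REPLICA = 8       # assumed GPUs needed to serve one model replica
NUM_REPLICAS = 175         # assumed replicas kept running around the clock
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30

daily_cost = GPU_HOURLY_COST_USD * GPUS_PER_REPLICA * NUM_REPLICAS * HOURS_PER_DAY
monthly_cost = daily_cost * DAYS_PER_MONTH

print(f"~${daily_cost:,.0f}/day, ~${monthly_cost / 1e6:.1f}M/month")
# prints: ~$100,800/day, ~$3.0M/month
```

The point of the exercise is less the exact total than how quickly always-on GPU serving compounds: every multiplicative factor here is one that only the largest firms can absorb at scale.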
That’s why existing large-scale AI models have been developed almost exclusively by Big Tech, especially Google (Google Brain, DeepMind), Meta, and Microsoft (and its investee OpenAI). This includes many off-the-shelf, pretrained AI models that are offered as part of cloud AI services, a market already concentrated among Big Tech players such as AWS (Amazon), Google Cloud (Alphabet), and Azure (Microsoft). Even if costs are lower or come down as these systems are deployed at scale (and this is a hotly contested claim12Andrew Lohn and Micah Musser, “AI and Compute,” Center for Security and Emerging Technology.), Big Tech is likely to retain a first-mover advantage, having had the time and market experience needed to hone their underlying language models and to develop invaluable in-house expertise. Smaller businesses or startups may consequently struggle to enter this field, leaving the immense processing power of LLMs concentrated in the hands of a few Big Tech firms.13Richard Waters, “Falling Costs of AI May Leave Its Power in Hands of a Small Group,” Financial Times, March 9, 2023.

This market reality cuts through growing narratives that highlight the potential for “open-source” and “community or small and medium enterprise (SME)-driven” GPAI projects, or that even treat GPAI as synonymous with open source (as we’ve seen in discussions around the EU’s AI Act).14Ryan Morrison, “EU AI Act Should ‘Exclude General Purpose Artificial Intelligence’ – Industry Groups,” Tech Monitor, September 27, 2022. In September 2022, for example, a group of ten industry associations led by the Software Alliance (or BSA) published a statement opposing the inclusion of any legal liability for the developers of GPAI models.15See BSA | The Software Alliance, “BSA Leads Joint Industry Statement on the EU Artificial Intelligence Act and High-Risk Obligations for General Purpose AI,” press release, September 27, 2022; and BSA, “Joint Industry Statement on the EU Artificial Intelligence Act and High-Risk Obligations for General Purpose AI,” September 27, 2022. Their headline argument was that this would “severely impact open source development in Europe” as well as “undermine AI uptake, innovation, and digital transformation.”16BSA, “BSA Leads Joint Industry Statement on the EU Artificial Intelligence Act and High-Risk Obligations for General Purpose AI.” The statement leans on hypothetical examples that present a caricature of both how GPAI models work and what regulatory intervention would entail—the classic case cited is of an individual developer creating an open source document-reading tool and being saddled with regulatory requirements around future use cases it can neither predict nor control.

The discursive move here is to conflate “open source,” which has a specific meaning related to permissions and licensing regimes, with the intuitive notion that these models are “open” because they are accessible for downstream use and adaptation (typically through Application Programming Interfaces, or APIs). The latter is more akin to “open access,” though even in that sense these models remain limited, since only the API is shared rather than the model or its training data sources.17Peter Suber, Open Access (Cambridge, MA: MIT Press, 2019). In fact, in OpenAI’s paper announcing its GPT-4 model, the company said it would not provide details about the architecture, model size, hardware, training compute, data construction, or training method used to develop GPT-4, other than noting that it used its Reinforcement Learning from Human Feedback approach, citing competitive and safety concerns. Running directly against the current push to increase firms’ documentation processes,18Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru, “Model Cards for Model Reporting,” arXiv, January 14, 2019; Emily Bender and Batya Friedman, “Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science,” Transactions of the Association for Computational Linguistics 6 (2018): 587–604; Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford, “Datasheets for Datasets,” Communications of the ACM 64, no. 12 (2021): 86–92. such moves compound what has already been described as a reproducibility crisis in machine learning-based science, in which claims about the capabilities of AI-based models cannot be validated or replicated by others.19Sayash Kapoor and Arvind Narayanan, “Leakage and the Reproducibility Crisis in ML-based Science,” arXiv, July 14, 2022.

Ultimately, this form of deployment only serves to increase Big Tech firms’ revenues and entrench their strategic business advantage.20A report by the UK’s Competition & Markets Authority (CMA) points to how Google’s “open” approach with its Android OS and Play Store (in contrast to Apple’s) proved to be a strategic advantage that eventually led to similar outcomes in terms of revenues and strengthening its consolidation over various parts of the mobile phone ecosystem. See Competition & Markets Authority, “Mobile Ecosystems: Market Study Final Report,” June 10, 2022. While there are legitimate reasons to consider potential downstream harms associated with making such systems widely accessible,21Arvind Narayanan and Sayash Kapoor, “The LLaMA is Out of the Bag. Should We Expect a Tidal Wave of Disinformation?,” Knight First Amendment Institute (blog), March 6, 2023. even when projects might make their code publicly available and meet other definitions of open source, the vast computational requirements of these systems mean that dependencies between these projects and the commercial marketplace will likely persist.22See Coyle, “Preempting a Generative AI Monopoly.”

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? By Dr. Emily M. Bender, Dr. Timnit Gebru, Angelina McMillan-Major, and Dr. Margaret Mitchell

“Are ever larger LMs inevitable or necessary? What costs are associated with this research direction and what should we consider before pursuing it?”

In 2021, Dr. Emily M. Bender, Dr. Timnit Gebru, Angelina McMillan-Major, and Dr. Margaret Mitchell warned against the potential costs and harms of large language models (LLMs) in a paper titled On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?.23Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21 (New York: Association for Computing Machinery, 2021), 610–23. The paper led to Google forcing out both Gebru and Mitchell from their positions as co-leads of Google’s Ethical AI team.24Cade Metz and Daisuke Wakabayashi, “Google Researcher Says She Was Fired Over Paper Highlighting Bias in A.I.,” The New York Times, December 3, 2020, sec. Technology.

This paper could not have been more prescient in identifying the pathologies of scale that afflict LLMs. As public discourse is consumed by breathless hype around ChatGPT and other LLMs as an unarguable advancement in science, this research offers sobering reminders of the serious concerns these kinds of models raise. Rather than uncritically accepting these technologies as synonymous with progress, the arguments advanced in the paper raise questions of whether, not how, society should be building them at all. The key concerns raised in the paper are as follows:

Environmental and Financial Costs

LLMs are hugely energy intensive to train and produce large CO2 emissions. Well-documented environmental racism means that marginalized people and people from the Majority World/Global South are more likely to experience the harms caused by heightened energy consumption and CO2 emissions, even though they are least likely to experience the benefits of these models. Additionally, the high cost of entry to developing and training these models means that only a small global elite is able to build and benefit from LLMs. The authors argue that environmental and financial costs should become a top consideration in Natural Language Processing (NLP) research.
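Such footprints are typically estimated by multiplying a training run’s energy consumption by the carbon intensity of the electricity that powered it. A rough sketch of that calculation follows, using one set of published third-party estimates for a GPT-3-scale model (Patterson et al., 2021) as assumed inputs; neither figure is vendor-confirmed, and real grids vary widely.

```python
# Rough CO2e estimate for a single large training run.
# Both inputs are assumptions drawn from published third-party
# estimates for a GPT-3-scale model, not vendor-disclosed figures.

TRAINING_ENERGY_MWH = 1287         # assumed energy for one training run (MWh)
GRID_INTENSITY_KG_PER_KWH = 0.429  # assumed grid carbon intensity (kg CO2e/kWh)

# MWh -> kWh, multiply by intensity (kg), then kg -> metric tonnes
emissions_tonnes = TRAINING_ENERGY_MWH * 1000 * GRID_INTENSITY_KG_PER_KWH / 1000

print(f"~{emissions_tonnes:,.0f} tonnes CO2e for a single training run")
```

Under these assumptions the result is on the order of 550 tonnes CO2e for one run, before counting hyperparameter sweeps, failed runs, or ongoing inference, which is the authors’ point: the costs scale with the models.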

Unaccountable Training Data 

“In accepting large amounts of web text as ‘representative’ of ‘all’ of humanity we risk perpetuating dominant viewpoints, increasing power imbalances, and further reifying inequality.” 

The use of large and uncurated training data sets risks creating LLMs that entrench dominant, hegemonic views. The large size of these training data sets does not guarantee diversity, as they are often scraped from websites that exclude the voices of marginalized people due to issues such as inadequate Internet access, underrepresentation, filtering practices, or harassment. These data sets also run the risk of “value-lock,” encoding harmful biases into LLMs in ways that are difficult to thoroughly audit.

Creating Stochastic Parrots

Bender et al. further warn that the pursuit of LLM benchmarks may be a misleading direction for research, as these models have access to form, but not meaning. They observe that “an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot”. As stochastic parrots, these models are likely to absorb hegemonic worldviews from their training data and produce outputs that contain both subtle forms of stereotyping and outright abusive language. They can also lead to harms based on translation errors, and through their misuse by bad actors to create propaganda, spread misinformation, and deduce sensitive information.

Large-scale AI models must be subject to urgent regulatory scrutiny, particularly given the frenzied speed of their rollout to the public. Documentation and scrutiny of data and related design choices at the stage of model development are key to surfacing and mitigating harm.

It’s not a blank slate. Legislative proposals on Algorithmic Accountability must be expanded and strengthened and existing legal tools should be creatively applied to introduce friction and shape the direction of innovation.    

There is growing exceptionalism around generative AI models that underplays inherent risks and justifies their exclusion from the purview of AI regulation. We should draw lessons from the ongoing debate in Europe on the inclusion of General Purpose AI under the “high risk” category of the upcoming AI Act. 

Along with breathless hype around the future potential of AI, the release of ChatGPT (and its subsequent adaptation into Microsoft’s search chatbot) immediately surfaced thorny legal questions, such as: Who owns and has rights over the content generated by these systems?25James Vincent, “The Scary Truth About AI Copyright Is That Nobody Knows What Will Happen Next,” The Verge, November 15, 2022. Is generative AI protected by intermediary liability shields like Section 230 from lawsuits relating to illegal content it might generate?26Electronic Frontier Foundation, “Section 230,” Electronic Frontier Foundation, n.d.

What’s clear is that there are already existing legal regimes that apply to large language models, and we aren’t building them from the ground up. In fact, rhetoric that implies this is necessary works mostly to industry’s best interests, by slowing the paths to enforcement and updates to the law. 

A blog post recently published by the FTC outlined several ways the agency’s existing authorities already apply to generative AI systems: if these systems are used for fraud, cause substantial injury, or are marketed with false claims about their capabilities, the FTC has cause to step in. There are many other domains where existing legal regimes are likely to apply: intellectual property law, anti-discrimination provisions, and cybersecurity regulations among them.

There’s also a forward-looking question of what norms and responsibilities should apply to these systems. The growing consensus around recognized harms from AI systems (particularly inaccuracies, bias, and discrimination) has led to a flurry of policy movement over the last few years, centered on greater transparency and diligence around data and algorithmic design practices. These emerging AI policy approaches will need to be strengthened to address the particular challenges these models raise, and the current public attention on AI is poised to galvanize momentum where it has been lacking.

In the EU, this question is not theoretical. It is at the heart of a hotly contested debate about whether the original developers of so-called “General Purpose AI” (GPAI) models should be subject to the regulatory requirements of the upcoming AI Act.27Creative Commons, “As European Council Adopts AI Act Position, Questions Remain on GPAI,” Creative Commons, December 13, 2022; Corporate Europe Observatory, “The Lobbying Ghost in the Machine: Big Tech’s Covert Defanging of Europe’s AI Act,” February 2023; Gian Volpicelli, “ChatGPT Broke the EU Plan to Regulate AI,” Politico, March 3, 2023. Introduced by the European Commission in April 2021, the original proposal (Article 52a) effectively exempted the developers of GPAI from complying with the range of documentation and other accountability requirements in the law.28European Commission, “Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts,” April 21, 2021. This meant that GPAI, which ostensibly had no predetermined use or context, would not qualify as “high risk” – another provision (Article 28) confirmed this position, stating that developers of GPAI would become responsible for compliance only if they significantly modified or adapted the AI system for a high-risk use. The Council’s position took a different stance, under which original providers of GPAI would be subject to certain requirements in the law, although working out the specifics was delegated to the Commission. Recent reports suggest that the European Parliament, too, is considering obligations specific to original GPAI providers.

As the inter-institutional negotiation in the EU has flip-flopped on this issue the debate seems to have devolved into an unhelpful binary where either end users or original developers take on liability29An article by Brookings Fellow Alex Engler, for example, argues that regulating downstream end users makes more sense because “good algorithmic design for a GPAI model doesn’t guarantee safety and fairness in its many potential uses, and it cannot address whether any particular downstream application should be developed in the first place.” See Alex Engler, “To Regulate General Purpose AI, Make the Model Move”, Tech Policy Press, November 10, 2022; See also Alex Engler, “The EU’s attempt to regulate general purpose AI is counterproductive”, Brookings, August 24, 2022., rather than both having responsibilities of different kinds at different stages.30The Mozilla Foundation’s position paper on GPAI helpfully argues in favor of joint liability. See Maximilian Gahntz and Claire Pershan, “Artificial Intelligence Act: How the EU Can Take on the Challenge Posed by General-Purpose AI Systems,” Mozilla Foundation, 2022. And a recently leaked unofficial US government position paper reportedly states that placing burdens on original developers of GPAI could be “very burdensome, technically difficult and in some cases impossible.”31Luca Bertuzzi, “The US Unofficial Position on Upcoming EU Artificial Intelligence Rules,” Euractiv, October 24, 2022.

These accounts lose sight of the two most important reasons that large-scale AI models require oversight:

  • Data and design decisions made at the developer stage determine many of the model’s most harmful downstream impacts, including the risks of bias and discrimination.32Sasha Costanza-Chock, Design Justice: Community-Led Practices to Build the Worlds We Need (Cambridge, MA: MIT Press, 2020). There is mounting research and advocacy arguing for rigorous documentation and accountability requirements on the developers of large-scale models.33See Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford, “Datasheets for Datasets,” arXiv:1803.09010, December 2021; Mehtab Khan and Alex Hanna, “The Subjects and Stages of AI Dataset Development: A Framework for Dataset Accountability,” Ohio State Technology Law Journal, forthcoming, accessed March 3, 2023; and Bender, Gebru, McMillan-Major, and Shmitchell, “On the Dangers of Stochastic Parrots.”
  • The developers of these models, many of which are Big Tech or Big Tech-funded, commercially benefit from these models through licensing deals with downstream actors.34See for example Madhumita Murgia, “Big Tech companies use cloud computing arms to pursue alliances with AI groups”, Financial Times, February 5, 2023; Leah Nylen and Dina Bass, “Microsoft Threatens Data Restrictions In Rival AI Search”, Bloomberg, March 25, 2023; OpenAI, Pricing; Jonathan Vanian, “Microsoft adds OpenAI technology to Word and Excel”, CNBC, March 16, 2023; and Patrick Seitz, “Microsoft Stock Breaks Out After Software Giant Adds AI To Office Apps”, Investor’s Business Daily, March 17, 2023. Companies licensing these models for specific uses should certainly be accountable for conducting diligence within the specific context in which these models are applied, but to make them wholly liable for risks that emanate from data and design choices made at the stage of original development would result in both unfair and ineffective regulatory outcomes.