Public debates about an “FDA for AI” often operate at a broad analogical level, asking what a federal agency regulating AI would look like. Yet this framing is a blunt instrument for a conversation that deserves greater nuance.
As a productive starting point for our analysis, we identified the FDA’s key regulatory functions and assessed their relevance for AI. Using thematic coding, we grouped these functions into three categories, analyzing how each operates within the context of the FDA and what considerations each raises for artificial intelligence: points of alignment, points of departure, and other factors that surfaced from engaging with the FDA example. We summarize these takeaways below, and offer a more thorough accounting in Appendix 2.
I. Premarket Approval
What is it?
“Premarket approval” refers to the FDA’s role in scrutinizing and approving (or refusing to approve) prescription drugs before they enter widespread circulation. This process gives the FDA control over how a drug can be marketed—both to the public and to the physicians who prescribe drugs to the public.1Center for Drug Evaluation and Research, “Prescription Drug Advertising | Questions and Answers,” Food and Drug Administration, updated June 19, 2015; accessed July 10, 2024, https://www.fda.gov/drugs/prescription-drug-advertising/prescription-drug-advertising-questions-and-answers. Much of the FDA’s power therefore lies in its gatekeeping function. It controls a key gateway through which pharmaceuticals need to pass in order to enter the market, providing strong incentives for companies to comply with the approvals process.
As the central regulatory authority responsible for evaluating drugs before they are marketed, the FDA examines both the safety of drugs and their efficacy, making an approval determination based on a risk-benefit analysis.
These two qualities—safety and efficacy—do not exist in full isolation from each other, and at times may in fact be in tension. A more effective drug may introduce increased risks of side effects, and regulators must evaluate whether the benefits of tackling a specific disease outweigh the ancillary harms that are introduced. Only by considering safety and efficacy together is it possible for the agency to make meaningful assessments of whether the drug should be allowed on the market.
In the pharmaceutical context, the FDA uses a benchmarking process that is both flexible and standardized, in the form of end points: evaluative metrics tailored to a specific disease indication that are valid and generalizable while reflecting the particular outcome being measured. End points are established by agreement between FDA staff and drug manufacturers, and form a key part of the clinical trial process as determinants of whether a drug has successfully achieved a stated health outcome.2Charlie McLeod et al., “Choosing Primary Endpoints for Clinical Trials of Health Care Interventions,” Contemporary Clinical Trials Communications 16 (December 2019): 100486, https://doi.org/10.1016/j.conctc.2019.100486. Surrogate end points serve as proxies: metrics closely linked to more traditional end points that may enable swifter evaluation by substituting a short-term outcome for a long-term one. (For example, one workshop participant cited reduction in tumor size as a surrogate end point that is clinically verifiable on a shorter time frame than seeing a patient’s cancer go into remission.)3Ibid.
Regardless of the pathway used, the premarket approach to drug approvals necessitates that drug manufacturers document their reasoning and decision-making throughout the development process. This feeds into another important function of the FDA—that of producing information and expertise—which we discuss at greater length below.
How could it work for AI?
At present, with few exceptions, artificial intelligence systems do not go through any standardized evaluation process prior to entering commercial deployment in the United States (though some AI systems do go through post-market evaluation; see Appendix 1). Under a “permissionless innovation” paradigm, regulators have taken a light-touch approach to the market, tending to step in ex post to correct harms after they have occurred.4Darrell M. West, “The End of Permissionless Innovation,” Brookings, October 7, 2020, https://www.brookings.edu/articles/the-end-of-permissionless-innovation.
In practice, this ex post approach to regulatory enforcement is often triggered by negative press coverage, independent auditing by researchers or journalists, whistleblowing, or consumer reporting. These triggers can lead enforcement agencies to open up an investigation, evaluate compliance with the contours of existing law, and, if merited, institute penalties for failure to comply with legal mandates.
This approach has a number of weaknesses. It tasks under-resourced actors with the most onerous elements of accountability. Redress is often unevenly distributed, since garnering sufficient attention to merit redress often requires access to public platforms. Evaluations of legal compliance are often ad hoc and highly context-specific, carried out in the absence of more objective measures applied across the board. And enforcement often occurs long after the harm has been incurred, too late to remediate effectively.
The FDA’s premarket review for drugs offers useful insights for the AI policy community to consider:
- Premarket review places the burden on the manufacturer to prove a product is safe, rather than on the public or enforcement agencies to identify instances where harm has occurred.
- The flexibility of the FDA model could offer a mechanism through which the rigor of the evaluation process is calibrated to the specific domain of deployment, mitigating concerns that FDA-style regulation is overly onerous and expensive. For example, the end-point approach not only offers flexibility around what counts as proof of “safety and efficacy” but also shapes what type of information the FDA can ask for. Such flexibility would also enable an agency to calibrate the standard-setting process to the contextual factors that shape how AI is developed and deployed in the world, and to the dynamic nature of system development.
- Rather than applying fixed standards to evaluate products already on the market, the FDA approach incentivizes companies to invest in developing evaluation measures well attuned to their systems in order to “show their work.”
In contrast to the “red-teaming” approach that AI companies have tended to favor, this premarket approach mandates that firms do more than say, “Trust us, we’ve done our homework.” Red teaming is generally a black-boxed process. It looks solely at areas of risk or vulnerability rather than efficacy, and leaves it to companies to define the metrics and processes through which their products are tested. Moreover, red teaming does not substantiate safety, functionality, or efficacy claims, which are all required when undergoing any regulatory process. By contrast, the approach to clinical testing utilized by the FDA encourages greater transparency and documentation of development practices, raises the bar for standard setting and benchmarking while maintaining scope for flexibility attuned to context, and enables empirical evaluation not only of harm but of whether a system works in the first place.
An important consideration is the complexity of AI supply chains, which often string together multiple service providers or firms with multiple use cases. The FDA has a process for quality control across the entirety of the supply chain, including engaging with some regulation at the level of the suppliers of components of drugs;5This regulatory process takes place via drug master files (DMFs), which may be instructive for regulating foundation models. See Food and Drug Administration, “Drug Master Files (DMFs),” November 3, 2023, https://www.fda.gov/drugs/forms-submission-requirements/drug-master-files-dmfs. its enforcement, however, is most intense at the level of a particular end application, where validating the safety and effectiveness of a given product is more manageable.
The FDA regulatory model may not be as well suited to points in the supply chain that are more distant from the context of deployment, such as the base or “foundation model” layer. It is difficult to evaluate a general-purpose system for safety and efficacy because the full range of use cases is unknown. Here, other regulatory design approaches—such as financial regulation and its treatment of systemic risk,6See Julia Black and Andrew Murray, “Regulating AI and Machine Learning: Setting the Regulatory Agenda,” European Journal of Law and Technology 10, no. 3 (2019), https://www.ejlt.org/index.php/ejlt/article/view/722; and Carsten Jung and Bhargav Srinivasa Desikan, “Artificial Intelligence for Public Value Creation: Introducing Three Policy Pillars for the UK AI Summit,” IPPR, October 25, 2023, https://www.ippr.org/articles/ai-for-public-value-creation. or emissions monitoring regulation—may offer more useful corollaries.
There are nevertheless safety processes, testing, and documentation that can be mandated for all points along the supply chain. At a minimum, mandates for clear documentation of base models, including the data used to train them, will be necessary to enable evaluation at the application layer. (See the Producing Information and Expertise section for more on this.)
II. Post-Market Monitoring and Enforcement
What is it?
The FDA recognizes that simply mandating testing and documentation prior to a drug or device entering the market does not ensure safety, and that ongoing post-market monitoring is required to ensure that new risks are caught as they emerge.
The FDA uses a variety of tools for post-market monitoring. It benefits from what is sometimes called passive surveillance: the voluntary reporting of safety incidents by doctors, insurers, and pharmaceutical companies themselves. It also engages in active surveillance, which involves reviewing and monitoring electronic health record data for reports of adverse reactions or harm that could be linked to the use of a particular drug.7Mary Wiktorowicz, Joel Lexchin, and Kathy Moscou, “Pharmacovigilance in Europe and North America: Divergent Approaches,” Social Science & Medicine 75, no. 1 (July 2012): 165–170, https://doi.org/10.1016/j.socscimed.2011.11.046. Because premarket approval and disclosures are paired with liability, other regulators (including those in US states) and patients can bring tort cases, which can lead to additional discovery when a product appears to have caused patients harm.8National Academy of Engineering, Product Liability and Innovation: Managing Risk in an Uncertain Environment (Washington, DC: National Academies Press, 1994), https://doi.org/10.17226/4768.
The FDA’s powers here are weaker than at the premarket approval stage, the most powerful point of regulatory intervention: the period before drugs enter the market is when the alignment between regulatory power and companies’ incentives to comply reaches its peak. Past the point of market entry, the FDA retains some ability to act in the public interest through the measures outlined above, but we see a significant drop both in the agency’s ability to act and in its track record of doing so successfully.
How could it work for AI?
In both the FDA context and the AI context, assuring downstream compliance after a product enters the market is a regulatory challenge.
Post-market surveillance for AI is particularly obstacle-ridden, yet it remains the dominant way we enforce laws on artificial intelligence today.
A better approach would be to monitor AI systems more actively, since they are highly dynamic both in the frequency with which they are updated and in their active data flows. But how to do so effectively needs more deliberation: in the pharmaceutical context, for example, regulators use electronic health records to proactively surface patterns indicating harmful effects that could be tied to a particular drug’s use.9This access is provided through surveillance capacities enabled by the owners of these systems, such as health systems. For more, see the Food and Drug Administration (FDA) Sentinel Initiative, https://www.sentinelinitiative.org. In financial regulation, mechanisms like scenario planning are frequently used to anticipate known crisis patterns before they occur. Are there equivalents that could be better leveraged to more proactively surface AI-enabled harms? (See the section on Producing Information and Expertise for more information.)
Such efforts could complement ongoing auditing and impact assessments of AI systems by independent entities; how to effectively track and mitigate harms across the development life cycle of an AI system is likewise important, and requires calibrating obligations appropriately to each phase of development.10See Ian Brown, “Expert Explainer: Allocating Accountability in AI Supply Chains,” Ada Lovelace Institute, June 29, 2023, https://www.adalovelaceinstitute.org/resource/ai-supply-chains; Harry Farmer, “Regulate to Innovate,” Ada Lovelace Institute, November 29, 2021, https://www.adalovelaceinstitute.org/report/regulate-innovate; and Pegah Maham and Sabrina Küspert, “Governing General Purpose AI — A Comprehensive Map of Unreliability, Misuse and Systemic Risks,” interface, July 20, 2023, https://www.stiftung-nv.de/de/publikation/governing-general-purpose-ai-comprehensive-map-unreliability-misuse-and-systemic-risks. When AI systems are deployed in the world, they are exposed to rapid changes that can alter data and processes in real time. Compared to a drug, which is a fixed object, many AI systems are inherently variable, requiring ongoing monitoring and risk mitigation. Regimes like financial regulation may offer useful analogies, providing both the vocabulary and the mechanisms for evaluating risk more systematically.11See, e.g., Financial Stability Oversight Council, “Analytic Framework for Financial Stability Risk Identification, Assessment and Response,” Federal Register 88, no. 218, November 14, 2023, https://home.treasury.gov/system/files/261/Analytic-Framework-for-Financial%20Stability-Risk-Identification-Assessment-and-Response.pdf; U.S. Department of the Treasury, “The Financial Services Sector’s Adoption of Cloud Services,” n.d., accessed July 10, 2024, https://home.treasury.gov/system/files/136/Treasury-Cloud-Report.pdf; and Danny Brando et al., “Implications of Cyber Risk for Financial Stability,” Federal Reserve, May 12, 2022, https://www.federalreserve.gov/econres/notes/feds-notes/implications-of-cyber-risk-for-financial-stability-20220512.html.
Wrangling the potentially varied provenance of AI system components, and establishing an informed understanding of the point in a system’s development at which a harm may have been introduced, remains a difficult task. Clearer documentation mandates could help address the muddled provenance of AI system components, but this is an issue that needs deeper deliberation. The “cascading” approach outlined by the UK’s National Cyber Security Centre, in which obligations are attuned to each phase of development (including the disclosure of risks each actor was unable to evaluate), may also offer a useful and similarly concrete complement to FDA-style approaches.12National Cyber Security Centre, “Guidelines for Secure AI System Development,” November 27, 2023, https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development.
Finally, a pressing policy question is how to address the risk posed by an AI system once it has been identified. The FDA’s power is strengthened by the heft of the penalties it can impose: its ability to levy huge fines, strip certification, and refuse to approve future products is a powerful deterrent against negligent behavior, particularly since the companies making drugs generally need to bring new products to the agency for approval in the future. There are formal mechanisms for downstream accountability, such as recalling products after the fact, though the FDA’s ability to enact these remedies is weakened once products are in commercial use. Companies also remain liable for harms caused to the public after drugs are made available for wide release.
Currently, the bulk of regulatory enforcement of existing law in AI occurs ex post, and is thus subject to these challenges; even identifying where AI systems are currently in use remains a significant gap. In addition, establishing liability and then demonstrating causation in the AI context are significant barriers.13See Mihailis Diamantis, “Vicarious Liability for AI,” Indiana Law Journal 99, no. 1 (Winter 2023): 317–334, https://papers.ssrn.com/abstract=3850418; and Miriam Buiten, Alexandre de Streel, and Martin Peitz, EU Liability Rules for the Age of Artificial Intelligence, Centre on Regulation in Europe, March 2021, https://doi.org/10.2139/ssrn.3817520.
III. Producing Information and Expertise
What is it?
Pharmaceuticals offer an instructive analogy to AI because drugs were similarly opaque and underscrutinized at the time the FDA was formed. Amy Kapczynski has written extensively about how the FDA played an important role in motivating the production of information that has reduced the opacity of pharmaceuticals, contributing to our knowledge of how drugs work, with benefits that accrue to the entire sector.14Amy Kapczynski, “Dangerous Times.” The FDA thus acts as a key conduit for conveying information to the public, as well as to relevant expert stakeholders who seek to represent the public’s interests (the research community and auditors, patient advocates, other regulators, clinicians).
How could it work for AI?
Many elements of artificial intelligence are currently opaque and underscrutinized, whether due to corporate secrecy or because of endemic challenges in interpreting and explaining the outputs of AI systems.15Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information (Cambridge, MA: Harvard University Press, 2016), https://www.hup.harvard.edu/books/9780674970847. This presents a significant challenge in regulating artificial intelligence: the absence of key information about how AI systems are constructed and how they function hinders the effectiveness of auditing, benchmarking, and validation.16Ibid. Some of this opacity can be attributed to misaligned incentive structures; left to their own devices, companies are simply not well placed or well motivated to consider or prioritize certain questions.17Christopher Morten, “Publicizing Corporate Secrets,” University of Pennsylvania Law Review 171, no. 5 (January 1, 2023): 1319, https://doi.org/10.58112/plr.171-5.2. Other questions may prove intractable, reflecting empirical barriers to our ability to understand AI systems.18Zachary C. Lipton, “The Mythos of Model Interpretability,” arXiv, March 6, 2017, https://doi.org/10.48550/arXiv.1606.03490.
A number of the FDA’s information-generating approaches are directly applicable to AI systems. The power to elicit the information necessary to effectively evaluate an AI system, to require that AI systems be labeled as such, and to mandate the reporting of adverse events all fall within the scope of existing AI governance proposals.19See examples in Appendix 1. These proposals seek to strengthen the baseline offered under existing law: in its Executive Order on AI, the White House invoked the Defense Production Act to elicit information from AI companies (above a scale threshold) about how they are evaluating the safety of their systems and to require that this information be reported to the Department of Commerce.20White House, “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” October 30, 2023, https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence. Moreover, the Federal Trade Commission (FTC)’s existing authorities on advertising substantiation enable the agency to request information from AI companies to validate claims they make publicly about the capabilities of their product offerings.21Federal Trade Commission, “FTC Policy Statement Regarding Advertising Substantiation,” June 24, 2014, https://www.ftc.gov/legal-library/browse/ftc-policy-statement-regarding-advertising-substantiation.
A key distinction is that the “user” of an AI system—the entity procuring AI—is often not the same as the entity on which the system is used.
This distinction matters tremendously. Disclosures and other transparency mechanisms may be important for entities deciding whether and under what conditions to use AI, but they will often be insufficient to enable those on whom AI is used to remediate harm. Often, individuals are not in a position to make meaningful decisions about how AI impacts their lives. In contexts like hiring, insurance, healthcare, education, and finance, members of the public are rarely given the opportunity to shape how AI may be used in a decision-making process, even as it significantly impacts their access to resources and life chances, and they are not given the autonomy to opt out. Paying attention to the effects of information and power asymmetries will be particularly important in AI governance; those involved in regulatory design should remain vigilant about how to implement mechanisms that function meaningfully in the public’s interest.