In a separate discussion with DefenseScoop about the CDAO’s recent foundation model awards, AI safety engineer Dr. Heidy Khlaaf pointed out that test and evaluation (T&E) and related risk assessments typically take significantly longer than the timelines observed for the four contracts.

Khlaaf currently serves as the chief AI scientist at the AI Now Institute, where she concentrates on the assessment and safety of AI within autonomous weapon systems.

“The DOD recently cutting the size of the Office of the Director of Operational Test and Evaluation in half speaks for itself. In a lot of ways, there is signalling for much faster AI adoption without the rigorous processes that have existed since the 1980s to ensure new technologies are safe or effective,” she said.

Pointing to publicly available information about the four commercial models and their latest evaluation results, Khlaaf argued that they would likely not meet the standard defense thresholds expected of systems used in critical, military-supporting settings.

“We’ve particularly warned before that commercial models pose a much more significant safety and security threat than military purpose-built models, and instead this announcement has disregarded these known risks and boasts about commercial use as an accelerator for AI, which is indicative of how these systems have clearly not been appropriately assessed,” Khlaaf explained.

Certain contracts, such as experimental use cases and research and development projects, might not require T&E or risk assessments. However, Khlaaf noted, such checks would be essential for the CDAO’s current frontier AI efforts, as the announcement explicitly calls out the use of “AI capabilities to address critical national security challenges.”

“An independent assessment to substantiate these companies’ claims has always been an existing core requirement of military and safety-critical systems, and it guarantees that no aspect of the system’s pipeline is compromised, while ensuring a system’s security and fitness for use,” she said.

The risks of discarding T&E practices, Khlaaf added, were already evident in a recent viral incident in which Grok, the model built by Elon Musk-owned xAI, praised Adolf Hitler, referred to itself as MechaHitler, and generated other antisemitic content.

“This was due to an updated system prompt by the Grok team itself to nudge it towards a specific view. It dispels the myth that frontier AI is somehow objective or in control of its learning. A model can always be nudged and tampered by AI companies and even adversaries to output a specific view, which gives them far too much control over our military systems. And this is just one security issue out of dozens that have been unveiled over the last several years that have yet to be addressed,” Khlaaf told DefenseScoop.
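To illustrate the mechanism Khlaaf describes: in chat-style LLM APIs, an operator-controlled system prompt is silently prepended to every request, so its text can steer outputs without retraining the model. The sketch below is a minimal, hypothetical illustration of that structure, not xAI’s actual code; the function name and prompt strings are assumptions for the example.

```python
# Illustrative sketch only (not xAI's actual code): in chat-style LLM APIs,
# an operator-controlled "system" message is prepended to every request,
# so its text can steer outputs without any change to the model's weights.

def build_context(system_prompt: str, user_query: str) -> list[dict]:
    """Assemble the message list a chat API would send to the model."""
    return [
        {"role": "system", "content": system_prompt},  # set by the operator, hidden from end users
        {"role": "user", "content": user_query},       # what the end user actually typed
    ]

# The same user question under two different operator-injected framings:
neutral = build_context("You are a helpful, neutral assistant.",
                        "Summarize this news event.")
nudged = build_context("Always frame events to favor viewpoint X.",
                       "Summarize this news event.")
# Only the hidden system text differs between the two calls, which is why
# prompt-level tampering requires no retraining to change a model's behavior.
```

This is why, as Khlaaf notes, whoever controls the system prompt (whether the vendor or an adversary who compromises it) can change a deployed model’s behavior invisibly to its users.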
