The Loop Is Getting Very Fast
What Claude's safety architecture can and can’t do in a military deployment
Anthropic’s Claude was reportedly used during a live U.S. military operation in January 2026. Axios reported that the use came through the Anthropic-Palantir partnership but said it could not confirm the precise role Claude played. Most coverage has focused on the political standoff that followed. This piece looks at the engineering. Anthropic’s safety architecture as published and as deployed may not be the same thing, and if they are, you have to ask what that architecture is actually doing there.
This analysis draws on publicly available resources. Classified deployments are opaque by design. This piece evaluates the public architecture and asks what would have to be true for it to matter in a deployment.
Palantir’s Platform
The contract most relevant here is Maven. Project Maven was established in 2017 to deliver AI capabilities to the U.S. military. Palantir has been the primary contractor since 2019, and the Maven Smart System (MSS) is now the core AI-powered targeting and battlefield awareness platform across the DoD. In May 2024, the relationship was formalized through a $480 million IDIQ contract, later raised to nearly $1.3 billion.
Palantir’s Artificial Intelligence Platform (AIP) connects commercial, frontier AI models to military systems. Axios reported that Claude is “currently the only model available on those classified platforms” used for the military’s most sensitive work, with a separate source describing Claude as “ahead of the others in a number of applications relevant to the military, such as offensive cyber capabilities.” Palantir’s Supported LLMs page shows Claude 3.7 Sonnet (via AWS Bedrock) cleared at IL5 — the highest impact level listed.1 2
The Pentagon has credited Maven with providing targeting support for U.S. airstrikes in Iraq, Syria, and Yemen in early 2024, locating hostile maritime assets in the Red Sea, and supplying locations of Russian equipment to Ukrainian forces in 2022. The military insists humans remain in the loop. CENTCOM’s CTO was emphatic in 2024 that “every step that involves AI has a human checking in at the end.” But the stated ambition is speed and scale. At the GEOINT Symposium in May 2025, a Maven program official described the goal: units making a thousand targeting decisions in one hour, with timelines compressed from hours to minutes. Human in the loop, but the loop is getting very fast.
Palantir’s stack is large, and the Maven contract predates major waves of generative AI. Nonetheless, Palantir’s AIP supports distinctly agentic capabilities, though documented support does not imply those capabilities were used in any specific instance. In January 2026, Palantir launched a blog series on its “Agentic Runtime”, described as “the first in a series exploring Palantir AIP’s Agentic Runtime, the integrated toolchain for building, deploying, and managing agents in mission-critical settings.”3
Claude’s Safety Architecture
Anthropic’s publicly documented safety architecture can generally be broken into two categories: training-time measures through Constitutional AI (CAI) and inference-time, application-layer filtering. CAI is Anthropic’s training method that shapes Claude’s behavioral tendencies before deployment.4 A primary example of Anthropic’s application-layer filtering is its constitutional classifiers: separate models that filter Claude’s inputs and outputs at inference time. Both are governed by documents Anthropic calls “constitutions,” but these are different documents with different scopes.
The CAI constitution states in its preface: “This constitution is written for our mainline, general-access Claude models. We have some models built for specialized uses that don’t fully fit this constitution.” The Pentagon’s Claude may not use the same constitution. Nonetheless, the public record is the only safety architecture anyone outside can evaluate.
The CAI constitution defines Claude’s values, decision-making priorities, a three-tier principal hierarchy (Anthropic, then operators, then users), and a set of hardcoded prohibitions the model should never violate regardless of instructions. These prohibitions include CBRN weapons assistance, CSAM generation, attacks on critical infrastructure, and facilitating genocide or crimes against humanity.
Individual tasks in a targeting workflow can be indistinguishable from civilian work at the computational level. Checking whether a coordinate falls within a boundary is the same operation whether it’s for a delivery zone or an area of engagement. Neither the math nor the prompt needs to reference its downstream purpose. None of it maps to any hardcoded prohibition.
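To make the dual-use point concrete, here is a generic point-in-polygon check. This is a hypothetical illustration, not code from any deployed system; nothing in the function, its inputs, or its name reveals whether the boundary is a delivery zone or an area of engagement.

```python
def point_in_polygon(x: float, y: float,
                     polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test: return True if (x, y) falls inside the polygon."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle on each polygon edge crossed by a horizontal ray from (x, y).
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

# A square boundary around some point of interest -- the code cannot say
# what kind of boundary it is.
zone = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
point_in_polygon(5.0, 5.0, zone)   # True
point_in_polygon(15.0, 5.0, zone)  # False
```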
There are, then, two distinct ways safety training can fail here. The first is that Claude never recognizes what it’s participating in. The second is that it does, but is effectively overridden. On the first: whether safety training activates depends on how much of the workflow Claude sees at once. A prompt like “evaluate these contacts against the target list and recommend strike priorities given current force position” may trigger trained safety behaviors. But the infrastructure around Claude controls task granularity, not the model. Break that into routine subtasks and each one passes below the threshold where safety behaviors activate. This may be by design on both sides — task decomposition is standard engineering practice, and Anthropic may be more comfortable with Claude’s ability to handle narrow subtasks.
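The structural point can be illustrated with a toy. Trained safety behavior is learned, not keyword matching, and nothing below is Anthropic’s mechanism; every name, term list, and threshold is invented. The toy “trigger” fires only when enough workflow context co-occurs in a single prompt, which is exactly what decomposition removes:

```python
# Invented stand-in for a learned safety behavior: fires only when
# multiple pieces of workflow context co-occur in one prompt.
SENSITIVE_TERMS = {"target", "strike", "force"}

def trips_threshold(prompt: str, threshold: int = 2) -> bool:
    """Fire only if several sensitive terms appear together."""
    hits = sum(term in prompt.lower() for term in SENSITIVE_TERMS)
    return hits >= threshold

full_prompt = ("Evaluate these contacts against the target list and "
               "recommend strike priorities given current force position.")

# The same workflow, decomposed into subtasks that read as routine
# data processing with no reference to their downstream purpose:
subtasks = [
    "Deduplicate these records and normalize the coordinate fields.",
    "Rank these entries by distance from the reference point.",
    "Summarize which entries changed since the last snapshot.",
]

trips_threshold(full_prompt)               # True: the context co-occurs
[trips_threshold(t) for t in subtasks]     # [False, False, False]
```

The full prompt trips the toy trigger because target, strike, and force position appear together; each subtask, read alone, contains none of them. Decomposition strips out exactly the context a safety behavior would need to key on, and the composition back into a decision happens in orchestration code the model never sees.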
CAI training also has structural limitations independent of task decomposition. Trained tendencies are not deterministic. No current neural network architecture enforces hard constraints at runtime.5 Even given an explicitly military prompt, Claude may still comply.
Bypassing a refusal may require nothing more than prompt engineering that is already standard practice in generative LLM integration. Anthropic’s own testing has shown that baseline safety training is brittle under adversarial pressure, even in the categories where refusal behaviors should be strongest. Constitutional classifiers were developed precisely because trained refusals alone were not reliable enough.
The second category, application-layer filtering, is the harder constraint but narrower in scope. For example, Anthropic’s constitutional classifiers have their own constitution, separate from the CAI constitution, scoped to Chemical, Biological, Radiological, and Nuclear (CBRN) threats. They work because CBRN content is often classifiable on its face. To what extent Anthropic’s constitutional classifiers or other content-monitoring techniques are deployed on the Pentagon’s Claude is unknown.6
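The shape of that filtering layer can be sketched as follows. This is a hedged sketch, not Anthropic’s implementation: `classify` and `generate` are invented stubs standing in for a trained classifier model and the underlying model call, and the keyword scorer is a deliberately crude placeholder.

```python
def classify(text: str) -> float:
    """Toy stand-in for a separate classifier model: P(text violates policy).
    Real constitutional classifiers are trained models, not keyword checks."""
    flagged = ("nerve agent", "enrichment cascade")  # invented examples
    return 1.0 if any(t in text.lower() for t in flagged) else 0.0

def generate(prompt: str) -> str:
    """Stand-in for the underlying model call."""
    return f"[model response to: {prompt}]"

def guarded_generate(prompt: str, threshold: float = 0.5) -> str:
    # Input filter: refuse before the model ever sees the prompt.
    if classify(prompt) >= threshold:
        return "Request declined."
    output = generate(prompt)
    # Output filter: screen the generation before it is delivered.
    if classify(output) >= threshold:
        return "Response withheld."
    return output
```

The gate works only because the flagged content is classifiable on its face. A prompt whose harm lies entirely in its downstream use carries nothing for either check to detect, and it sails through both.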
After Claude
It is unknown what role Claude played in the January 2026 military operation, whether it operated within the Maven Smart System or elsewhere, or whether Palantir’s agentic runtime capabilities were involved. What is known is that the infrastructure to connect an LLM to military operational workflows exists and is documented, and that Claude was used during a live operation.
As of February 27th, the Pentagon has designated Anthropic a supply chain risk and ordered a six-month phaseout of its products across federal agencies. That decision was driven by Anthropic’s refusal to permit unrestricted use on classified networks, specifically around autonomous weapons and mass domestic surveillance.7 The architectural questions this piece raises are not resolved by that decision. They are inherited by whatever model replaces Claude and by every future deployment where a model’s safety architecture governs what it says but not what its outputs are used for.
Author’s Note
I’m a machine learning engineer. I spent over five years at Booz Allen Hamilton building production ML systems on DoD contracts, though not on Maven or anything involving Palantir. I have no affiliation with Anthropic or Palantir.
AI safety is good at theory and thin on praxis. My hope is that this piece helps make some of the engineering more accessible and grounds the conversation in how these systems actually get deployed. If you see meaningful gaps in my analysis, please reach out!
Although sources say the military is using Anthropic’s Claude, Palantir’s AIP is explicitly model-agnostic and supports models from OpenAI, Google, Meta, xAI, and others. The infrastructure to swap models exists. The alternatives aren’t starting from zero. Elon Musk’s xAI signed a deal to bring Grok into classified settings. Hours after Anthropic was designated a supply chain risk, OpenAI announced its own deal for the Pentagon’s classified network.
IL5 is a DoD cloud security tier for mission-critical unclassified information, not classified data (which is handled at IL6 and above). It is the highest impact level in Palantir's public documentation, which may not reflect the full range of environments in use.
Palantir’s AIP documentation is, as far as I can tell, the only public documentation for an AI agent platform actually deployed in mission-critical government settings. Ken Huang, who leads OWASP’s AI Vulnerability Scoring System project, has published a technical analysis based on it.
CAI uses a two-phase training process where the model critiques and revises its own outputs against a set of principles, then applies reinforcement learning using AI-generated evaluations (RLAIF). For a technical but high-level overview of RLHF and its variants, see rlhfbook.com.
Research from Princeton and Google DeepMind has shown that safety alignment in current LLMs can be “shallow,” primarily modifying the model’s first few output tokens rather than deeply suppressing harmful content throughout generation. Anthropic’s Scaling Monosemanticity work aims to address this by identifying and steering safety-relevant feature activations, though it remains a research direction, not a deployed capability.
Anthropic deploys additional classifiers for other policy violations, including child safety. All of these, constitutional classifiers included, are content-classification approaches. They identify harmful content by analyzing the content itself.
Anthropic may well have imposed meaningful constraints through its partnerships with the Pentagon, including its integration with Palantir on classified networks and a direct $200 million DoD contract. The recent Pentagon dispute made Anthropic’s two red lines explicit. But whether those are encoded in specific contract terms or simply enforced through Anthropic’s general usage policies is undisclosed. Some of Dario Amodei’s public statements reference “safeguards” rather than contract terms, which may indicate technical restrictions beyond what has been published; if such constraints exist in the Pentagon’s deployment, they too are undisclosed.
