Ten Months After CaMeL, Where Are the Secure AI Agents?

Alessandro Pignati • 12 de febrero de 2026

Contenido

Large Language Models (LLMs) have rapidly transformed how we interact with technology, powering everything from advanced chatbots to sophisticated agentic systems. Yet, with their growing capabilities comes a critical vulnerability: prompt injection. This insidious attack allows malicious actors to manipulate an LLM through carefully crafted inputs, compelling it to deviate from its intended purpose, execute unauthorized actions, or even leak sensitive information. Imagine an AI assistant designed to manage your calendar suddenly being tricked into sending your private meeting notes to an unknown email address. This is the real-world danger of prompt injection.

The industry is grappling with reactive defenses such as heuristic filters, prompt engineering tricks, or costly fine-tuning efforts. While these methods offer some mitigation, they often feel like a game of whack-a-mole, constantly chasing new attack vectors without addressing the root cause. They lack the fundamental robustness required for systems handling sensitive data or critical operations. The question then becomes: can we move beyond these piecemeal solutions to a more foundational, architectural approach to LLM security?

DeepMind’s research introduced CaMeL (CApabilities for MachinE Learning) as a novel framework that promised a significant paradigm shift in this battle. Instead of trying to filter malicious prompts after they were received, CaMeL aimed to defeat prompt injections by design. Drawing inspiration from established software security principles like control flow integrity and capability-based security, it outlined a protective layer around the LLM, designed to preserve system integrity even when handling untrusted data.

The vision was proactive and architectural. It suggested a path toward truly secure and trustworthy agentic systems. Yet, ten months later, convincing real-world implementations remain limited, and the industry still appears to rely largely on reactive defenses rather than embracing the structural changes CaMeL proposed.

The CaMeL Architecture: A Secure Framework for Agentic Systems

At its heart, CaMeL is not a single model but a meticulously designed framework that orchestrates multiple components to achieve robust security. This architecture, often conceptualized as a "quartet," fundamentally redefines how LLMs interact with their environment and handle data. By separating concerns and enforcing strict boundaries, CaMeL ensures that even sophisticated prompt injection attempts are thwarted at an architectural level.

The four core components of the CaMeL framework are:

The Privileged LLM (P-LLM): This is the trusted brain of the operation, responsible for understanding the user's intent and generating a secure plan of action.
The Quarantined LLM (Q-LLM): A specialized LLM designed to safely process potentially untrusted external data without the ability to execute actions.
The Custom Python Interpreter: The enforcement engine that executes the P-LLM's plan, meticulously tracking data flow and applying security policies in real-time.
Security Policies: A set of predefined rules that govern how data can be used and how tools can be invoked, based on the origin and nature of the data.

This quartet works in concert to create an environment where the control flow, what the agent does, is strictly separated from the data flow, what information the agent processes. This separation is paramount. Traditional LLM systems often conflate these two, making them susceptible to prompt injections that can simultaneously hijack the agent's decision-making and manipulate its data handling. CaMeL's architectural design ensures that the agent's actions are always aligned with its intended purpose and security policies, even when confronted with adversarial inputs.

The Privileged LLM (P-LLM): The Trusted Orchestrator

In the CaMeL framework, the Privileged LLM (P-LLM) serves as the trusted orchestrator, the component solely responsible for interpreting the user's high-level intent and translating it into a secure, executable plan. Its role is analogous to a meticulous project manager who, after understanding the core objective, drafts a detailed workflow without getting sidetracked by external, potentially misleading, information.

The P-LLM operates under a critical constraint: it only processes the initial, trusted user query. This isolation is a cornerstone of CaMeL's security model. Unlike conventional LLMs that might process a user's prompt alongside various external data sources (which could be compromised), the P-LLM is shielded from any untrusted inputs. This means that the control flow, the sequence of actions the agent is designed to take, is generated based purely on the user's explicit, verified instructions.

The output of the P-LLM is not a direct action, but rather a pseudo-Python code. This code represents the agent's operational plan, outlining the steps and tools to be used to fulfill the user's request. By generating this control flow in a controlled environment, free from the influence of potentially malicious data, CaMeL effectively prevents control flow hijacking. An attacker cannot inject instructions into the P-LLM's thought process to make the agent execute unintended commands, because the P-LLM never sees the untrusted parts of the input that would facilitate such an attack. This architectural separation ensures the integrity and predictability of the agent's behavior, maintaining alignment with the user's original, trusted intent.

The Quarantined LLM (Q-LLM) and Data Flow Protection: Handling Untrusted Inputs Safely

While the Privileged LLM (P-LLM) safeguards the control flow, CaMeL employs the Quarantined LLM (Q-LLM) to manage the inherent risks associated with untrusted data. In real-world agentic systems, LLMs frequently interact with external tools and receive information from diverse sources such as emails, web pages, documents, many of which cannot be fully trusted. The Q-LLM is specifically designed to process these potentially malicious inputs without compromising the system.

The key to the Q-LLM's security lies in its deliberate limitations. Unlike the P-LLM, the Q-LLM is stripped of any tool-calling capabilities. Its sole purpose is to parse unstructured, untrusted data into a structured format that the system can then safely utilize. For instance, if an agent needs to extract a recipient's email address from an email body, the Q-LLM would perform this extraction. However, it cannot, on its own, initiate an action like sending an email to that extracted address.

This brings us to the crucial concept of Data Flow Protection, which is meticulously enforced by CaMeL's custom Python interpreter. As the interpreter executes the pseudo-Python code generated by the P-LLM, it maintains a sophisticated Data Flow Graph. This graph meticulously tracks the origin and lineage of every piece of data within the system. More importantly, each data element is tagged with capabilities, metadata that defines its source, its level of trust, and the permissible operations that can be performed with it. For example, an email address extracted by the Q-LLM from an untrusted email might be tagged with a capability indicating it's 'untrusted' and cannot be used as a recipient for outgoing communications without explicit policy override. Conversely, an address from a trusted contact list would carry a 'trusted' capability.

When a tool call is made, the custom interpreter rigorously checks the capabilities of all arguments against the defined Security Policies. If a policy dictates that a certain action (e.g., sending an email) requires a 'trusted' recipient address, and the provided address only carries an 'untrusted' capability, the interpreter will block the action. This mechanism effectively prevents malicious data from being used in unintended ways, thereby safeguarding against data exfiltration, unauthorized actions, and other forms of data flow manipulation that are common in prompt injection attacks. By combining the limited Q-LLM with a robust capability-based data flow tracking system, CaMeL ensures that even untrusted inputs are handled within a secure perimeter.

CaMeL in Action: Provable Security and Real-World Implications

The true test of any security framework lies in its practical effectiveness. CaMeL, as detailed in the DeepMind paper, has been rigorously evaluated, particularly on benchmarks like AgentDojo. The results underscore a critical distinction: while an undefended LLM system might achieve a higher raw task completion rate (e.g., 84%), it remains inherently vulnerable to prompt injection attacks. CaMeL, on the other hand, successfully solves 77% of tasks with provable security.

What does "provable security" mean in this context? It signifies a shift from probabilistic defenses, where we hope to catch most attacks, to a more deterministic guarantee. CaMeL's architectural design, with its strict separation of control and data flows, and its capability-based enforcement, provides a strong assurance that specific classes of prompt injection attacks simply cannot succeed. This is a profound difference from relying on heuristic filters or constant model retraining, which are always playing catch-up with new adversarial techniques.

This slight reduction in raw task completion (from 84% to 77%) is a deliberate and acceptable trade-off for enhanced security. It reflects the system's refusal to execute actions that violate its security policies, even if those actions might, in a benign context, contribute to task completion. For instance, if a prompt injection attempts to exfiltrate data by manipulating a tool call, CaMeL's interpreter will block that action, ensuring data integrity at the cost of not completing the malicious sub-task. This prioritization of security over unverified task completion is crucial for deploying LLMs in sensitive applications.

The real-world implications of CaMeL are significant. For enterprises building agentic AI systems that handle confidential information, interact with critical infrastructure, or make autonomous decisions, provable security is not merely a feature, it's a necessity. CaMeL offers a blueprint for developing LLM-powered agents that can operate reliably and securely, even in adversarial environments, thereby fostering greater trust in AI deployments of advanced AI.

Building Trust in AI: NeuralTrust's Vision for Secure LLM Deployment

The emergence of CaMeL marks a pivotal moment in the evolution of Agent security. It underscores a fundamental truth: for AI to truly integrate into critical systems and earn widespread trust, security cannot be an afterthought. It must be woven into the very fabric of its design. The shift from reactive, probabilistic defenses to proactive, architecturally enforced security is not just an academic ideal. It is an operational imperative for any organization deploying LLM-powered agents.

At NeuralTrust, we champion this very philosophy. Our mission is to empower businesses to harness the transformative power of AI with unwavering confidence in its security and reliability. The principles embodied by CaMeL, such as the rigorous separation of concerns, the enforcement of control and data flow integrity, and the use of capability-based security, are precisely the tenets that guide our approach to building trustworthy AI solutions. We recognize that the future of AI hinges on its ability to operate securely, predictably, and transparently, even in the face of sophisticated adversarial attacks.

We believe that adopting a security-by-design mindset, as exemplified by CaMeL, is the only sustainable path forward. It means moving beyond superficial fixes and investing in foundational architectures that inherently resist manipulation. NeuralTrust provides the expertise, tools, and strategic guidance necessary to implement these robust security patterns, helping organizations navigate the complexities of AI Agent deployment while ensuring the highest standards of safety and integrity.

Are you ready to build AI agents that are not just intelligent, but also inherently secure and trustworthy? Explore how NeuralTrust can help you integrate cutting-edge security paradigms, inspired by innovations like CaMeL, into your agentic AI deployments. Partner with us to transform the promise of AI into a secure, reliable reality.