Docker recently launched Gordon, an AI-powered assistant designed to be the ultimate companion for container orchestration. According to the official Docker documentation, Gordon is built to "take action on your Docker workflows," specifically focusing on explaining concepts, writing Dockerfiles, and debugging container failures. It is presented as a specialized, intent-driven tool meant to live within the high-stakes environment of infrastructure management.
However, there is a significant disconnect between the marketing and the reality of the tool's current beta state. When you interact with Gordon, you quickly realize that it is a general-purpose encyclopedia that hasn't been told to stay in its lane. For instance, instead of helping a developer optimize a multi-stage build, Gordon is perfectly capable of providing a detailed historical account of the 1966 Palomares nuclear incident, including the role of "Paco el de la bomba," the Spanish fisherman who helped recover a lost hydrogen bomb.

This "identity crisis" is the first red flag for any security professional. When a tool embedded in your primary development environment, with the potential power to manage images, volumes, and networks, is also capable of acting as a Cold War historian, it signals a fundamental lack of domain grounding. For an enterprise, this isn't just a quirky behavior. It is evidence that the agent lacks the architectural constraints necessary to distinguish between a legitimate technical request and a distracting, out-of-scope query.
From Little Red Riding Hood to McDonald's
In the field of AI security, we refer to this phenomenon as a "capability leak." It occurs when an AI system, despite being branded for a specific business function, fails to suppress the vast, unconstrained knowledge of its underlying large language model. We see this clearly when Gordon, a tool supposedly dedicated to containerization, is asked to recite the story of "Little Red Riding Hood." Not only does it comply, but it does so with the narrative flair of a general-purpose chatbot, completely abandoning its technical persona.

This is the same fundamental vulnerability that recently plagued the McDonald's support chatbot. What was intended to be a simple interface for order support was quickly "jailbroken" by users who coaxed it into writing complex code and engaging in philosophical debates. Similar "off-the-rails" incidents at Chipotle and Alcampo have shown that when an agent "breaks character," it isn't just a harmless diversion. It is a sign that the agent is essentially a general-purpose engine wearing a thin, branded mask.
When an agent lacks strict domain restriction, it becomes a liability for brand integrity and operational focus. If a chatbot designed for a Big Mac order starts acting as a coding assistant, or a Docker assistant starts telling fairy tales, the trust model of the enterprise software is broken. These leaks prove that many current AI deployments lack the deep, architectural constraints necessary to keep them focused on their specific business objectives, leaving them vulnerable to being steered far away from their intended purpose.
Pizza Recipes and Python Functions
The danger of an unrestricted agent becomes most apparent when we look at how it handles seemingly harmless requests. In our testing, Gordon was more than happy to provide detailed pizza recipes and write general-purpose Python functions that had nothing to do with Docker. While these might seem like quirky "Easter eggs," they represent a significant technical vulnerability: a massive expansion of the agent's attack surface.

Every "innocent" capability an agent possesses is a potential tool for an attacker. If Gordon is allowed to act as a general-purpose Python interpreter or a storyteller, it provides a much wider range of contexts that can be used to bypass its core security instructions. An attacker doesn't need to ask Gordon to "delete a container" directly. They can hide their malicious intent within a complex request for a Python-based pizza calculator or a historical narrative, slowly steering the agent toward unauthorized actions.

In a truly agentic system, where the AI has the power to interact with your local environment, this lack of restriction is a liability. By failing to limit Gordon's knowledge to the Docker domain, the system becomes significantly harder to defend. The more "roles" an agent can play, the more ways there are to trick it into breaking its primary security rules. In the world of infrastructure security, a tool that can do "anything" is a tool that can be manipulated to do "everything."
Architectural Guardrails
Securing an agentic system requires a shift in mindset: we must stop treating agents as "chatbots that can do things" and start treating them as "software components with probabilistic interfaces." As the examples of Gordon and McDonald's demonstrate, a simple system prompt telling the AI "you are a Docker expert" is easily bypassed. To build truly secure agents, organizations must implement a multi-layered defense strategy that enforces intent at the architectural level.
The most effective defense is Intent Classification. Before a user's prompt ever reaches the primary LLM, it should be intercepted by a smaller, highly specialized "gatekeeper" model. This model's only job is to determine if the request falls within the agent's allowed domain. If a user asks a Docker assistant for a pizza recipe or a story about Little Red Riding Hood, the gatekeeper should reject the request before it can even trigger the more powerful (and potentially dangerous) capabilities of the main model.
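As a minimal sketch of this pattern, consider the following Python gatekeeper. The `classify_intent` function stands in for the small specialized model; here it is approximated with simple keyword matching for illustration, and every name in the sketch is hypothetical rather than part of any real Docker or Gordon API:

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    allowed: bool
    reason: str

def classify_intent(prompt: str) -> str:
    """Stand-in for a small, specialized classifier model.

    In production this would be a fine-tuned model returning a domain
    label; keyword matching is used here purely for illustration.
    """
    docker_terms = ("dockerfile", "container", "image", "compose",
                    "volume", "network", "build")
    return "docker" if any(t in prompt.lower() for t in docker_terms) else "out_of_scope"

def gatekeeper(prompt: str) -> GateResult:
    """Runs before the primary LLM ever sees the prompt."""
    if classify_intent(prompt) == "docker":
        return GateResult(True, "in-domain, forward to the main model")
    return GateResult(False, "out of scope, rejected before reaching the LLM")

for prompt in ("How do I slim down a multi-stage build?",
               "Tell me the story of Little Red Riding Hood"):
    result = gatekeeper(prompt)
    print(f"{prompt!r} -> allowed={result.allowed} ({result.reason})")
```

The point is the placement, not the classifier itself: rejection happens before the powerful model is ever invoked, so there is nothing for an attacker to jailbreak.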
To clarify the difference between a "naive" deployment and a secure one, consider the following comparison:
| Feature | Unrestricted Agent (e.g., Gordon Beta) | Secure Agent (Best Practice) |
|---|---|---|
| Domain Grounding | Weak; relies on a system prompt. | Strong; enforced by intent classifiers. |
| Capability Scope | General-purpose; can discuss any topic. | Restricted; limited to specific business tasks. |
| Tool Access | Broad; can write/execute arbitrary code. | Hardened; access limited to essential APIs. |
| Risk Profile | High; vulnerable to Shadow AI and injection. | Low; minimized attack surface. |
| Human Oversight | Often optional or session-based. | Mandatory for sensitive/destructive actions. |
Beyond intent, developers must practice Capability Hardening. This means stripping away any functionality that isn't strictly necessary for the task at hand. If an agent is meant to manage Dockerfiles, it shouldn't have the ability to access the open web for non-technical data. Finally, for any action that could impact production infrastructure, a Human-in-the-loop (HITL) requirement is non-negotiable. A secure agent proposes; a human disposes.
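These two ideas combine naturally in the tool-dispatch layer. The sketch below, using entirely hypothetical tool names, shows a deny-by-default registry where safe read-only tools run freely, destructive ones require explicit human approval, and anything unregistered is refused outright:

```python
from typing import Callable

def run_tool(tool: str, args: dict) -> str:
    """Stub standing in for the real tool-execution layer."""
    return f"executed {tool} with {args}"

# Hypothetical tool names chosen for illustration, not Docker's actual API.
SAFE_TOOLS = {"lint_dockerfile", "explain_error", "suggest_optimization"}
DESTRUCTIVE_TOOLS = {"remove_container", "prune_volumes", "push_image"}

def dispatch(tool: str, args: dict, confirm: Callable[[str], bool]) -> str:
    """Deny-by-default dispatcher: only registered tools run, and
    destructive ones require explicit human approval (HITL)."""
    if tool in SAFE_TOOLS:
        return run_tool(tool, args)
    if tool in DESTRUCTIVE_TOOLS:
        # The agent proposes; a human disposes.
        if confirm(f"Agent proposes {tool}({args}). Approve?"):
            return run_tool(tool, args)
        return "rejected by operator"
    return f"denied: {tool!r} is not a registered capability"

# Example run with an operator who approves nothing; a real deployment
# would prompt a human instead of hard-coding the answer.
for tool, args in [("lint_dockerfile", {"path": "Dockerfile"}),
                   ("remove_container", {"id": "abc123"}),
                   ("browse_web", {"url": "https://example.com"})]:
    print(dispatch(tool, args, confirm=lambda msg: False))
```

The design choice worth noting is the final branch: capabilities form an allowlist, not a blocklist, so any tool the agent "discovers" stays denied until a human deliberately registers it.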
Takeaway
The industry is currently in a "honeymoon phase" with AI agents, where the novelty of a chatbot that can do everything often overshadows the necessity of a tool that does one thing securely. As Gordon demonstrates, even the most respected names in software can fall into the trap of deploying unconstrained models that prioritize conversational versatility over operational security. However, as AI becomes more deeply integrated into our core development environments, the cost of these "capability leaks" will only rise.
We must move away from the era of "chatbots with skins" and toward truly intent-aware systems. A secure agent is not one that can answer every question. It is one that knows exactly what it is supposed to do and, more importantly, what it is not allowed to do. For companies building in this space, the goal should be to create agents that are as reliable and predictable as the code they help manage.
The future of agentic security lies in precision. By enforcing strict domain boundaries and implementing multi-layered guardrails, we can transform AI from an unpredictable conversationalist into a powerful, trusted partner. After all, if you wanted a pizza recipe or a bedtime story, you would have asked a chef or a storyteller, not your container management tool. Is your organization ready to demand that its AI agents finally stay in their lane?




