News
🚨 NeuralTrust reconocido por Gartner
Iniciar sesiónObtener demo
Volver
How an AI Agent Hacked McKinsey and Exposed 46 Million Messages

How an AI Agent Hacked McKinsey and Exposed 46 Million Messages

Alessandro Pignati 12 de marzo de 2026
Contenido

The recent security incident involving McKinsey & Company's internal AI platform, Lilli, marks a pivotal moment in the evolving landscape of artificial intelligence security. This was not a breach orchestrated by a sophisticated human threat actor over weeks or months. Instead, an autonomous offensive AI agent, developed by the security firm CodeWall, achieved full read and write access to Lilli's production database in a mere two hours.

This incident transcends a typical data breach. It underscores a fundamental shift in cyber warfare, where the speed and autonomy of AI agents are redefining the threat model. McKinsey, a global leader with substantial investments in technology and security, found its internal AI system vulnerable to a classic attack vector, SQL injection, but exploited with unprecedented efficiency by an AI agent. The implications are profound, suggesting that even robust enterprise defenses may struggle against the relentless, machine-speed probing of advanced AI adversaries.

The rapid compromise of Lilli, a platform used by over 40,000 McKinsey employees for critical tasks like document analysis and strategic discussions, serves as a stark reminder. It highlights that the integration of AI into enterprise operations introduces new, complex security challenges that demand a re-evaluation of traditional defense strategies. The era of AI versus AI in cybersecurity is not a distant future scenario; it is demonstrably here, and its pace is accelerating.

The Anatomy of an Autonomous Intrusion

The CodeWall agent's success against McKinsey's Lilli platform was not due to an exotic, never-before-seen vulnerability. Rather, it was a sophisticated exploitation of a common flaw, executed with machine-like precision and speed. The initial breach point was the discovery of publicly exposed API documentation, which, among hundreds of endpoints, revealed 22 that required no authentication. This is a critical oversight in any enterprise system, providing an open door for reconnaissance.

The agent then identified a classic SQL injection vulnerability. This particular flaw resided in how Lilli processed user search queries: while the values were safely parameterized, the JSON keys, the field names, were directly concatenated into SQL queries. When the agent observed these JSON keys reflected verbatim in database error messages, it recognized a SQL injection opportunity that traditional, signature-based security tools often miss. This allowed the agent to perform a series of blind iterations, each one extracting more information about the database structure until live production data began to flow. This methodical, adaptive approach, chaining together seemingly minor issues, demonstrates the power of autonomous agents in discovering and exploiting vulnerabilities that evade conventional defenses.

The Vulnerability of the Prompt Layer

While the exfiltration of 46.5 million chat messages, 728,000 files, and 57,000 user accounts is undeniably severe, the most insidious aspect of the Lilli breach lies in the compromise of its "prompt layer." The system prompts, the foundational instructions that dictate how an AI behaves, its guardrails, and its citation methods, were stored within the same database that the CodeWall agent accessed with write privileges. This meant an attacker could silently rewrite these prompts without any code deployment or system changes, simply by issuing an UPDATE statement through a single HTTP call.

The implications of such a compromise are far-reaching and potentially catastrophic. Imagine a scenario where the AI is subtly instructed to provide "poisoned advice," altering financial models, strategic recommendations, or risk assessments. McKinsey consultants, relying on Lilli as a trusted internal tool, would unknowingly integrate these manipulated outputs into their client-facing work. Furthermore, an attacker could instruct the AI to exfiltrate confidential information by embedding it into seemingly innocuous responses, or even remove safety guardrails, causing the AI to disclose internal data or ignore access controls. This silent persistence, leaving no log trails or file changes, makes prompt layer attacks exceptionally difficult to detect, highlighting prompts as the new "Crown Jewel" assets in the AI era.

Why Traditional Scanners Failed the Test

One of the most striking aspects of the Lilli breach is that the vulnerability exploited, a SQL injection, is far from novel. It is a decades-old security flaw, well-understood and typically detectable by modern security tools. Yet, McKinsey, a firm with significant security investments and a sophisticated technology team, had Lilli running in production for over two years without detecting this critical weakness. This raises a crucial question: why did traditional scanners and internal security audits fail?

The answer lies in the fundamental difference between static, rule-based security assessments and the dynamic, adaptive nature of an autonomous offensive AI agent. Traditional scanners often rely on predefined signatures and checklists, designed to identify known patterns of vulnerabilities. They are excellent at catching common misconfigurations or obvious flaws. However, the CodeWall agent did not follow a checklist. It mapped the attack surface, probed for weaknesses, and, crucially, chained together seemingly minor observations, like JSON keys reflected in error messages, to construct a complex attack path. This ability to adapt, learn, and escalate at machine speed allows AI agents to mimic the creative, persistent tactics of a highly capable human attacker, surpassing the capabilities of conventional security tools.

Securing the Future: Treating Prompts as Crown Jewels

The McKinsey Lilli incident serves as a critical wake-up call for organizations deploying AI systems. The era of simply securing code, servers, and networks is insufficient. We must now extend our security paradigms to encompass the "prompt layer", the instructions that govern AI behavior, and treat them with the same, if not greater, vigilance as other critical assets. This requires a multi-faceted approach to AI security and governance.

Firstly, robust access controls and versioning for prompts are paramount. Just as we track changes to critical codebases, modifications to system prompts must be logged, reviewed, and protected. Secondly, integrity monitoring is essential to detect unauthorized alterations to prompts, ensuring that the AI continues to operate as intended. Thirdly, organizations must embrace continuous, AI-driven red-teaming. Relying solely on human-led penetration testing or traditional scanners is no longer adequate against autonomous AI adversaries. Offensive AI agents can provide a dynamic, real-time assessment of vulnerabilities, identifying complex attack chains that human teams or static tools might miss.

Ultimately, the Lilli breach highlights that AI security is not merely a technical challenge but a strategic imperative. As AI agents become more sophisticated and pervasive, the ability to secure the very instructions that guide them will determine the trustworthiness and resilience of our AI-powered enterprises. The "Crown Jewels" of the AI era are no longer just data; they are the prompts that shape AI's intelligence and behavior.