Grok-4 Jailbreak with Echo Chamber and Crescendo

Ahmad Alobaid • July 11, 2025

Contents

LLM jailbreak attacks are not only evolving individually, they can also be combined to amplify their effectiveness. In this post, we present a concrete example of such a combination.

A few weeks ago, we introduced the Echo Chamber Attack, which manipulates an LLM into echoing a subtly crafted, poisonous context, allowing it to bypass its own safety mechanisms. We successfully tested Echo Chamber across multiple LLMs.

In this blog post, we take that a step further by combining Echo Chamber with the Crescendo attack. We demonstrate how this combination strengthens the overall attack strategy and apply it to Grok-4 to showcase its enhanced effectiveness.

Example

We combined Echo Chamber and Crescendo to jailbreak the LLM. The target objective was to prompt the model to reveal instructions for making a Molotov cocktail, an example originally used in the Crescendo paper.

We began by running Echo Chamber with both poisonous seeds and steering seeds. In the initial attempt, the steering seeds were too strong, triggering the model’s safeguards and causing it to flag the interaction as malicious. In the next trial, we used milder steering seeds and followed the full Echo Chamber workflow: introducing a poisoned context, selecting a conversational path, and initiating the persuasion cycle.

While the persuasion cycle nudged the model toward the harmful goal, it wasn’t sufficient on its own. At this point, Crescendo provided the necessary boost. With just two additional turns, the combined approach succeeded in eliciting the target response.

Encouraged by this result, we proceeded to test additional examples to evaluate whether this method generalizes across other harmful objectives.

Figure 1. Example of the objective being reached on Grok 3 and showing the step by step instructions on how to make a Molotov Cocktail. We blurred it for security reasons.

Figure 2. Successful instance of Grok-4 producing harmful output related to the Molotov objective during a combined Echo Chamber and Crescendo attack.

Integrating Echo Chamber and Crescendo

As demonstrated earlier in the Echo Chamber blog post, Echo Chamber can be easily combined with other techniques. Figure 2 presents a simplified workflow illustrating the interaction between Echo Chamber and Crescendo.

The attack begins with Echo Chamber, which includes an additional check in the persuasion cycle to detect "stale" progress, situations where the conversation is no longer moving meaningfully toward the objective. When this occurs, Crescendo steps in to provide an extra push toward the target. This additional nudge typically succeeds within two iterations. At that point, the model either detects the malicious intent and refuses to respond, or the attack succeeds and the model produces a harmful output.

Figure 3. Workflow illustrating the integration of Echo Chamber and Crescendo to enhance the effectiveness of the attack.

Experiments

To further evaluate the effectiveness of our approach, we manually selected several malicious objectives from the Crescendo paper, focusing specifically on prompts related to illegal activities. We then tested these objectives on Grok-4 using a combination of Echo Chamber and Crescendo.

Our results show that the attack was successful in a significant portion of attempts. Specifically, we achieved a 67% success rate for the Molotov objective, 50% for the Meth objective, and 30% for Toxin. Notably, in one instance, the model reached the malicious objective in a single turn, without requiring the Crescendo step.

Topic	Success rate (%)	Theme of the successful cases	Techniques
Molotov	67%	Manual, Description	Echo Chamber + Crescendo
Meth	50%	Story	Echo Chamber + Crescendo
Toxin	30%	Law, Description	Echo Chamber + Crescendo

Conclusion

We demonstrated the effectiveness of combining Echo Chamber and Crescendo to enhance the success of adversarial prompting. By applying this method to Grok-4, we were able to jailbreak the model and achieve harmful objectives without issuing a single explicitly malicious prompt. This highlights a critical vulnerability: attacks can bypass intent or keyword-based filtering by exploiting the broader conversational context rather than relying on overtly harmful input. Our findings underscore the importance of evaluating LLM defenses in multi-turn settings where subtle, persistent manipulation can lead to unexpected model behavior.