GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

Abstract
LLMs have demonstrated remarkable capabilities but remain highly susceptible to adversarial prompts despite extensive safety-alignment efforts, raising serious security concerns for their real-world adoption. Existing jailbreak attacks rely on manual heuristics or computationally expensive optimization techniques, both of which struggle with generalization and efficiency.
In this paper, we introduce GASP, a novel black-box attack framework that leverages latent Bayesian optimization to generate human-readable adversarial suffixes. Unlike prior methods, GASP efficiently explores continuous embedding spaces, optimizing for strong adversarial suffixes while preserving prompt coherence.
We evaluate our method across multiple LLMs, showing its ability to produce natural and effective jailbreak prompts. Compared with existing alternatives, GASP significantly improves attack success rates while reducing computational cost, offering a scalable approach for red-teaming LLMs.
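For readers unfamiliar with latent Bayesian optimization, the following is a minimal, self-contained sketch of the general idea: a Gaussian-process surrogate is fitted to scores observed at points in a continuous latent space, and an acquisition function (expected improvement here) selects the next point to evaluate. All names and values (LATENT_DIM, score_suffix, the toy objective) are illustrative placeholders and not part of GASP's implementation, which instead decodes latent points into suffixes and scores them against a target LLM.

# Minimal sketch of Bayesian optimization over a continuous latent space.
# All names (score_suffix, LATENT_DIM, etc.) are hypothetical placeholders;
# the real attack would score candidate suffixes with a target LLM rather
# than this toy objective.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

LATENT_DIM = 8          # dimensionality of the latent suffix embedding (assumed)
N_INIT, N_ITER = 5, 20  # random warm-up evaluations and BO iterations
rng = np.random.default_rng(0)

def score_suffix(z: np.ndarray) -> float:
    """Toy stand-in for a black-box score of the suffix decoded from z."""
    return float(-np.sum((z - 0.3) ** 2))  # placeholder objective

def expected_improvement(gp, Z_cand, best_y, xi=0.01):
    """Standard EI acquisition over candidate latent points."""
    mu, sigma = gp.predict(Z_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = mu - best_y - xi
    u = imp / sigma
    return imp * norm.cdf(u) + sigma * norm.pdf(u)

# Warm-up: evaluate a few random latent points.
Z = rng.uniform(-1.0, 1.0, size=(N_INIT, LATENT_DIM))
y = np.array([score_suffix(z) for z in Z])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(N_ITER):
    gp.fit(Z, y)
    # Pick the next latent point by maximizing EI over random candidates.
    Z_cand = rng.uniform(-1.0, 1.0, size=(256, LATENT_DIM))
    z_next = Z_cand[np.argmax(expected_improvement(gp, Z_cand, y.max()))]
    Z = np.vstack([Z, z_next])
    y = np.append(y, score_suffix(z_next))

print("best latent point:", Z[np.argmax(y)], "score:", y.max())

In a black-box attack setting, each objective evaluation (querying the target model) is expensive, which is why a sample-efficient surrogate-based search over a continuous latent space is attractive.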
Examples
This demo illustrates how adversarial suffixes generated by GASP can bypass safety measures in language models across diverse examples. Select a potentially harmful prompt and toggle the suffix to see the difference in responses, as sketched below.
Note: This demo is for educational purposes only.
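As a rough illustration of what toggling the suffix means, the sketch below appends a suffix string to a prompt and queries a local model with and without it. The model name and the placeholder strings are assumptions for illustration only, not outputs of GASP.

# Minimal sketch of the "toggle the suffix" comparison, assuming a local
# Hugging Face model; the model and both strings are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

prompt = "<potentially harmful prompt selected in the demo>"   # placeholder
suffix = " <adversarial suffix generated by GASP goes here>"   # placeholder

for text in (prompt, prompt + suffix):
    out = generator(text, max_new_tokens=40, do_sample=False)[0]["generated_text"]
    print("---")
    print(out)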
Citation
@inproceedings{basani2025gasp,
  title={{GASP}: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking {LLM}s},
  author={Advik Raj Basani and Xiao Zhang},
  booktitle={ICLR 2025 Workshop on Building Trust in Language Models and Applications},
  year={2025},
  url={https://openreview.net/forum?id=Gonca78Bwq}
}
Ethical Statement
Our research is driven by a commitment to advancing the understanding of LLM vulnerabilities. While GASP enables the efficient generation of coherent adversarial suffixes, it is worth noting that manual methods for jailbreaking LLMs are already widely accessible.
Our research seeks to formalize and characterize these vulnerabilities rather than introduce novel threats. In line with responsible disclosure practices, we have transparently shared our findings with the organizations whose models were tested in this study.