Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models

[Submitted on 31 Mar 2024 (v1), last revised 26 Nov 2024 (this version, v2)]

Authors: Lizhi Lin, Honglin Mu, Zenan Zhai, Minghan Wang, Yuxia Wang, Renxi Wang, Junjie Gao, Yixuan Zhang, Wanxiang Che, Timothy Baldwin, Xudong Han, Haonan Li

Abstract: Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed. In light of this, the field of red teaming is undergoing fast-paced growth, highlighting the need for a comprehensive survey that covers the entire pipeline and addresses emerging topics. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the “searcher” framework to unify various automated red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around LLM-based agents, overkill of harmless queries, and the balance between harmlessness and helpfulness.