A Survey on Red Teaming for Generative Models


Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models

Authors: Lizhi Lin, Honglin Mu, Zenan Zhai, Minghan Wang, Yuxia Wang, Renxi Wang, Junjie Gao, Yixuan Zhang, Wanxiang Che, Timothy Baldwin, Xudong Han, Haonan Li

Abstract: Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed. In light of this, the field of red teaming is undergoing fast-paced growth, highlighting the need for a comprehensive survey covering the entire pipeline and addressing emerging topics. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the “searcher” framework to unify various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around LLM-based agents, overkill of harmless queries, and the balance between harmlessness and helpfulness.

Submission history

From: Honglin Mu
[v1]
Sun, 31 Mar 2024 09:50:39 UTC (2,109 KB)
[v2]
Tue, 26 Nov 2024 11:59:17 UTC (3,037 KB)
