...

A Survey on Red Teaming for Generative Models


View a PDF of the paper titled Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models, by Lizhi Lin and Honglin Mu and Zenan Zhai and Minghan Wang and Yuxia Wang and Renxi Wang and Junjie Gao and Yixuan Zhang and Wanxiang Che and Timothy Baldwin and Xudong Han and Haonan Li


Abstract: Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed. In light of this, the field of red teaming is undergoing fast-paced growth, highlighting the need for a comprehensive survey covering the entire pipeline and addressing emerging topics. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the “searcher” framework to unify various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around LLM-based agents, overkill of harmless queries, and the balance between harmlessness and helpfulness.

Submission history

From: Honglin Mu
[v1]
Sun, 31 Mar 2024 09:50:39 UTC (2,109 KB)
[v2]
Tue, 26 Nov 2024 11:59:17 UTC (3,037 KB)
