Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods, by Jai Doshi and Asa Cooper Stickland
Abstract: Large language model unlearning aims to remove harmful information that LLMs have learnt to prevent their use for malicious purposes. LLMU and RMU have been proposed as two methods for LLM unlearning, achieving impressive results on unlearning benchmarks. We study in detail the efficacy of these methods by evaluating their impact on general model capabilities on the WMDP benchmark as well as a biology benchmark we create. Our experiments show that RMU generally leads to better preservation of model capabilities, for similar or better unlearning. We further test the robustness of these methods and find that doing 5-shot prompting or rephrasing the question in simple ways can lead to an over ten-fold increase in accuracy on unlearning benchmarks. Finally, we show that training on unrelated data can almost completely recover pre-unlearning performance, demonstrating that these methods fail at truly unlearning. The code is available at: this https URL.
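To make the 5-shot robustness probe concrete, here is a minimal sketch (not the authors' released code; the helper names, prompt format, and placeholder questions are assumptions) of how one might build a 5-shot multiple-choice prompt to test whether supposedly unlearned knowledge resurfaces in context:

def format_example(question: str, choices: list[str], answer: str | None) -> str:
    """Format one WMDP-style multiple-choice item; leave the answer blank for the target."""
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    lines.append(f"Answer: {answer if answer is not None else ''}")
    return "\n".join(lines)

def build_5shot_prompt(shots, target_question, target_choices):
    """shots: list of five (question, choices, answer_letter) tuples prepended as demonstrations."""
    parts = [format_example(q, c, a) for q, c, a in shots]
    parts.append(format_example(target_question, target_choices, None))
    return "\n\n".join(parts)

if __name__ == "__main__":
    # Hypothetical placeholder items, not real benchmark questions.
    shots = [(f"placeholder question {i}", ["w", "x", "y", "z"], "A") for i in range(5)]
    print(build_5shot_prompt(shots, "target question", ["a1", "a2", "a3", "a4"]))

In an evaluation of this kind, one would score the model's next-token probabilities over A/B/C/D after the prompt and compare accuracy against the 0-shot setting to measure how much "unlearned" knowledge the in-context demonstrations recover.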
Submission history
From: Jai Doshi [view email]
[v1]
Mon, 18 Nov 2024 22:31:17 UTC (775 KB)
[v2]
Wed, 20 Nov 2024 02:23:11 UTC (775 KB)