Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
by Runchuan Zhu and 6 other authors
Abstract: Refusal-Aware Instruction Tuning (RAIT) enables Large Language Models (LLMs) to refuse to answer unknown questions. By replacing the responses to unknown questions in the training data with refusal responses such as “I don’t know”, RAIT improves the reliability of LLMs and reduces their hallucinations. Generally, RAIT modifies training samples based on the correctness of the initial LLM’s responses. However, this crude approach can cause LLMs to excessively refuse questions they could have answered correctly, a problem we call over-refusal. In this paper, we identify two primary causes of over-refusal. Static conflict occurs when similar samples within the LLM’s feature space receive different supervision signals (the original answer vs. the modified “I don’t know”). Dynamic conflict emerges as the LLM’s knowledge evolves during SFT, making previously unanswerable questions answerable; yet these now-answerable training samples still carry the “I don’t know” supervision assigned from the initial LLM state, producing inconsistencies. Both conflicts cause the trained LLM to misclassify known questions as unknown, resulting in over-refusal. To address this issue, we introduce Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning (CRaFT). CRaFT makes two main contributions: first, it additionally incorporates response certainty to selectively filter and modify data, reducing static conflicts; second, it performs preliminary rehearsal training to characterize changes in the LLM’s knowledge state, which helps mitigate dynamic conflicts during fine-tuning. We conducted extensive experiments on open-ended question answering and multiple-choice question tasks. The results show that CRaFT improves the LLM’s overall performance during the RAIT process. Source code and training data will be released on GitHub.
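The abstract does not spell out the exact selection rule, but the contrast between vanilla RAIT relabeling and CRaFT's certainty-aware filtering can be sketched in a few lines of Python. Everything below is illustrative and assumed, not taken from the paper: the Sample fields, the thresholds corr_thresh and cert_thresh, and both function names are hypothetical placeholders for whatever statistics the authors actually compute from sampled model responses.

```python
from dataclasses import dataclass, replace
from typing import List

IDK_RESPONSE = "I don't know."

@dataclass
class Sample:
    question: str
    answer: str          # ground-truth answer from the original dataset
    correctness: float   # e.g. fraction of sampled LLM responses that are correct, in [0, 1]
    certainty: float     # e.g. 1 - normalized entropy over sampled responses, in [0, 1]

def vanilla_rait(samples: List[Sample], corr_thresh: float = 0.5) -> List[Sample]:
    """Vanilla RAIT (as described in the abstract): relabel every
    low-correctness sample's answer as a refusal, keep the rest as-is."""
    return [
        replace(s, answer=IDK_RESPONSE) if s.correctness < corr_thresh else s
        for s in samples
    ]

def craft_style_filter(samples: List[Sample],
                       corr_thresh: float = 0.5,
                       cert_thresh: float = 0.7) -> List[Sample]:
    """CRaFT-style selection (sketch): additionally use certainty to filter
    the data. Samples the model is uncertain about sit near the known/unknown
    boundary and are dropped rather than relabeled, since supervising similar
    feature-space neighbors with conflicting targets is the hypothesized
    source of static conflict."""
    out = []
    for s in samples:
        if s.certainty < cert_thresh:
            continue  # ambiguous sample: neither keep the original nor relabel
        if s.correctness < corr_thresh:
            out.append(replace(s, answer=IDK_RESPONSE))  # confidently wrong -> refusal
        else:
            out.append(s)  # confidently right -> keep original supervision
    return out
```

This sketch covers only the static-conflict side. For dynamic conflict, the abstract's rehearsal-training idea would amount to re-estimating correctness after a short preliminary SFT run and relabeling against that updated knowledge state rather than the initial one.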
Submission history
From: Jiang Wu
[v1] Wed, 9 Oct 2024 14:12:51 UTC (2,639 KB)
[v2] Mon, 18 Nov 2024 13:15:41 UTC (2,640 KB)