Efficient Alignment by Learning to Correct

2024-12-12 2024-11-06 by AiNEWS2025

[Submitted on 4 Feb 2024 (v1), last revised 2 Nov 2024 (this version, v5)]

View a PDF of the paper titled Aligner: Efficient Alignment by Learning to Correct, by Jiaming Ji and 8 other authors

Abstract: With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints. In this paper, we introduce Aligner, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model. Designed as a model-agnostic, plug-and-play module, Aligner can be directly applied to various open-source and API-based models with only one-off training, making it suitable for rapid iteration. Notably, Aligner can be applied to any powerful, large-scale upstream models. Moreover, it can even iteratively bootstrap the upstream models using corrected responses as synthetic human preference data, breaking through the model's performance ceiling. Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B has achieved an average improvement of 68.9% in helpfulness and 23.8% in harmlessness across the tested LLMs while also effectively reducing hallucination. In the Alpaca-Eval leaderboard, stacking Aligner-2B on GPT-4 Turbo improved its LC Win Rate from 55.0% to 58.3%, surpassing GPT-4 Omni's 57.5% Win Rate (community report).
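The plug-and-play idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every name here (`aligned_answer`, `build_preference_pairs`, `PreferencePair`, and the toy models) is hypothetical, and the models are plain text-to-text functions standing in for real LLM calls. The sketch shows only the inference-time composition (upstream answer, then correction) and how corrected answers could be paired with the originals as synthetic preference data. It does not show how the corrector itself is trained.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: any text-in/text-out function can play either role.
UpstreamModel = Callable[[str], str]        # query -> initial answer
CorrectorModel = Callable[[str, str], str]  # (query, initial answer) -> corrected answer


@dataclass(frozen=True)
class PreferencePair:
    query: str
    preferred: str     # the corrected answer
    dispreferred: str  # the original upstream answer


def aligned_answer(query: str, upstream: UpstreamModel, corrector: CorrectorModel) -> str:
    """Stack the corrector on top of the upstream model without modifying it."""
    initial = upstream(query)
    return corrector(query, initial)


def build_preference_pairs(
    queries: List[str], upstream: UpstreamModel, corrector: CorrectorModel
) -> List[PreferencePair]:
    """Collect (corrected, original) pairs as synthetic preference data.

    Pairs where the corrector left the answer unchanged carry no
    preference signal, so they are skipped.
    """
    pairs = []
    for query in queries:
        initial = upstream(query)
        corrected = corrector(query, initial)
        if corrected != initial:
            pairs.append(PreferencePair(query, corrected, initial))
    return pairs


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_upstream(query: str) -> str:
    return "I don't know." if "capital" in query else "2 + 2 = 4"


def toy_corrector(query: str, answer: str) -> str:
    if answer == "I don't know.":
        return "I am not certain, but here is how to check: " + query
    return answer
```

Because the upstream model is only called, never updated, the same corrector function can be placed behind an open-source model or an API-based one, which is the property the abstract describes as model-agnostic.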

Submission history

From: Jiaming Ji
[v1]
Sun, 4 Feb 2024 09:24:51 UTC (2,207 KB)
[v2]
Tue, 6 Feb 2024 18:02:01 UTC (2,206 KB)
[v3]
Mon, 3 Jun 2024 14:33:45 UTC (2,808 KB)
[v4]
Mon, 24 Jun 2024 18:55:16 UTC (2,801 KB)
[v5]
Sat, 2 Nov 2024 10:01:38 UTC (2,434 KB)

#Efficient #Alignment #Learning #Correct