Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

[Submitted on 7 Oct 2024 (v1), last revised 5 Dec 2024 (this version, v2)]

Authors: Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen


Abstract: As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs by building on existing models has garnered significant attention, but combining disparate models often degrades performance. Various techniques have been proposed for aggregating pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison of these techniques, and their synergistic application to a diverse model zoo, has yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work benchmarks existing LLM scaling techniques, especially selective merging and variants of mixture. Using insights from the benchmark results, we formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo characterized by different architectures and initialization. Our methodology involves clustering mergeable models, selecting an optimal merging strategy, and integrating the clusters through a model mixture. Finally, as evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Code is available at: this https URL.
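
The cluster-then-merge idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the greedy cosine-similarity clustering, the 0.95 threshold, uniform averaging as the merging strategy, and every function name here (`flatten`, `cosine`, `cluster_mergeable`, `average_merge`, `merge_clusters`) are assumptions chosen only for illustration, and models are represented as plain dicts of float lists rather than real checkpoints. The final mixture step, which would route among the merged cluster models, is left out.

```python
import math

def flatten(state):
    """Concatenate all parameter lists of a toy state dict into one vector."""
    return [w for name in sorted(state) for w in state[name]]

def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_mergeable(models, threshold=0.95):
    """Greedy clustering (an assumed heuristic): a model joins the first
    cluster whose seed model it resembles, otherwise it starts a new one."""
    clusters = []
    for name, state in models.items():
        for cluster in clusters:
            seed = models[cluster[0]]
            if cosine(flatten(state), flatten(seed)) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def average_merge(states):
    """Uniform parameter averaging, used here as the simplest merging strategy."""
    n = len(states)
    return {
        key: [sum(s[key][i] for s in states) / n for i in range(len(states[0][key]))]
        for key in states[0]
    }

def merge_clusters(models, threshold=0.95):
    """Return one merged model per cluster; these would then serve as the
    components of a model mixture (not implemented in this sketch)."""
    clusters = cluster_mergeable(models, threshold)
    return [average_merge([models[m] for m in cluster]) for cluster in clusters]
```

Calling `merge_clusters` on a dict that maps model names to toy state dicts returns a list with one averaged state dict per cluster. Models that are too dissimilar to merge stay in separate clusters, which is the situation where a mixture would take over.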

Submission history

From: Xinyu Zhao
[v1] Mon, 7 Oct 2024 15:55:55 UTC (1,535 KB)
[v2] Thu, 5 Dec 2024 15:08:56 UTC (1,537 KB)
