[2412.17451] Diving into Self-Evolving Training for Multimodal Reasoning


By Wei Liu and 5 other authors

Abstract: Self-evolving training, where models iteratively learn from their own outputs, has emerged as a key approach for complex reasoning tasks, addressing the scarcity of high-quality chain-of-thought data. However, its effectiveness in multimodal reasoning, a domain more intricate than text-only reasoning, remains underexplored, and the understanding of critical factors in this training paradigm remains limited. Furthermore, a central challenge for this training method is performance saturation, which impedes further improvements and scalability. Inspired by reinforcement learning (RL), in this paper, we reframe self-evolving training for multimodal reasoning through the lens of RL, identifying three pivotal factors: Training Method, Reward Model, and Prompt Variation. Through systematic analysis, we establish relatively optimal design principles that significantly enhance multimodal reasoning capabilities. Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STAR (Multimodal Self-evolving Training for Reasoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. All resources are made publicly available at this https URL.

Submission history

From: Junlong Li [view email]
[v1]
Mon, 23 Dec 2024 10:18:41 UTC (1,079 KB)
[v2]
Tue, 3 Jun 2025 09:07:50 UTC (281 KB)
[v3]
Fri, 6 Jun 2025 10:36:59 UTC (282 KB)
