RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
Xiuying Wei and 3 other authors
Abstract: Transformers have become the cornerstone of modern large-scale language models, but their reliance on softmax attention poses a computational bottleneck at both training and inference. Recurrent models offer high efficiency, but compressing the full sequence into a fixed-size, holistic representation suffers from memory degradation in long contexts and limits fine-grained retrieval. To address this, we propose RAT, an intermediate design that bridges the efficiency of RNNs and the capacity of attention. RAT partitions the input into chunks, applies recurrence within each chunk for local dependencies, and uses softmax-based attention across chunks for long-range interactions. This design mitigates memory degradation and enables direct access to distant tokens, while retaining computational efficiency. Empirically, with a chunk size of 16, the RAT block achieves a 7x improvement in training speed on 100K-token sequences and a 9x speedup in generation at the 4K-token position, while maintaining performance comparable to standard attention. We demonstrate this by training 1.3B-parameter models from scratch and performing large-scale evaluations, including short- and long-context benchmarks, as well as supervised fine-tuning (SFT). We further propose a hybrid architecture that interleaves RAT with local attention. By combining efficient long-range modeling with strong local interactions, this hybrid design not only improves inference speed and reduces cache memory usage, but also consistently enhances performance, yielding the best overall results. Code is available at this https URL.
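To make the chunk-based idea concrete, the sketch below shows one way such a block could be organized: a recurrence (here a GRU) runs independently within each fixed-size chunk, and softmax attention mixes information across per-chunk states. This is a minimal illustration under our own assumptions (module names, the use of chunk summary states, and the omission of causal masking are all illustrative), not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): intra-chunk recurrence plus
# inter-chunk softmax attention, following the high-level description above.
import torch
import torch.nn as nn


class ChunkedRecurrenceAttention(nn.Module):
    """Illustrative chunk-based block: RNN within chunks, attention across chunks."""

    def __init__(self, dim: int, chunk_size: int = 16, num_heads: int = 4):
        super().__init__()
        self.chunk_size = chunk_size
        self.rnn = nn.GRU(dim, dim, batch_first=True)            # local dependencies
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        assert t % self.chunk_size == 0, "pad the sequence to a multiple of chunk_size"
        n = t // self.chunk_size

        # Intra-chunk recurrence: run the GRU independently on each chunk.
        chunks = x.reshape(b * n, self.chunk_size, d)
        local, last = self.rnn(chunks)                            # last: (1, b*n, d)
        local = local.reshape(b, t, d)

        # Inter-chunk attention over the final state of each chunk
        # (causal masking omitted here for brevity).
        summaries = last.squeeze(0).reshape(b, n, d)              # (b, n, d)
        mixed, _ = self.attn(summaries, summaries, summaries)

        # Broadcast the cross-chunk signal back to every token in its chunk.
        out = local + mixed.repeat_interleave(self.chunk_size, dim=1)
        return self.norm(out)


# Toy forward pass on random data.
if __name__ == "__main__":
    block = ChunkedRecurrenceAttention(dim=64, chunk_size=16)
    y = block(torch.randn(2, 128, 64))
    print(y.shape)  # torch.Size([2, 128, 64])
```

With a chunk size of c, the recurrence only spans c steps and the attention only spans t/c chunk states, which is the source of the efficiency gains the abstract reports; the exact cross-chunk attention mechanism in RAT may differ from this summary-state variant.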
Submission history
From: Xiuying Wei
[v1] Sun, 6 Jul 2025 15:08:49 UTC (1,049 KB)
[v2] Wed, 3 Sep 2025 14:28:23 UTC (1,082 KB)