...

Towards Stable and Efficient Transformer Training via Hybrid Normalization

[Submitted on 6 Mar 2025 (v1), last revised 8 Dec 2025 (this version, v4)]

[2409.17120] Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Handy Appetizer

[Submitted on 25 Sep 2024 (v1), last revised 8 Dec 2025 (this version, v2)] Authors: Benji Peng, Xuanhe Pan, Yizhu Wen, ...

Benchmarking Language Models on Multi-turn Mental Health Support

[Submitted on 23 Nov 2025 (v1), last revised 5 Dec 2025 (this version, v3)]

The Failure of Instruction Hierarchies in Large Language Models

[Submitted on 21 Feb 2025 (v1), last revised 4 Dec 2025 (this version, v4)]

AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

arXiv:2512.03794v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have achieved remarkable success in visual question answering tasks, but their reliance ...

Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation

[Submitted on 16 May 2025 (v1), last revised 3 Dec 2025 (this version, v3)]

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

arXiv:2512.02973v1 Announce Type: cross Abstract: While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to ...

Efficient Distillation of Multi-task Speech Models via Language-Specific Experts

[Submitted on 2 Nov 2023 (v1), last revised 29 Nov 2025 (this version, v4)]

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons

arXiv:2512.01797v1 Announce Type: cross Abstract: Large language models (LLMs) frequently generate hallucinations — plausible but factually incorrect outputs — undermining ...

SO-Bench: A Structural Output Evaluation of Multimodal LLMs

arXiv:2511.21750v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must ...