Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
by Yilin Geng and 7 other authors
Abstract: Large language models (LLMs) are increasingly deployed with hierarchical instruction schemes, where certain instructions (e.g., system-level directives) are expected to take precedence over others (e.g., user messages). Yet we lack a systematic understanding of how effectively these hierarchical control mechanisms work. We introduce a systematic evaluation framework based on constraint prioritization to assess how well LLMs enforce instruction hierarchies. Our experiments across six state-of-the-art LLMs reveal that models struggle with consistent instruction prioritization, even for simple formatting conflicts. We find that the widely adopted system/user prompt separation fails to establish a reliable instruction hierarchy, and models exhibit strong inherent biases toward certain constraint types regardless of their priority designation. Interestingly, we also find that societal hierarchy framings (e.g., authority, expertise, consensus) exert a stronger influence on model behavior than system/user roles, suggesting that pretraining-derived social structures function as latent behavioral priors with potentially greater impact than post-training guardrails.
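To make the constraint-prioritization setup concrete, here is a minimal illustrative sketch, not the paper's actual framework: a chat prompt places a formatting constraint in the system message and a conflicting one in the user message, and a checker tests which constraint the reply honors. The prompt wording and checker are assumptions for illustration only.

```python
# Illustrative sketch of a system/user formatting conflict (hypothetical
# prompts; not the authors' evaluation framework).

def build_conflict_prompt():
    """A chat where the system-level constraint (uppercase) conflicts
    with the user-level constraint (lowercase)."""
    return [
        {"role": "system", "content": "Answer in ALL UPPERCASE letters only."},
        {"role": "user",
         "content": "Answer in all lowercase letters only. "
                    "What is the capital of France?"},
    ]

def follows_system_constraint(response: str) -> bool:
    """True if the reply honors the higher-priority (system) constraint,
    i.e., every letter in the reply is uppercase."""
    letters = [c for c in response if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

# A hypothetical reply that obeys the user constraint instead of the system one:
reply = "the capital of france is paris"
print(follows_system_constraint(reply))  # False: the user constraint won
```

Aggregating such checks over many conflict pairs yields a simple measure of how reliably a model prioritizes the nominally higher-ranked instruction.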
Submission history
From: Yilin Geng [view email]
[v1] Fri, 21 Feb 2025 04:51:37 UTC (9,504 KB)
[v2] Sat, 2 Aug 2025 07:43:49 UTC (9,484 KB)
[v3] Sun, 24 Aug 2025 06:05:34 UTC (9,491 KB)
[v4] Thu, 4 Dec 2025 11:13:44 UTC (9,190 KB)