[2411.07404] Controllable Context Sensitivity and the Knob Behind It

[Submitted on 11 Nov 2024 (v1), last revised 30 May 2025 (this version, v4)]

View a PDF of the paper titled Controllable Context Sensitivity and the Knob Behind It, by Julian Minder and 6 other authors

Abstract: When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question-answering. In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear-time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob in not only that model but also non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model’s performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.
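
To make the idea of a 1-D "knob" concrete, here is a minimal sketch of what intervening on a single direction in a layer's hidden states could look like. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify how the subspace is found or how it is set. The function name `set_knob`, the random `direction`, and the target value `3.0` are all hypothetical, and random arrays stand in for real model activations.

```python
import numpy as np

def set_knob(hidden, direction, value):
    """Set the coordinate of each hidden vector along `direction` to `value`.

    hidden:    array of shape (..., d), e.g. (tokens, d) activations of one layer
    direction: array of shape (d,), spanning the assumed 1-D subspace
    value:     scalar target coordinate (hypothetical "context" or "prior" setting)
    """
    u = direction / np.linalg.norm(direction)  # unit vector spanning the subspace
    coord = hidden @ u                         # current coordinate along u
    # Shift only the component along u; the orthogonal complement is untouched.
    return hidden + (value - coord)[..., None] * u

# Stand-in data: 4 token positions, hidden size 8 (assumed sizes for illustration).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
direction = rng.normal(size=8)

steered = set_knob(hidden, direction, 3.0)
```

In a real model such a function would typically be applied to the output of one chosen layer during the forward pass; which layer, which direction, and which coordinate values correspond to "follow context" versus "follow prior" are exactly what the paper investigates and are not fixed by this sketch.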

Submission history

From: Julian Minder
[v1] Mon, 11 Nov 2024 22:22:21 UTC (4,236 KB)
[v2] Mon, 3 Mar 2025 03:02:55 UTC (10,174 KB)
[v3] Tue, 27 May 2025 21:44:35 UTC (10,289 KB)
[v4] Fri, 30 May 2025 15:21:51 UTC (3,380 KB)