Taming Knowledge Conflicts in Language Models (ICML 2025 Spotlight)

1University of Illinois Urbana-Champaign, 2Visa Research

Abstract

Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. Previous works attribute this conflict to the interplay between "memory heads" and "context heads", attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the superposition of contextual information and parametric memory (CP superposition), where highly influential attention heads simultaneously contribute to both memory and context. Building upon this insight, we propose Just Run Twice (JUICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JUICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate the superposition effects. Extensive experiments across 11 datasets and 6 model architectures demonstrate that JUICE achieves new state-of-the-art performance and robust generalization, delivering significant and consistent improvements across different domains under various conflict types. Finally, we theoretically analyze knowledge conflict and CP superposition in attention heads, which further elucidates the effectiveness of JUICE in these settings.

How can we study knowledge conflict in a clean, controlled, and rigorous setting?

  • Unified view of knowledge conflict
    • (1) Irrelevant Context - Context is misleading while parametric memory is correct; an LLM should trust its own knowledge.
    • (2) RAG Hallucination - Context is reliable while parametric memory is outdated or stubborn; an LLM should defer to the context.
    Depending on the application, users may want the model to either stay faithful to its parametric memory or prioritize contextual information.
  • Fine-grained Irrelevant Context benchmark — ParaConflict
    • We construct three evaluation tiers: (a) clean input, (b) substitution conflict, and (c) coherent conflict, representing increasing levels of contextual distraction over diverse factual domains (see the illustrative sketch after this list).
    • This design lets us trace model internals across conflict levels and test whether an intervention can steer the model consistently in every case.
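
To make the three tiers concrete, here is a minimal illustrative sketch. The fact, prompts, and field names below are hypothetical stand-ins of our own, not actual ParaConflict entries.

```python
# Illustrative sketch of the three ParaConflict evaluation tiers.
# The fact, prompts, and field names are hypothetical stand-ins,
# not actual dataset entries.

fact = {"subject": "Eiffel Tower", "relation": "located in", "answer": "Paris"}

tiers = {
    # (a) Clean input: the question alone, no conflicting context.
    "clean": "Q: Where is the Eiffel Tower located? A:",
    # (b) Substitution conflict: the true object is swapped for a counterfactual.
    "substitution": (
        "The Eiffel Tower is located in Rome. "
        "Q: Where is the Eiffel Tower located? A:"
    ),
    # (c) Coherent conflict: the counterfactual is embedded in a fluent,
    # internally consistent passage, making the distraction stronger.
    "coherent": (
        "Standing on the banks of the Tiber, the Eiffel Tower has long been "
        "Rome's most celebrated landmark, drawing millions of visitors to "
        "the Italian capital each year. "
        "Q: Where is the Eiffel Tower located? A:"
    ),
}

for name, prompt in tiers.items():
    print(f"[{name}] {prompt}")
```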

Empirical Observation of the CP Superposition

Observation 1: Inconsistent Behaviors of Model Components Under Different Degrees of Knowledge Conflict.

Observation 2: Counteracting Effects of Multiple Individually Effective Interventions. These observations collectively suggest the CP superposition phenomenon.

Our Method: Just Run Twice (JUICE)

JUICE operates in two stages: (1) a head identification stage, where a minimal number of samples is used to identify two sets of attention heads that yield consistent improvements under positive or negative scaling, respectively, and (2) a dual-run inference stage, where the model runs twice: first saving the outputs of the identified heads, and then using scaled versions of these saved outputs to intervene during the second run. Intuitively, this approach ensures that the identified components are consistently effective, and the dual-run strategy further mitigates the superposition effects, thereby providing more accurate steering directions through residual head activations.
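
As a rough illustration of the dual-run mechanics, here is a minimal sketch using PyTorch forward pre-hooks on a small stand-in model. The model choice, head indices, scaling factor, and hook placement are our own assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of a JUICE-style dual-run intervention with PyTorch hooks.
# Model, head indices, and scaling factor are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # small stand-in model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()
n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

# Hypothetical output of the head identification stage: (layer, head)
# pairs that consistently help when scaled up (POS) or down (NEG).
POS_HEADS, NEG_HEADS = {(9, 6), (10, 1)}, {(8, 11)}
ALPHA = 4.0      # intervention strength (assumed)
state = {"run": 1}
saved = {}       # clean per-head outputs captured on the first run

def make_hook(layer):
    # Pre-hook on attn.c_proj: its input is the concatenation of all
    # head outputs at this layer, shape (batch, seq, n_embd).
    def hook(module, args):
        heads = args[0].view(*args[0].shape[:-1], n_heads, head_dim).clone()
        if state["run"] == 1:                      # run 1: save clean outputs
            saved[layer] = heads.detach()
        else:                                      # run 2: steer with saved outputs
            for l, h in POS_HEADS:
                if l == layer:
                    heads[..., h, :] += ALPHA * saved[layer][..., h, :]
            for l, h in NEG_HEADS:
                if l == layer:
                    heads[..., h, :] -= ALPHA * saved[layer][..., h, :]
        return (heads.view(*args[0].shape),)
    return hook

handles = [
    model.transformer.h[l].attn.c_proj.register_forward_pre_hook(make_hook(l))
    for l in {l for l, _ in POS_HEADS | NEG_HEADS}
]

prompt = "The Eiffel Tower is located in Rome. Q: Where is the Eiffel Tower located? A:"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    state["run"] = 1; model(**ids)        # first pass: record head outputs
    state["run"] = 2; out = model(**ids)  # second pass: intervene
print(tok.decode(out.logits[0, -1].argmax().item()))
for h in handles:
    h.remove()
```

The key point the sketch tries to capture is that the steering term in the second pass is built from activations recorded during the unperturbed first pass, which is what mitigates interference among multiple intervened heads.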

(Representative) Experiment Results

(Representative) experiment results of the intervention for enhancing parametric memory on the ParaConflict dataset. All results are in accuracy (%). 1, 2, and 3 denote clean input, substitution conflict, and coherent conflict, respectively. JUICE almost completely reverses the model's tendency from being misled by the context back to its parametric memory in all cases.

Left: (Representative) experiment results of the intervention for enhancing contextual beliefs on the standard RAG hallucination dataset; JUICE again achieves the best steering effect. Right: a fine-grained experiment comparing running twice vs. running once. The results show that naive single-pass interventions are unstable and prone to degradation, whereas the dual-run design ensures consistent and effective interventions.

Theoretical Analysis

The setting for our theoretical analysis: the left is the parametric task (factual recall) and the right is the context task (induction). Our first set of results shows the existence of a perfect solver and how training by gradient descent can converge to such a solution.
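
For orientation, here is a hedged sketch of the kind of one-layer attention model such analyses typically work with; the exact architecture, data model, and assumptions are specified in the paper.

```latex
% Hedged sketch: a generic one-layer, single-head attention model
% (the paper's exact setup may differ).
\[
  f(X)_T \;=\; W_O \sum_{j=1}^{T}
  \operatorname{softmax}_j\!\Big( \tfrac{x_T^\top W_Q^\top W_K\, x_j}{\sqrt{d}} \Big)\, W_V x_j .
\]
% Parametric (factual-recall) task: map a subject token to its memorized
% attribute, independent of context.  Context (induction) task: copy the
% token that followed the query token earlier in the sequence.
```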

Next, we show how distinct, well-defined training tasks can overlap at inference time, resulting in knowledge conflict (left and middle parts of the figure below).

Finally, we justify the effectiveness of the run-twice strategy over the run-once baseline (rightmost part of the figure).
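
One way to write the intuition (our own paraphrase; the precise statement and conditions are in the paper): with several intervened heads, a single-pass intervention rescales head outputs that are already computed from perturbed upstream activations, so errors compound, whereas the dual run anchors the steering term to the clean first-pass activations.

```latex
% Our own paraphrase of the run-once vs. run-twice intuition; notation is
% illustrative, not the paper's.  Let h_\ell(\cdot) be an intervened head's
% output at layer \ell, x_{<\ell} the clean upstream activations, and
% \tilde{x}_{<\ell} the activations perturbed by earlier interventions.
\[
  \text{run-once:}\quad \tilde{o}_\ell = (1+\alpha)\, h_\ell(\tilde{x}_{<\ell}),
  \qquad
  \text{run-twice:}\quad \tilde{o}_\ell = h_\ell(\tilde{x}_{<\ell}) + \alpha\, h_\ell(x_{<\ell}).
\]
% In the dual run, the steering term \alpha\, h_\ell(x_{<\ell}) comes from
% the clean first pass, so it does not inherit upstream perturbations.
```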

BibTeX

@article{li2025taming,
  title   = {Taming Knowledge Conflicts in Language Models},
  author  = {Li, Gaotang and Chen, Yuzhong and Tong, Hanghang},
  journal = {arXiv preprint arXiv:2503.10996},
  year    = {2025}
}