The Web Conference 2026 · Acceptance 20.1% (676/3370)

When to Trust: Causality-Aware Calibration for Accurate KG-RAG

We present Ca2KG, a causality-aware calibration framework that decides when KG-RAG answers are trustworthy by pairing counterfactual prompting with panel-based re-scoring.

Interventions + Baseline: 2 + 1
QA Datasets: 2
Target System: KG-RAG
Calibration Metrics: ECE · Brier

Figure: Ca2KG framework overview
Framework

Causal Calibration for KG-RAG

A modular pipeline that uses counterfactual prompting and panel-based re-scoring to quantify when KG-RAG should be trusted or deferred.

Counterfactual Prompting

Intervene on path quality and reasoning reliability to expose retrieval-dependent uncertainty.

Panel-based Re-scoring

Aggregate candidate answers across interventions into a unified probability panel.

Causal Calibration Index

Score answer stability across interventions to select the calibrated response.

Calibration Signals

Trust is calibrated from three complementary signals that probe retrieval quality, reasoning reliability, and stability across interventions.

Signal 01

Path Quality Intervention (t1)

Assumes the retrieved paths are weak in order to reveal hidden uncertainty.

Signal 02

Reasoning Reliability (t2)

Assumes the reasoning may have failed in order to test the robustness of the answer.

Signal 03

Panel Stability (CCI)

Combines intervention outputs into a calibrated trust score.
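
The two interventions and the baseline can be pictured as three prompt variants over the same question and retrieved paths. The sketch below is illustrative only: the t0 instruction is quoted from the execution log on this page, while the t1 and t2 wordings, the `INTERVENTIONS` table, and the `build_prompts` helper are hypothetical paraphrases of the signal descriptions, not the prompts used in the paper.

```python
from typing import Dict, List

# The baseline instruction is quoted from the page's execution log.
BASELINE = "Use provided contexts even if incomplete. Output one-line answer."

# Hypothetical intervention wordings (assumptions, not the paper's prompts).
INTERVENTIONS: Dict[str, str] = {
    "t0": "",  # baseline: no counterfactual assumption
    "t1": "Assume the retrieved paths may be weak or irrelevant.",
    "t2": "Assume the reasoning over the paths may be faulty.",
}


def build_prompts(question: str, paths: List[str]) -> Dict[str, str]:
    """Return one prompt per condition: baseline t0 plus interventions t1 and t2."""
    context = "\n".join(f"- {p}" for p in paths)
    prompts: Dict[str, str] = {}
    for name, assumption in INTERVENTIONS.items():
        parts = [assumption, BASELINE, f"Contexts:\n{context}", f"Question: {question}"]
        # Drop the empty assumption line for the baseline condition.
        prompts[name] = "\n".join(part for part in parts if part)
    return prompts
```

Each prompt would then be sent to the same KG-RAG model, so that any change in the answer can be attributed to the injected assumption rather than to different retrieved evidence.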

Trust Decision Grid

High evidence + low uncertainty: Trust
Mixed signals across interventions: Re-score

Calibration produces a trust score based on intervention stability.
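
As a rough illustration of the grid, the decision can be sketched as a function of the answers returned under each condition. This is a simplification under a stated assumption: unanimous agreement across t0, t1, and t2 stands in for "high evidence + low uncertainty", and any disagreement counts as "mixed signals". The `trust_decision` name and the agreement rule are hypothetical, not the paper's criterion.

```python
from typing import Dict


def trust_decision(answers: Dict[str, str]) -> str:
    """Map per-condition answers to a cell of the trust decision grid.

    Assumption: agreement across all conditions is used as a proxy for
    high evidence and low uncertainty; any disagreement is treated as mixed.
    """
    distinct = {a.strip().lower() for a in answers.values()}
    return "Trust" if len(distinct) == 1 else "Re-score"
```

Under this rule, a query whose baseline answer differs from its intervention answers would be routed to re-scoring rather than trusted outright.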

Case Study

MetaQA Trace in Action

A live walkthrough of Ca2KG on a MetaQA query, showing counterfactual prompting, panel re-scoring, and the final calibrated answer.

Question

The films that share directors with the films [Tiresia] are written by who?

Ground-truth: Bertrand Bonello

Step 1

Counterfactual prompting generates t0, t1, and t2 answers.

Step 2

Panel re-scoring merges answers and computes global probabilities.

Step 3

CCI combines support and stability to select the final answer.

Output

The calibrated response matches the ground-truth writer.
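
Step 2 can be sketched with the simplest possible panel: each condition casts one vote, and an answer's global probability is its share of the votes. This is an assumption made for illustration; the page does not say how the panel probabilities are computed, only that vote share happens to be consistent with the 0.67 / 0.33 split shown in the execution log. The `panel_probabilities` helper is hypothetical.

```python
from collections import Counter
from typing import Dict


def panel_probabilities(answers: Dict[str, str]) -> Dict[str, float]:
    """Merge per-condition answers into a panel of vote-share probabilities."""
    counts = Counter(answers.values())
    total = sum(counts.values())
    # most_common() orders candidates from most to least supported.
    return {answer: n / total for answer, n in counts.most_common()}


# Answers taken from the MetaQA trace on this page.
answers = {"t0": "Luca Fazzi", "t1": "Bertrand Bonello", "t2": "Bertrand Bonello"}
panel = panel_probabilities(answers)
print({a: round(p, 2) for a, p in panel.items()})
# {'Bertrand Bonello': 0.67, 'Luca Fazzi': 0.33}
```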

Live System Trace

Ca2KG Execution Log

MetaQA
Baseline prompt (t0): Use provided contexts even if incomplete. Output one-line answer.

t0 · Initial prompt · Answer: Luca Fazzi
t1 · Path quality intervention · Answer: Bertrand Bonello
t2 · Reasoning reliability · Answer: Bertrand Bonello
Panel · Merged answers · Bonello 0.67 · Fazzi 0.33
CCI · Final calibrated answer · Bertrand Bonello
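
The page describes the CCI as combining support and stability but does not give its formula. The sketch below shows one hypothetical way the two could be combined so that the trace's outcome falls out: support is an answer's share of all runs, stability is its share of the intervention runs (t1, t2), and the score is their product. Both definitions, the product rule, and the helper names are assumptions for illustration, not the paper's index.

```python
from typing import Dict


def cci_scores(answers: Dict[str, str], baseline: str = "t0") -> Dict[str, float]:
    """Hypothetical CCI: support (share of all runs) times stability
    (share of intervention runs that return the same answer)."""
    runs = list(answers.values())
    interventions = [a for name, a in answers.items() if name != baseline]
    if not interventions:
        raise ValueError("need at least one intervention besides the baseline")
    scores: Dict[str, float] = {}
    for candidate in dict.fromkeys(runs):  # unique candidates, order preserved
        support = runs.count(candidate) / len(runs)
        stability = interventions.count(candidate) / len(interventions)
        scores[candidate] = support * stability
    return scores


def select_answer(answers: Dict[str, str]) -> str:
    """Pick the candidate with the highest hypothetical CCI score."""
    scores = cci_scores(answers)
    return max(scores, key=scores.get)


# Answers taken from the MetaQA trace on this page.
trace = {"t0": "Luca Fazzi", "t1": "Bertrand Bonello", "t2": "Bertrand Bonello"}
print(select_answer(trace))  # Bertrand Bonello
```

With these definitions the baseline answer scores zero because neither intervention reproduces it, which mirrors the trace: the answer that survives both interventions is the one selected.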

Results Snapshot

Ca2KG improves calibration metrics while preserving accuracy across MetaQA and WebQSP.

Trusted Answers

Lower ECE

Consistently improved calibration across benchmarks.

Error Deferrals

Accuracy Hold

Maintains or improves predictive accuracy.

Calibration Gap

Brier ↓

Reduced overconfidence in KG-RAG outputs.
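
For readers unfamiliar with the two metrics, here is a minimal sketch of their standard definitions for answers scored as correct (1) or incorrect (0). The function names, the choice of ten equal-width bins, and the convention of placing a confidence of 1.0 in the last bin are illustrative assumptions; the paper's exact evaluation settings are not stated on this page.

```python
from typing import List, Sequence


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(conf):
        # A confidence of exactly 1.0 is placed in the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue  # empty bins contribute nothing
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(conf[i] for i in idx) / len(idx)
        ece += len(idx) / len(conf) * abs(accuracy - mean_conf)
    return ece
```

Lower is better for both: ECE measures how far stated confidence drifts from observed accuracy, while the Brier score also penalizes confident wrong answers, which is why it tracks overconfidence.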

Authors

Research Team at RMIT University

Equal contribution marked with ★. Corresponding author marked with ✉.

Jing Ren ★

RMIT University · Melbourne, Australia

ORCID 0000-0003-0169-1491

jing.ren@ieee.org

Bowen Li ★

RMIT University · Melbourne, Australia

Equal contribution.

s3890442@student.rmit.edu.au

Ziqi Xu ✉

RMIT University · Melbourne, Australia

ORCID 0000-0003-1748-5801

ziqi.xu@rmit.edu.au

Xikun Zhang

RMIT University · Melbourne, Australia

xikun.zhang@rmit.edu.au

Haytham Fayek

RMIT University · Melbourne, Australia

haytham.fayek@ieee.org

Xiaodong Li

RMIT University · Melbourne, Australia

xiaodong.li@rmit.edu.au

When to Trust: Causality-Aware Calibration for KG-RAG

Paper 1779 accepted to The Web Conference 2026.

The conference received 3370 valid submissions, of which 676 (20.1%) were accepted.

Web4Good

Web4Good Track Paper

Project page for the Web4Good track paper.

Web4Good Paper Site

EACL 2026 Paper

Explore the accepted EACL 2026 paper and its project page.

Visit ACPS Paper Site