What
LLMs often sound confident while being wrong. This project trains a lightweight trust head that scores answers by reliability using only inference-time uncertainty signals from a single forward pass.
Rather than predicting binary correctness, the objective is selective prediction: rank answers so that the most trustworthy outputs are served first when only a fraction of queries can be answered.
How
Engineered leakage-safe uncertainty features from token-level start and end distributions, including entropy, span-vs-null margins, top-2 gaps, and probability mass summaries.
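A minimal sketch of the feature extraction, assuming an extractive-QA-style head that emits start and end probability distributions over token positions; the `null_score` input (the model's "no answer" score) and the specific feature names are illustrative, not the project's exact implementation:

```python
import numpy as np

def uncertainty_features(start_probs, end_probs, null_score=0.0):
    """Leakage-safe uncertainty features from a single forward pass.

    start_probs, end_probs: 1-D arrays summing to 1 over token positions.
    null_score: the model's "no answer" score (hypothetical parameter;
    its exact form depends on the QA head used).
    """
    feats = {}
    for name, p in (("start", start_probs), ("end", end_probs)):
        p = np.asarray(p, dtype=float)
        top2 = np.sort(p)[-2:]  # two largest probabilities
        feats[f"{name}_entropy"] = float(-(p * np.log(p + 1e-12)).sum())
        feats[f"{name}_top2_gap"] = float(top2[1] - top2[0])
        feats[f"{name}_max_prob"] = float(p.max())  # probability-mass summary
    # Best-span score versus the null ("no answer") score.
    span_score = float(np.max(start_probs) * np.max(end_probs))
    feats["span_vs_null_margin"] = span_score - null_score
    return feats
```

Because every feature is computed from the model's own output distributions, no gold labels or post-hoc information leak into the trust signal.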
Trained simple trust heads such as logistic regression to map these features to a scalar trust score, optimizing ranking quality under class imbalance using Average Precision.
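The trust head itself can be sketched in a few lines with scikit-learn; the synthetic data below (one weakly predictive feature, imbalanced labels) is a stand-in for the real feature matrix, not the project's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 answers x 7 uncertainty features, where only
# the first feature carries signal and positives are the minority class.
X = rng.normal(size=(500, 7))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 1.0).astype(int)

# Trust head: logistic regression mapping features to a scalar trust score.
head = LogisticRegression(max_iter=1000).fit(X, y)
trust = head.predict_proba(X)[:, 1]

# Ranking quality under class imbalance: Average Precision rewards
# placing correct answers ahead of incorrect ones at every threshold.
ap = average_precision_score(y, trust)
```

Average Precision is preferred here over accuracy or ROC-AUC because, with few positives, it directly measures how well correct answers rise to the top of the ranking.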
Results
The trust head achieved strong ranking performance even in a signal-limited regime: multiple model families, from logistic regression to more expressive heads, converged to nearly identical selective-prediction curves, suggesting the features rather than the head capacity were the bottleneck.
Evaluated probability alignment and reliability using Brier score and Expected Calibration Error to assess when trust scores behave meaningfully as probabilities.
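Both metrics are simple enough to write directly; this is a standard equal-width-binned formulation of Expected Calibration Error, which may differ in binning details from the project's exact evaluation code:

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    return float(np.mean((probs - labels) ** 2))

def expected_calibration_error(probs, labels, n_bins=10):
    """Coverage-weighted gap between mean confidence and accuracy per bin."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    bin_ids = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return float(ece)
```

A low Brier score with high ECE indicates scores that rank well but are systematically over- or under-confident; only when both are low do trust scores behave meaningfully as probabilities.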
Key Takeaways
Inference-time uncertainty contains real but limited separable signal, making it effective for triage rather than full verification.
The primary value lies in prioritization: surfacing high-trust answers first and routing low-trust outputs to abstention or verification.
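The triage policy implied by these takeaways can be sketched as a coverage-based router; the `coverage` parameter and the serve/abstain split below are illustrative, not a fixed part of the project:

```python
import numpy as np

def triage(trust_scores, coverage=0.5):
    """Serve the top `coverage` fraction of answers by trust score;
    route the rest to abstention or downstream verification.

    Returns a boolean mask: True = serve, False = abstain/verify.
    """
    scores = np.asarray(trust_scores, float)
    k = max(1, int(round(coverage * len(scores))))
    order = np.argsort(-scores)  # highest trust first
    served = np.zeros(len(scores), dtype=bool)
    served[order[:k]] = True
    return served
```

Because the trust head only needs to rank answers, not verify them, this routing works even when absolute score calibration is imperfect.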