Selective Trust Head — Inference-Time Reliability for LLMs

Ranks LLM answers by trustworthiness using single-forward-pass uncertainty signals (selective prediction).

Project Info

Domain: ML - NLP
Year: 2025

Quick Stats

Task
Selective prediction / ranking
Signals
Single forward pass
Metrics
AP, ROC-AUC, Brier, ECE

Languages

Python, LaTeX

Tools

scikit-learn, NumPy, Pandas, Hugging Face Transformers, DistilBERT

Skills

LLM Reliability, Selective Prediction, Calibration, Feature Engineering, Natural Language Processing, Abstention, Supervised Learning, Class Imbalance, Tokenization, Sliding Windows

What

LLMs often sound confident while being wrong. This project trains a lightweight trust head that scores answers by reliability using only inference-time uncertainty signals from a single forward pass.

Rather than predicting binary correctness, the objective is selective prediction: rank answers so that, at any coverage level, the most trustworthy outputs are answered first and the rest can be deferred.
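
As a rough illustration of that objective, the sketch below uses synthetic scores and labels to measure accuracy over only the top-scored fraction of answers at a given coverage; a useful trust score makes selective accuracy rise as coverage shrinks.

```python
# Illustrative-only sketch of the selective-prediction objective with
# synthetic data: keep the top `coverage` fraction of answers by trust
# score and measure accuracy on the kept subset.
import numpy as np

def selective_accuracy(scores: np.ndarray, correct: np.ndarray, coverage: float) -> float:
    """Accuracy over the `coverage` fraction of answers with the highest trust scores."""
    k = max(1, int(round(coverage * len(scores))))
    kept = np.argsort(-scores)[:k]  # indices of the highest-trust answers
    return float(correct[kept].mean())

# Toy data: scores are noisy but correlated with correctness, so accuracy
# should improve as coverage shrinks.
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=1000).astype(float)
scores = correct + rng.normal(0.0, 0.8, size=1000)
for cov in (1.0, 0.5, 0.2):
    print(f"coverage={cov:.0%}  selective accuracy={selective_accuracy(scores, correct, cov):.3f}")
```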

How

Engineered leakage-safe uncertainty features from token-level start and end distributions, including entropy, span-vs-null margins, top-2 gaps, and probability mass summaries.
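
A hedged sketch of what such features can look like, assuming SQuAD-2.0-style extractive QA where index 0 (the [CLS] position) carries the no-answer score; the feature names and exact definitions here are illustrative assumptions, not the project's verbatim implementation.

```python
# Single-forward-pass uncertainty features from start/end token distributions.
# Assumes index 0 ([CLS]) acts as the "null" (no-answer) position, as in
# SQuAD 2.0; all definitions below are illustrative.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def uncertainty_features(start_logits: np.ndarray, end_logits: np.ndarray) -> dict:
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    feats = {}
    for name, p in (("start", p_start), ("end", p_end)):
        sorted_p = np.sort(p)
        feats[f"{name}_entropy"] = float(-(p * np.log(p + 1e-12)).sum())
        feats[f"{name}_top2_gap"] = float(sorted_p[-1] - sorted_p[-2])
        feats[f"{name}_top5_mass"] = float(sorted_p[-5:].sum())
    # Span-vs-null margin: best non-null span score minus the [CLS] null score.
    best_span = start_logits[1:].max() + end_logits[1:].max()
    null_score = start_logits[0] + end_logits[0]
    feats["span_vs_null_margin"] = float(best_span - null_score)
    return feats
```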

Trained simple trust heads such as logistic regression to map these features to a scalar trust score, optimizing ranking quality under class imbalance using Average Precision.
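
A minimal sketch of such a trust head in scikit-learn, with a synthetic, imbalanced feature matrix standing in for the real uncertainty features; `class_weight="balanced"` and the other hyperparameters are illustrative choices.

```python
# Logistic-regression trust head: map per-answer uncertainty features to a
# scalar trust score and evaluate ranking quality with Average Precision.
# X and y are synthetic stand-ins for real features/labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))  # stand-in uncertainty features, one row per answer
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 1.2).astype(int)  # imbalanced labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# class_weight="balanced" counteracts the correct/incorrect class imbalance.
head = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000))
head.fit(X_tr, y_tr)

trust = head.predict_proba(X_te)[:, 1]  # scalar trust score per answer
print("Average Precision:", average_precision_score(y_te, trust))
```

Average Precision suits this setup because it measures ranking quality directly and remains informative when one class dominates, unlike accuracy at a fixed threshold.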

Results

Achieved strong ranking performance despite a signal-limited regime in which multiple model families converged to nearly identical risk-coverage (selective) curves.

Evaluated probability alignment and reliability using Brier score and Expected Calibration Error to assess when trust scores behave meaningfully as probabilities.
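
For reference, a small sketch of those two metrics: the Brier score comes straight from scikit-learn, while the equal-width-bin ECE helper below is an illustrative implementation, since ECE is not a built-in scikit-learn metric.

```python
# Brier score and a simple equal-width-bin Expected Calibration Error.
# The data here is synthetic and calibrated by construction, so both
# metrics should come out low.
import numpy as np
from sklearn.metrics import brier_score_loss

def expected_calibration_error(y_true: np.ndarray, p: np.ndarray, n_bins: int = 10) -> float:
    """|accuracy - mean confidence| averaged over bins, weighted by bin size."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (p >= lo) & (p <= hi) if hi == 1.0 else (p >= lo) & (p < hi)
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - p[mask].mean())
    return float(ece)

rng = np.random.default_rng(0)
p = rng.uniform(size=5000)                    # stand-in trust scores in [0, 1]
y = (rng.uniform(size=5000) < p).astype(int)  # labels drawn to match the scores

print("Brier:", brier_score_loss(y, p))             # lower is better
print("ECE:  ", expected_calibration_error(y, p))   # near 0 for calibrated scores
```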

Key Takeaways

Inference-time uncertainty contains real but limited separable signal, making it effective for triage rather than full verification.

The primary value lies in prioritization: surfacing high-trust answers first and routing low-trust outputs to abstention or verification.