yuragi

Measure how unstable your LLM's confidence really is.
Perturbation-driven hallucination detection — black-box, logprob-friendly, CLI-first.

GitHub → Source, issues, contributing
PyPI → pip install yuragi
Releases → v0.4.1 (Apr 14, 2026)
README → Usage guide
Theory → Mathematical foundation
Paper draft → Confidence inversion on 8B models
pip install yuragi
Latest research (v0.4.1): real-data benchmarks on llama-3.1-8B.

yuragi generates 13 semantics-preserving prompt perturbations (typos, tone, paraphrase, authority framing, counterfactual context) and compares the model's confidence distribution across the responses. When the answer text stays the same but confidence moves, that is fragility: a measurable property of prompt wording rather than model knowledge.
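
A minimal sketch of the perturb-and-compare idea in Python, not yuragi's actual API: query_model, the perturbation list, and spread_threshold are placeholders you would supply, and confidence is assumed to be something like the mean token logprob of the answer.

# Conceptual sketch only; illustrates the technique, not the yuragi package.
from statistics import mean, pstdev
from typing import Callable, Tuple

def fragility_report(
    base_prompt: str,
    perturbations: list[str],
    query_model: Callable[[str], Tuple[str, float]],
    spread_threshold: float = 0.5,
) -> dict:
    """Compare confidence across semantics-preserving rewrites of one prompt.

    query_model is a caller-supplied function returning (answer_text, confidence),
    where confidence might be the mean token logprob of the answer.
    """
    prompts = [base_prompt] + perturbations
    answers, confidences = [], []
    for p in prompts:
        answer, conf = query_model(p)
        answers.append(answer.strip().lower())  # crude normalization before comparing
        confidences.append(conf)

    same_answer = len(set(answers)) == 1         # did wording changes flip the answer?
    spread = pstdev(confidences)                 # dispersion of confidence across rewrites
    return {
        "same_answer": same_answer,
        "mean_confidence": mean(confidences),
        "confidence_spread": spread,
        # Fragile: the answer text is stable but confidence moves with the wording.
        "fragile": same_answer and spread > spread_threshold,
    }

The spread_threshold value and the use of population standard deviation are illustrative choices; yuragi's released metric may aggregate confidence differently, for example over per-token logprob distributions rather than one scalar per response.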