SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Authors: Michael Kirchhoff, Luca Füger†, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Seong Joon Oh‡, Sinead Williamson
The common approach to communicating a large language model's (LLM) uncertainty is to add a percentage number or a hedging word to its response. But is this all we can do? Instead of generating a single answer and then hedging it, an LLM that is fully transparent to the user needs to be able to reflect on its internal belief distribution and output a summary of all options it deems possible, and how likely they are. To test whether LLMs possess this capability, we develop the SelfReflect metric, an information-theoretic distance between a given summary and a distribution over answers. In interventional and human studies, we find that SelfReflect detects even slight deviations, yielding a fine-grained measure of faithfulness between a summary string and an LLM's actual internal distribution over answers. With SelfReflect, we make a resounding negative observation: modern LLMs are, across the board, incapable of revealing what they are uncertain about, whether through reasoning, chains of thought, or explicit finetuning. However, we do find that LLMs are able to generate faithful summaries of their uncertainties if we help them by sampling multiple outputs and feeding them back into the context. This simple approach points toward a universal way of communicating LLM uncertainties, whose future development the SelfReflect score enables.
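The remedy the abstract describes, sampling several answers and feeding them back into the context so the model can summarize its own answer distribution, is simple enough to sketch. Below is a minimal illustration in Python. Note the assumptions: `llm_generate` is a hypothetical stand-in for whatever chat-completion client is in use, and the prompt wording is our own illustration, not the paper's actual prompts.

```python
import collections

def llm_generate(prompt: str, temperature: float = 1.0) -> str:
    # Hypothetical placeholder: plug in any chat-completion API here.
    raise NotImplementedError("wire up your LLM client")

def summarize_answer_distribution(question: str, n_samples: int = 10) -> str:
    """Sample multiple answers at nonzero temperature, then feed them
    back into the context and ask the LLM to summarize all options it
    deems possible and how likely they are."""
    answers = [llm_generate(question, temperature=1.0) for _ in range(n_samples)]

    # Tally duplicate answers so the model sees rough empirical frequencies.
    counts = collections.Counter(answers)
    listing = "\n".join(
        f"- {answer} (sampled {count}/{n_samples} times)"
        for answer, count in counts.most_common()
    )

    summary_prompt = (
        f"Question: {question}\n"
        f"Here are {n_samples} answers you previously gave:\n{listing}\n"
        "Summarize all answer options you consider possible and how "
        "likely each one is, in a single short paragraph."
    )
    # Greedy decoding for the summary itself, so it is deterministic.
    return llm_generate(summary_prompt, temperature=0.0)
```

A summary produced this way can then be scored against the empirical sample distribution with the SelfReflect metric to check how faithfully it reflects the model's answer distribution.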
Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
July 11, 2025 · Research area: Speech and Natural Language Processing · Workshop at ICML
This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) at ICML 2025.
Uncertainty quantification plays a pivotal role when bringing large language models (LLMs) to end-users. Its primary goal is that an LLM should indicate when it is unsure about an answer it gives. While this has previously been done with numerical certainty scores, we propose to use the rich output space of LLMs, the space…