In a groundbreaking study published in Nature Machine Intelligence, researchers have unveiled a nuanced understanding of the cognitive biases that influence large language models (LLMs) when they generate responses. The paper, titled “Competing Biases Underlie Overconfidence and Underconfidence in LLMs” by Kumaran, Fleming, Markeeva, et al., examines the dual nature of confidence errors exhibited by these AI systems, shedding light on why LLMs are sometimes overly certain of inaccurate responses and at other times unduly cautious about correct answers.
Large language models have become ubiquitous across applications ranging from customer service chatbots to creative writing assistants and scientific research aids. Despite their impressive performance, one persistent issue has been their calibration: how well the models’ confidence estimates match the actual correctness of their outputs. Calibration directly affects trustworthiness and usability. Overconfidence can spread misinformation by encouraging users to rely on incorrect answers, while underconfidence can lead models to discount their predictions even when they are accurate.
The investigators approached this problem by considering what they call “competing biases.” These are systematic tendencies within the LLM architectures and training paradigms that result in diametrically opposed confidence distortions. Using a mixture of behavioral experiments on model outputs and rigorous statistical modeling, the team deconstructed the origins of these opposing biases, providing a fresh perspective on why LLMs oscillate between over- and under-confidence.
In their experimental setup, the researchers tasked several state-of-the-art LLMs with answering questions spanning multiple difficulty levels and domains, and asked the models to provide a confidence rating for each response, effectively measuring the alignment between the models’ internal certainty and their external accuracy. Analysis of this dataset revealed a striking pattern: on simpler questions, LLMs tended to exhibit overconfidence, affirming both correct and incorrect answers with excessive certainty. On more ambiguous or complex prompts, by contrast, the models frequently demonstrated underconfidence, hesitating even when their responses were correct.
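To make the calibration measurement concrete, here is a minimal sketch, assuming confidence is reported as a probability in [0, 1] and correctness as a binary label, of the kind of analysis such a dataset supports: binning responses by stated confidence and comparing average confidence with empirical accuracy, a gap commonly summarized as expected calibration error (ECE). This is a generic illustration, not the paper’s analysis code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Compare stated confidence against empirical accuracy.

    confidences: self-reported confidence scores in [0, 1]
    correct:     0/1 flags indicating whether each answer was right
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        # Responses whose stated confidence falls into this bin
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        avg_conf = confidences[mask].mean()   # how sure the model said it was
        avg_acc = correct[mask].mean()        # how often it was actually right
        ece += mask.mean() * abs(avg_conf - avg_acc)
    return ece

# Toy usage: high confidence on some wrong answers, low confidence on some right ones
conf = [0.95, 0.90, 0.92, 0.40, 0.35, 0.50]
right = [1, 0, 0, 1, 1, 1]
print(f"ECE = {expected_calibration_error(conf, right):.3f}")
```

A well-calibrated model drives this quantity toward zero; overconfidence and underconfidence both inflate it, but from opposite directions within individual confidence bins.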
Delving deeper, the team proposed that overconfidence stems primarily from biases reinforced during training. Because LLMs are optimized to produce plausible, statistically likely continuations of vast corpora, their training objective rewards responses that “sound right” rather than those with guaranteed factual accuracy. This creates a pull toward certainty whenever the model’s heuristic approximations strongly match familiar patterns, irrespective of actual correctness.
Conversely, underconfidence emerges from the models’ uncertainty estimation: when confronted with questions that draw on underrepresented knowledge or conflicting evidence, LLMs spread probability mass across many candidate answers, which translates into conservative confidence judgments. This cautious behavior, while desirable in some contexts, can unfairly bury correct answers beneath a veil of doubt.
One of the paper’s most illuminating contributions is the conceptual integration of these two opposing biases within a unified framework. By mathematically characterizing overconfidence and underconfidence as competing forces drawn from different stages of model training and inference, the authors furnish a theoretical foundation for future work aiming to calibrate and optimize confidence outputs more effectively.
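The paper’s formal treatment is not reproduced here, but the intuition of two opposing forces can be illustrated with a deliberately simple toy model: one term inflates confidence when a prompt resembles familiar training patterns, while another deflates it when the underlying evidence is sparse or conflicting. Every name and coefficient below is an illustrative assumption, not a quantity from the study.

```python
import math

def toy_confidence(evidence_logit, familiarity, evidence_coverage,
                   overconf_weight=1.5, underconf_weight=2.0):
    """Illustrative toy model of two competing confidence biases.

    evidence_logit:    hypothetical log-odds that the answer is correct
    familiarity:       0-1 score for how pattern-like the prompt is
    evidence_coverage: 0-1 score for how well the topic is represented
    """
    # Familiar-sounding prompts push confidence up regardless of correctness
    overconfidence_push = overconf_weight * familiarity
    # Sparse or conflicting evidence pulls confidence down, even for correct answers
    underconfidence_pull = underconf_weight * (1.0 - evidence_coverage)
    logit = evidence_logit + overconfidence_push - underconfidence_pull
    return 1.0 / (1.0 + math.exp(-logit))

# A familiar-sounding but poorly supported claim: the two biases nearly cancel
print(round(toy_confidence(evidence_logit=0.0, familiarity=0.9, evidence_coverage=0.3), 2))
```

The point of such a toy formulation is only to show how the same reported confidence can arise from very different mixtures of the two biases, which is why treating them separately, as the authors do, matters for calibration.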
The practical implications of these findings are significant. For developers and end-users relying on LLMs in critical applications such as medical advice, legal reasoning, or scientific analysis, recognizing and compensating for these biases can dramatically enhance reliability. The study encourages the development of hybrid confidence estimation systems that dynamically adjust model certainty depending on input complexity and contextual domain knowledge.
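One standard post-hoc approach in this spirit is temperature scaling, in which a single parameter fitted on held-out data softens or sharpens reported confidence. The sketch below fits separate temperatures for “easy” and “hard” inputs, since the two regimes may err in opposite directions; the difficulty split, the toy data, and the grid-search fitting routine are assumptions for illustration, not the study’s proposal.

```python
import numpy as np

def fit_temperature(conf_logits, correct, temps=np.linspace(0.25, 4.0, 100)):
    """Grid-search a temperature that minimizes the negative log-likelihood
    of correctness given rescaled confidence (simple post-hoc calibration)."""
    conf_logits = np.asarray(conf_logits, dtype=float)
    correct = np.asarray(correct, dtype=float)
    best_t, best_nll = 1.0, np.inf
    for t in temps:
        p = 1.0 / (1.0 + np.exp(-conf_logits / t))   # t > 1 softens, t < 1 sharpens
        p = np.clip(p, 1e-6, 1 - 1e-6)
        nll = -np.mean(correct * np.log(p) + (1 - correct) * np.log(1 - p))
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t

# Hypothetical held-out data: confidence expressed in logit space, per regime
easy_logits, easy_correct = [2.5, 3.0, 2.8, 2.2], [1, 1, 0, 1]
hard_logits, hard_correct = [-0.5, 0.2, -0.8, 0.1], [1, 0, 1, 1]
print("easy-question temperature:", fit_temperature(easy_logits, easy_correct))
print("hard-question temperature:", fit_temperature(hard_logits, hard_correct))
```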
Moreover, the researchers advocate for a paradigm shift in LLM training strategies. Instead of focusing solely on minimizing error rates or maximizing likelihood, integrating explicit confidence calibration targets could help moderate these opposing biases. Adversarial training focused on uncertainty prediction, calibrated fine-tuning with human-in-the-loop feedback, and the incorporation of meta-cognitive modules are all promising avenues to explore.
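As a sketch of what an explicit calibration target might look like, the loss below combines a standard likelihood term with a Brier-score penalty on the model’s stated confidence. This is a generic construction meant to illustrate the idea of a calibration-aware objective, not the training recipe proposed in the paper, and all parameter names are hypothetical.

```python
def calibration_aware_loss(answer_logprob, stated_confidence, is_correct,
                           calibration_weight=0.5):
    """Generic example of adding a calibration penalty to a likelihood objective.

    answer_logprob:    log-probability the model assigned to its answer tokens
    stated_confidence: the model's self-reported probability of being correct
    is_correct:        1 if the answer matched the ground truth, else 0
    """
    nll = -answer_logprob                           # usual likelihood term
    brier = (stated_confidence - is_correct) ** 2   # penalizes miscalibrated confidence
    return nll + calibration_weight * brier

# An overconfident wrong answer costs more than a cautious wrong one
print(calibration_aware_loss(answer_logprob=-0.2, stated_confidence=0.95, is_correct=0))
print(calibration_aware_loss(answer_logprob=-0.2, stated_confidence=0.40, is_correct=0))
```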
The study also carries broader philosophical implications about machine cognition and interpretability. It underscores that confidence—often treated as an ancillary metric—embodies deep computational challenges tied to the very nature of probabilistic learning and pattern recognition. Understanding confidence biases not only improves AI usability but also provides insights into analogous phenomena in human cognition, where overconfidence and underconfidence can impact decision-making.
Beyond the immediate technical contributions, the paper’s authors highlight the need for a standardized evaluation protocol to assess confidence calibration across different LLM architectures. Currently, diverse benchmarks and varying methodologies make cross-model comparisons challenging. Establishing universal metrics for confidence bias quantification would facilitate benchmarking and drive industry-wide improvements.
The dataset generated through this research, comprising thousands of model responses paired with confidence ratings and ground truth labels, constitutes a valuable resource for the AI community. By making this dataset publicly available, the authors invite further exploration into how different training datasets, model sizes, and architectures influence confidence behavior, promoting transparency and reproducibility.
In summary, this pioneering investigation into the dual biases that govern LLM confidence marks a crucial step toward resolving one of the pressing reliability issues in contemporary AI. By articulating the mechanisms behind overconfidence and underconfidence, the research paves the way for smarter, more trustworthy language models. It challenges the community to refine AI self-awareness, making future interactions safer, more transparent, and ultimately more aligned with real-world needs.
As AI systems continue to permeate everyday life and high-stakes environments alike, such insights are invaluable, reminding us that the road to truly intelligent machines requires mastering not just what they say but how sure they are when saying it. This study invites renewed optimism that with rigorous scientific inquiry and cross-disciplinary collaboration, large language models can evolve from impressive mimics of language to genuinely reliable partners in human endeavor.
Subject of Research: Confidence estimation biases in large language models (LLMs)
Article Title: Competing Biases Underlie Overconfidence and Underconfidence in LLMs
Article References:
Kumaran, D., Fleming, S.M., Markeeva, L. et al. Competing biases underlie overconfidence and underconfidence in LLMs. Nat Mach Intell (2026). https://doi.org/10.1038/s42256-026-01217-9
Image Credits: AI Generated
DOI: https://doi.org/10.1038/s42256-026-01217-9
Tags: AI response accuracy and confidence, behavioral experiments on AI models, biases in natural language processing, calibration of AI model predictions, competing cognitive biases in AI, confidence errors in machine learning, impact of biases on AI outputs, improving LLM reliability, large language models confidence calibration, LLM trustworthiness issues, overconfidence in LLMs, underconfidence in language models



