In recent years, the proliferation of large language models (LLMs) has transformed how people approach creative tasks, from writing prose to brainstorming novel concepts. Leading commercial systems such as GPT-4, Claude, and Google’s Gemini now reach audiences eager to harness artificial intelligence’s creative potential. Yet new research led by Duke University exposes a paradox: despite the appearance of boundless originality, these AI systems exhibit a striking homogeneity in their creative output, falling short of the diverse ingenuity of human minds.
This pivotal study, published in PNAS Nexus on March 24, 2026, challenges popular assumptions about the creative versatility of LLMs. Emily Wenger, an assistant professor specializing in electrical and computer engineering at Duke, alongside Yoed Kenett, a cognitive neuroscientist affiliated with the Technion in Israel, spearheaded an experimental comparison between 22 commercial LLMs and over 100 human participants across three well-established creativity tests. Their findings reveal that, although individual models occasionally surpass single humans in creativity metrics, the collective responses generated by these models are significantly more uniform than those produced by people.
Researchers attribute this convergence to the underlying architecture and training regimes shared by commercial LLMs. These models ingest vast datasets scraped from the public internet, a broad yet heavily overlapping corpus of human knowledge and language. Their near-identical training objective, producing coherent and contextually appropriate text, further funnels model output into a narrow band of plausible creative avenues. Wenger observed that because all models are effectively “speaking from the same script,” their responses echo one another, limiting the diversity critical to true creative innovation.
To quantify this, the researchers employed three long-standing assessments of divergent thinking and associative richness: the Alternative Uses Test (AUT), the Divergent Association Task (DAT), and the Forward Flow (FF) test. AUT prompts examinees to conceptualize unconventional applications for everyday objects, such as envisioning a book as a makeshift doorstop or a firestarter. DAT demands the generation of ten semantically distant words, probing the breadth of associative leaps. Finally, FF entails a sequential chain of word associations stemming from a seed word, measuring the fluidity and novelty of cognitive transitions.
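The DAT is commonly described as being scored by the mean pairwise semantic distance among the submitted words, computed over pretrained word embeddings (such as GloVe) and scaled by 100. The sketch below illustrates that scoring logic only; the `toy` vectors and function names are invented for illustration and are not the study’s actual implementation:

```python
from itertools import combinations
import math

def cosine_distance(u, v):
    """1 minus cosine similarity; larger means more semantically distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dat_score(words, embed):
    """Mean pairwise cosine distance over all word pairs, scaled by 100."""
    vecs = [embed[w] for w in words]
    dists = [cosine_distance(u, v) for u, v in combinations(vecs, 2)]
    return 100.0 * sum(dists) / len(dists)

# Toy 3-d vectors purely for illustration; a real scorer would load
# pretrained embeddings (e.g., GloVe) covering the full vocabulary.
toy = {
    "cat":    [0.9, 0.1, 0.0],
    "dog":    [0.8, 0.2, 0.1],
    "galaxy": [0.0, 0.9, 0.4],
    "spoon":  [0.1, 0.2, 0.9],
}

print(dat_score(["cat", "dog", "galaxy", "spoon"], toy))  # semantically spread-out set
print(dat_score(["cat", "dog", "cat", "dog"], toy))       # clustered set scores lower
```

The intuition the test captures: word lists that leap between distant concepts ("galaxy", "spoon") score higher than lists that stay in one semantic neighborhood ("cat", "dog").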
The choice of tests was strategic, targeting cognitive mechanisms fundamental to creativity: variability, originality, and the looseness of conceptual networks. Human responses demonstrated extensive heterogeneity in word choice and concept paths, embodying flexible, explorative thought processes. Conversely, the LLM outputs were markedly homogenized, clustering around common or high-probability associations that reflect their training “bias” toward widely observed linguistic patterns rather than unpredictable ideation.
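One concrete way to see what “markedly homogenized” means is to measure how much different respondents’ answer sets overlap. The sketch below uses mean pairwise Jaccard overlap as a toy homogeneity index; the function names and the example AUT answers are invented for illustration and are not the study’s actual metric or data:

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two answer sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def pool_homogeneity(responses):
    """Mean pairwise Jaccard overlap across respondents.
    1.0 means everyone gave identical answers; 0.0 means no overlap."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical "uses for a book" answers, purely for illustration.
humans = [
    {"doorstop", "pillow", "insect swatter"},
    {"firestarter", "booster seat", "pressing flowers"},
    {"secret safe", "doorstop", "exercise weight"},
]
models = [
    {"doorstop", "paperweight", "decoration"},
    {"doorstop", "paperweight", "booster seat"},
    {"paperweight", "doorstop", "decoration"},
]

print(pool_homogeneity(humans))  # lower value: a more diverse pool
print(pool_homogeneity(models))  # higher value: a more homogeneous pool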
Kenett elaborates on the implications of this phenomenon: “Human creativity thrives on variability, enabling breakthroughs through unique and often unexpected connections. If AI-generated content converges too tightly in its creative space, it risks stagnating innovation and promoting uniformity.” This homogenization is a subtle but profound limitation, potentially undermining the transformative promise of AI in augmenting human creativity. It raises a red flag concerning the overreliance on these systems in creative industries and beyond.
The study also experimented with manipulating the system prompt—the initial instructions guiding the LLMs’ behavior—to encourage more divergent and novel outputs. Yet, these manipulations only yielded marginal increases in variability, insufficient to rival human-generated diversity. This points to intrinsic constraints in the models’ design or training data that mere prompting cannot overcome. It suggests a deeper architectural or methodological innovation might be necessary to break the creative mold imposed by current training paradigms.
Given the surge in LLM adoption highlighted by a 2024 Adobe survey—reporting over 50% of Americans have engaged AI systems as creative partners in writing, coding, and ideation—this research carries urgent pragmatic warnings. The ubiquity of these tools risks cultivating a monoculture of expression, where novel concepts and linguistic creativity diminish as AI-assisted works increasingly converge stylistically and conceptually. Such an outcome challenges the narrative of AI as a catalyst for human innovation.
Wenger strongly advocates for continued human involvement in creative processes, particularly when originality and differentiation are prized. She recommends assembling diverse human teams for brainstorming to combat creative stagnation rather than relying solely on AI-generated content. The nuanced interplay of distinct human experiences and cognitive variability remains unmatched by algorithmically generated ideas, reinforcing the indispensability of human creativity in a world increasingly intertwined with AI.
This research adds a crucial dimension to ongoing discussions about the role of AI in creative disciplines, emphasizing that the promise of generative technologies must be balanced with awareness of their limitations. It calls for further interdisciplinary inquiry into the cognitive and computational factors shaping AI’s creative scope, especially as these agents become more embedded in professional and everyday workflows. Understanding and addressing AI homogeneity could pave the way for future models capable of supporting genuinely diverse, groundbreaking creativity.
In essence, while commercial large language models represent remarkable feats of engineering and linguistic fluency, their creative capacity remains intrinsically bounded by their training data, architecture, and the optimization goals set during development. As this Duke-led study compellingly reveals, the creative frontier still belongs chiefly to humans, whose unpredictable and varied cognitive styles defy the algorithmic tendencies toward uniformity. The ongoing challenge is to design AI that not only mimics human language but also fosters an ecosystem where creativity flourishes in its full, vibrant diversity.
Subject of Research: Not applicable
Article Title: Large language models are homogeneously creative.
News Publication Date: 24-Mar-2026
Web References: https://doi.org/10.1093/pnasnexus/pgag042
References: Emily Wenger and Yoed N. Kenett. “Large language models are homogeneously creative.” PNAS Nexus, 2026, 5, pgag042.
Keywords: Generative AI, Large language models, Creativity, Artificial intelligence, Divergent thinking, Cognitive variability
Tags: AI creative limitations, AI creativity paradox, AI training data impact, cognitive neuroscience of creativity, commercial AI systems comparison, diversity in creative thinking, Duke University AI research, future of AI in creative industries, GPT-4 creativity analysis, homogeneity in AI output, large language models creativity, LLMs vs human creativity



