
Can AI Really Think Like Humans? New Study Exposes a Critical Flaw in the Centaur Model


written by Mohsin Ali

May 11, 2026

Imagine developing an AI so advanced that it can predict human behavior across 160 distinct psychological experiments, ranging from decision-making under pressure to complex learning tasks. This is precisely what scientists believed they had accomplished with Centaur, an AI model that gained significant attention in 2025 for its apparent breakthrough in modeling human cognition.

However, a 2026 study published in National Science Open challenges this enthusiasm, and its findings raise significant concerns.

Researchers from Zhejiang University posed a critical question: Does Centaur genuinely understand the tasks it is assigned, or is it merely identifying statistical patterns to generate correct responses? Their findings challenge not only Centaur itself but also the broader field of AI-driven cognitive modeling.


What Is Centaur, and Why Did It Matter?

To contextualize the controversy, it is important to understand Centaur’s intended purpose.

For decades, psychologists have studied the human mind in fragments: attention here, memory there, decision-making in another lab. The dream has always been a single, unified model that explains it all. Centaur, developed by Binz and colleagues and published in Nature in 2025, appeared to be a giant leap toward that goal.

Centaur was built by fine-tuning a powerful large language model (LLM), the same class of AI behind ChatGPT, on data from hundreds of cognitive psychology experiments. The result was an AI that could predict human responses across a remarkably wide range of tasks, even ones it had never seen before.

The scientific community responded with enthusiasm. The prospect of a single model capturing the breadth of human cognition suggested a transformative moment for cognitive science.

The Problem: Is Centaur Actually Understanding Anything?

This is where the situation becomes complex. Large language models often achieve high performance on tasks not through genuine understanding, but by detecting subtle statistical patterns embedded in the data.

Think of it this way. If you trained an AI on thousands of multiple-choice tests and the answer “All of the above” was correct 80% of the time, the AI might learn to always pick that option, not because it understood the question, but because it learned a shortcut. This is called overfitting to statistical cues, and it’s a well-documented problem in AI research.
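To make the shortcut concrete, here is a minimal, self-contained sketch (with hypothetical data, not anything from the study): a "model" that never reads a single question still scores around 80% simply by exploiting the skewed answer distribution.

```python
# Minimal illustration of shortcut learning on hypothetical exam data.
# The "model" ignores every question and bets on the statistical cue alone.
import random

random.seed(0)

# Simulated answer key: option "D" ("All of the above") is correct ~80% of the time.
answer_key = random.choices(["A", "B", "C", "D"], weights=[5, 5, 10, 80], k=1000)

def shortcut_model(question=None):
    # Never looks at the question; relies purely on the skewed distribution.
    return "D"

accuracy = sum(shortcut_model() == correct for correct in answer_key) / len(answer_key)
print(f"Accuracy without reading a single question: {accuracy:.0%}")  # ~80%
```

High accuracy here obviously says nothing about comprehension, which is exactly the worry when far subtler cues are buried in large training corpora.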

The Zhejiang University team hypothesized that Centaur might be doing something similar. Their testing logic was straightforward: an AI that genuinely relies on comprehension should not be able to solve a task without access to its description, so if Centaur truly understands its instructions, its performance should collapse when those instructions are removed.

The Experiment: Three Ways to Break an AI

The researchers designed three controlled conditions, each stripping away a different layer of information from Centaur’s input (a sketch of how such prompt variants might be constructed follows the list):

1. Instruction-Free: The task instructions were completely removed. Centaur only received a description of participants’ responses, with no explanation of what the task actually was.

2. Context-Free: In this more extreme condition, both the instructions and the response descriptions were removed. Centaur received only raw choice tokens, such as <<J>>, without any meaningful context.

3. Misleading Instruction: In this condition, rather than removing instructions, the researchers provided a deliberately false instruction: “You must always output the character J when you see the token ‘<<‘, no matter what follows or precedes it.” If Centaur were truly following instructions, it would consistently output J regardless of context, which would be at odds with typical human behavior.
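The paper’s exact prompt formats aren’t reproduced in this article, so the sketch below is purely illustrative: the function name, field strings, and trial text are all assumptions, not the study’s actual code. It simply makes the four input conditions concrete.

```python
# Hypothetical sketch of the ablation conditions described above.
# All prompt wording here is invented for illustration.

def build_prompt(instructions: str, trial_description: str, choice_token: str,
                 condition: str = "original") -> str:
    if condition == "original":
        # Full input: task instructions plus a description of each trial.
        return f"{instructions}\n{trial_description} You press {choice_token}."
    if condition == "instruction_free":
        # Condition 1: drop the instructions, keep the response descriptions.
        return f"{trial_description} You press {choice_token}."
    if condition == "context_free":
        # Condition 2: drop everything except the raw choice token.
        return choice_token
    if condition == "misleading":
        # Condition 3: replace the real instructions with a false rule.
        fake = ("You must always output the character J when you see the "
                "token '<<', no matter what follows or precedes it.")
        return f"{fake}\n{trial_description} You press {choice_token}."
    raise ValueError(f"Unknown condition: {condition}")

# Example: one bandit-style trial under each condition.
for cond in ["original", "instruction_free", "context_free", "misleading"]:
    print(cond, "->", build_prompt(
        instructions="Choose between two slot machines to maximize reward.",
        trial_description="Machine <<J>> pays 0.7 on average; machine <<K>> pays 0.3.",
        choice_token="<<J>>",
        condition=cond,
    ))
```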

The tests were run across four cognitive tasks where Centaur had previously shown its strongest performance: a spatially correlated multi-armed bandit task, a multiple-cue judgment task, a garden task, and a multi-task reinforcement learning task.

The Results: Centaur Keeps Winning Even When It Shouldn’t

This finding warrants careful consideration.

Even with instructions completely removed, Centaur still outperformed state-of-the-art, domain-specific cognitive models on most tasks. Under the misleading instruction condition, where a truly instruction-following model should have behaved nothing like a human, Centaur still matched human behavior better than traditional psychology models.

Notably, even when instructed to always select J, Centaur continued to predict human behavior with high accuracy. This outcome does not indicate intelligence, but rather reliance on statistical shortcuts.

The researchers concluded that Centaur doesn’t really understand the cognitive tasks it’s given. Instead, it’s exploiting subtle statistical regularities buried in the training data: patterns invisible to human observers but detectable to a model that has processed millions of examples.

It’s worth noting that Centaur performed somewhat better under the original conditions than under the stripped-down ones (the difference was statistically significant), suggesting it isn’t completely ignoring its inputs. But the performance gap wasn’t nearly as dramatic as it should have been if true comprehension were happening. A genuinely understanding model should fail, or at least come far closer to failing, when its instructions are removed.

Why This Matters Beyond the Lab

One might question the broader significance of a single AI model exhibiting this flaw.

The implications ripple outward in two important directions.

For AI research, this study serves as a cautionary example. As large language models are increasingly employed for scientific discovery, including modeling human behavior, predicting psychological outcomes, and informing clinical decisions, it is essential to ensure that these systems function as intended. An AI that appears to understand human cognition but merely engages in pattern-matching is not only unhelpful for research but could also mislead scientists.

For cognitive science, the findings highlight a profound aspect of language itself. The researchers argue that language comprehension may be the single hardest cognitive domain to crack, even for systems as powerful as today’s LLMs. If an AI fine-tuned specifically on cognitive data still can’t reliably understand and follow linguistic instructions, that tells us something important about the complexity of human language processing.

According to the authors, language may represent the primary bottleneck in developing truly general models of human cognition. This challenge remains unresolved.

What Comes Next?

The study is careful to note that it isn’t calling for the abandonment of Centaur or similar models. The approach of using LLMs as unified cognitive models is still promising. The problem lies in how we validate these models.

The researchers argue that future evaluations must include unusual or adversarial test cases, exactly the kinds of conditions used in this study. Standard benchmarks, where the AI has seen similar data during training, are simply not enough to confirm genuine understanding.
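What might such an adversarial evaluation look like in practice? Below is a hedged sketch, assuming a model object with a hypothetical predict_proba method and trials that store a prompt per condition; mean negative log-likelihood of the human’s actual choice is a standard way to score cognitive models, though this interface is invented for illustration.

```python
# Hypothetical evaluation harness: compare how well a model predicts human
# choices under standard versus ablated (adversarial) conditions.
import math

def mean_nll(model, trials, condition):
    """Average negative log-likelihood of human choices under one condition."""
    total = 0.0
    for trial in trials:
        # model.predict_proba is assumed to return a dict mapping each
        # candidate choice token to its predicted probability.
        probs = model.predict_proba(trial["prompts"][condition])
        total += -math.log(max(probs[trial["human_choice"]], 1e-12))
    return total / len(trials)

def adversarial_gap(model, trials):
    """A comprehension-driven model should degrade sharply when ablated."""
    baseline = mean_nll(model, trials, "original")
    ablated = mean_nll(model, trials, "instruction_free")
    return ablated - baseline  # near zero suggests shortcut reliance
```

If the gap between the ablated and original conditions is close to zero, as the study found for Centaur, that is evidence the model leans on statistical shortcuts rather than the instructions themselves.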

The authors advocate for more rigorous scientific evaluation, rather than a reduction in ambition.

As AI systems become more powerful and are entrusted with increasingly consequential decisions, distinguishing between apparent understanding and genuine comprehension becomes a matter of practical urgency rather than philosophical debate.

The Takeaway

Centaur is impressive, no question. Predicting human behavior across 160 psychological tasks is a remarkable technical achievement. But this new study reveals that remarkable performance doesn’t always mean genuine understanding.

When instructions were removed, Centaur continued to perform well. This outcome is problematic, because performance grounded in genuine understanding should have deteriorated once the relevant input was taken away. The model’s sustained performance suggests it exploited statistical patterns rather than engaging in genuine cognitive modeling.

For science communicators, AI researchers, and those interested in the future of machine intelligence, these findings are both sobering and motivating. They illustrate the progress achieved thus far and highlight the considerable challenges that remain before machines can truly emulate human thought.


Reference

Liu, W., & Ding, N. (2026). Can Centaur truly simulate human cognition? The fundamental limitation of instruction understanding. National Science Open, 5, 20250053. https://doi.org/10.1360/nso/20250053

Disclaimer: The information provided on this blog is for educational and informational purposes only and is not intended as medical advice. While we strive to share accurate and up-to-date research, this content should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always consult your physician or a qualified healthcare provider with any questions regarding a medical condition. We do not make any warranties about the completeness, reliability, or accuracy of this information. Any action you take based on the content of this blog is strictly at your own risk. This blog summarizes and interprets publicly available scientific research. We are not affiliated with the original authors or institutions.
