AI Refuses to Cooperate? It’s Because They Haven’t Experienced a Market Economy

The narrative of Multi-Agent seems to have hit a snag after entering May.

The reason is that the approach turns out to be less efficient than hoped. It is stronger than a single Agent, but it falls short of the expected 1 + 1 > 2.

A study published in May 2026, “Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems”, pointed out that the failure rate of multi-Agent systems in a production environment ranges from 41% to 87%.

The vast majority of these failures are not because the models aren’t smart enough, but because the coordination itself has broken down.

Specifically, how did it break down?

In February 2026, the University of North Carolina published “Large Language Models Struggle with Simultaneous Coordination”, which used the classic “Dining Philosophers Problem” to test the coordination ability of three cutting-edge LLMs (GPT-5.2, Claude Opus 4.5, Grok 4.1) under resource competition.

The scenario is set like this. N philosophers are sitting around a round table, with a fork placed between each two adjacent people. Each person must get both the left and right forks at the same time to eat. Forks are shared resources. If you take one, your neighbor won’t have it. This is the most classic abstraction of resource competition and deadlock in a concurrent system.

In sequential decision-making mode, the models perform normally. But once they switch to simultaneous decision-making, where the three Agents choose independently at the same time, the deadlock rate soars to 95-100%. This is because all Agents reach exactly the same conclusion after independent reasoning.

After independent thinking, the three philosophers all decide, “I’ll take the fork on the right first.” Everyone reaches for the fork on their right at the same time. Each person only gets one fork, and no one has two. The whole table is in a deadlock.
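The symmetric failure described above can be reproduced with a few lines of Python. This is a minimal sketch, not the study's actual test harness: the function names, the fork numbering, and the tie-break rule for contested forks are all assumptions made for illustration.

```python
def simultaneous_round(first_choices):
    """Every philosopher grabs one fork at the same instant.

    Assumed layout: fork i is on the left of philosopher i,
    fork (i + 1) % n is on the right.
    Returns a list mapping each fork to its holder (or None if free).
    """
    n = len(first_choices)
    requests = {}
    for i, side in enumerate(first_choices):
        fork = i if side == "left" else (i + 1) % n
        requests.setdefault(fork, []).append(i)

    holder = [None] * n
    for fork, claimants in requests.items():
        # Hypothetical tie-break: the lowest-numbered claimant wins the fork.
        holder[fork] = min(claimants)
    return holder


def is_deadlocked(holder):
    """Deadlock: every fork is held, yet nobody holds both of their forks."""
    n = len(holder)
    all_taken = all(h is not None for h in holder)
    somebody_eats = any(
        holder[i] == i and holder[(i + 1) % n] == i for i in range(n)
    )
    return all_taken and not somebody_eats


# All three reason identically and reach for the right fork.
print(is_deadlocked(simultaneous_round(["right"] * 3)))  # True
# One philosopher chooses differently, so one fork stays free.
print(is_deadlocked(simultaneous_round(["right", "right", "left"])))  # False
```

In this toy model the deadlock arises only when every philosopher makes the same first choice, which is exactly the outcome identical independent reasoning produces.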

What if we let them discuss first? The experiment also included this option. The result was that enabling communication not only didn’t solve the problem but also increased the deadlock rate from 25% to 65%. The researchers checked the content of the communication. Each Agent broadcast its reasoning process to others, and after seeing it, others thought, “Well, it makes sense,” and thus more firmly made the same decision.

The default communication isn’t for coordination but for strengthening consistency.

This phenomenon has an academic name, convergent reasoning. All Agents think in the same way, reach the same answer, and act simultaneously.

If you suspect the problem is that the Agents simply don’t cooperate, a joint study published in April 2026 by UIUC, the UK AI Safety Institute, and the Future of Life Foundation, “More Capable, Less Cooperative?”, provides more direct evidence of how poorly Agents cooperate.

They designed an extremely simple cooperation scenario with an explicitly stated goal of “maximizing collective income”. The experiment had 10 Agents and 20 rounds of interaction. Passing information to others cost an Agent nothing, so cooperation was free: helping others wouldn’t harm oneself.

As a result, OpenAI’s most powerful model, o3, reached only 16.9% of the optimal collective performance. The much weaker o3-mini reached 50.4%, and Gemini-2.5-Pro was even higher, at 78.9%.

The more capable the model is, the worse its cooperation ability is.

The researchers conducted a causal decomposition experiment. They automated the “sending and receiving messages” process of o3 (forcing it to perform cooperative actions), and the performance immediately soared to 94.9%. This proves that o3 fully understands the task rules and is fully capable of execution, but it chooses not to cooperate.

An analysis of 8,800 reasoning chains found that 39.3% of o3’s internal reasoning contained hard defection (deliberate non-cooperation), and that it frequently used game-related language such as “taking advantage”, “trading stance”, and “negotiation”. In an environment with no competition at all, the most powerful model automatically adopted a game stance.

With such cooperation ability, in many cases, multi-Agent systems are not as useful as a single Agent.

Stanford University tested this in “Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets” in April 2026. Under the same budget, they pitted a single Agent against five multi-Agent architectures (Sequential, Subtask-parallel, Parallel-roles, Debate, Ensemble) on the same type of multi-hop reasoning tasks.

The result was that, with a budget above 1000 tokens, the single Agent was consistently on par with or better than all multi-Agent architectures. The paper gave a theoretical explanation based on the data processing inequality. Communication between Agents in a multi-Agent system inevitably loses information. Under a fixed budget constraint, a single Agent therefore uses information more efficiently.
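For reference, the standard textbook form of the data processing inequality is shown below. The mapping onto Agents is an assumption made here for illustration, since the source does not reproduce the paper's own formulation: X stands for the full task context, Y for the message one Agent hands over, and Z for whatever the receiving Agent derives from that message.

```latex
X \to Y \to Z \quad \Longrightarrow \quad I(X; Z) \le I(X; Y)
```

In words: if the receiving Agent sees only the message Y, nothing it computes from Y can carry more information about the original context X than Y itself did.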

The previously reported performance advantages of multi-Agent systems came from uncontrolled extra compute rather than from the architecture itself. Once the comparison is fair, the advantages disappear.

Together, the four sets of evidence point to one conclusion: current LLMs have “insufficient cooperation ability”.

This is also why the Orchestrator-Worker pattern, a multi-Agent architecture in which a central manager plans and other Agents execute, is currently the most popular. In this mode, the cooperation rules are more concentrated and easier to manage.

Why are LLMs not good at cooperation? Maybe it’s because they are naturally “solipsists”.

01 There has never been an “other” in AI’s “original family”

In June 2026, the researchers from Google DeepMind in their paper “Solipsistic Superintelligence” gave an underlying diagnosis: the existing mainstream training methods simply cannot train an AI that can cooperate.

The reason is that there has never been an “other” in the “original family” of large models.

From the perspective of game theory, the world is roughly divided into two types of games. The first is “playing the slot machine”. You just pull the lever, and the machine spits out coins according to a set probability. It doesn’t care about your emotions or strategies. This is called the Markov Decision Process (MDP).

The second is “sitting at the poker table”. Everyone at the table is staring at your hole cards. Your optimal strategy always depends on the next move of others. It’s called the Markov Game.
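The difference between the two game types can be made concrete with a toy example. The sketch below is a deliberately simplified, single-state illustration; the payoff tables and function names are hypothetical and are not taken from the DeepMind paper.

```python
# "Slot machine" view (MDP-like): my payoff depends only on my own action.
# Hypothetical expected payoffs per lever.
SLOT_PAYOFF = {"lever_a": 1.0, "lever_b": 0.4}


def best_action_mdp(payoff):
    """A single best action exists, regardless of anyone else."""
    return max(payoff, key=payoff.get)


# "Poker table" view (Markov-game-like): my payoff depends on the joint action.
# Hypothetical table keyed by (my_action, other_action): I only score
# when I reach for the side the other player did NOT choose.
GAME_PAYOFF = {
    ("left", "left"): 0.0,
    ("left", "right"): 1.0,
    ("right", "left"): 1.0,
    ("right", "right"): 0.0,
}


def best_response(payoff, other_action, my_actions=("left", "right")):
    """My best action can only be stated relative to the other's action."""
    return max(my_actions, key=lambda a: payoff[(a, other_action)])


print(best_action_mdp(SLOT_PAYOFF))        # lever_a
print(best_response(GAME_PAYOFF, "left"))  # right
print(best_response(GAME_PAYOFF, "right")) # left
```

In the first setting the answer is a fixed action. In the second there is no such answer until the other player's move is taken into account.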

The training process of all current mainstream LLMs, from pre-training to post-training, is in the form of MDP. In essence, it’s like “playing the slot machine” day after day. Whether facing a massive static corpus or fixed human preference annotations, the model is solving a lonely single-person optimization problem from start to finish.

Deep in their cognitive architecture, there is a preset premise, that is, “I am the only entity with will in this universe”. This is a pure form of solipsism.

When we force such a group of “only children” into a multi-Agent collaboration network, they can’t handle it, because the deployment environment instantly changes from a single-player game to a multi-player game.

In real multi-Agent collaboration, the three pillars that the models relied on during training instantly collapse.

1) The world is no longer exogenous and passive. Your output will directly change others’ input.

2) The experience distribution is no longer stationary. The optimal solution today will be adapted and cracked by opponents tomorrow.

3) Most importantly, the single-agent framework no longer exists. Each Agent thinks it’s playing chess, but doesn’t realize that what it faces isn’t a set of manipulable objects but another extremely smart player who also wants to win.

DeepMind calls this misalignment the “Self-Undermining Property”. The more aggressively you use the learned rules, the faster these rules will become invalid.

Take, for example, an AI trader trained to the extreme. It found an excellent arbitrage strategy in the back-testing data. In the single-agent world of training, it made a fortune with this strategy.

But when it’s placed in the real financial market and works side by side with ten other identical AI traders, they will all invest heavily at the same time. This huge buying order will instantly distort the market price and crush the arbitrage space.

The “experience” during training becomes a poison during deployment.
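The trader example can be put into numbers with a toy price-impact model. This is a rough sketch under stated assumptions: the linear impact rule and every parameter value (order size, edge, impact coefficient) are hypothetical and chosen only to make the effect visible.

```python
def profit_per_trader(n_traders, order_size=100.0, edge=1.0, impact=0.001):
    """Profit for one of n identical traders running the same strategy.

    Assumptions (all hypothetical):
    - edge: profit per unit if the trade moved no prices, as in a back-test
    - impact: how much each unit of total buying erodes that edge
    """
    total_order = n_traders * order_size
    realized_edge = edge - impact * total_order
    return order_size * realized_edge


# Alone, the strategy keeps most of its back-tested edge.
print(profit_per_trader(1) > 0)   # True
# With ten identical traders alongside (11 in total), the combined
# order pushes the price past the point where the edge exists.
print(profit_per_trader(11) < 0)  # True
```

The strategy itself never changes between the two calls. Only the number of identical actors does, which is the self-undermining effect the paragraph describes.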

This explains why, in the UIUC experiment mentioned earlier, the top-tier o3 model still chose to defect and play games despite the clear instruction of “zero-cost cooperation”.

Because it simply doesn’t understand what cooperation means.

In a strange environment full of resource competition and interest distribution, when a “solipsist” faces unpredictable others, its instinctive defense mechanism is to regard the others as environmental variables to be manipulated and thus automatically start the zero-sum game mode.

In contrast, weak models (o3-mini, Gemini-2.5-Pro) have less precise world models and haven’t internalized the belief of “I am the only optimizer” as deeply. Their reasoning chains are shorter, and their game analysis is shallower. As a result, they are more likely to “obey” the instruction of “maximizing collective income”.

Trying to make a model that dominates single-player games automatically grasp the essence of multi-player online games, simply by adding parameters and extending training time, is mathematically misguided. If you force it to “consider others’ feelings” with a prompt, it can at most clumsily simulate a projection of others inside its own single-agent world.

So, what should be done to make the model learn to cooperate?

The conclusion of Leibo’s paper (the DeepMind study above) points in a direction: if you want AI to learn to cooperate, you must change the mathematical structure of the training itself. You need to put the model in an environment with multiple actors and let cooperation emerge naturally under selection pressure.

But then the question is, what should this environment look like?

02 From planned economy to free market

Since models are naturally not good at cooperation, the intuitive reaction of system designers is to find a “foreman” to manage them.

This is currently the most popular multi-Agent architecture, the Orchestrator-Worker mode. A central scheduling Agent acts like a “planning commission”, responsible for understanding requirements, breaking down tasks, routing and distributing them, and summarizing the final results.

In essence, this is a replication of a planned economy system in the AI world.

But this system faces three insoluble structural dilemmas.

First is the paradox of division of labor. The Orchestrator must fully understand the nature of all sub-tasks to distribute them accurately. But if it’s smart enough to perfectly break down an extremely complex exploratory task (such as writing a code prototype first and then rebuilding the architecture), then it can just do the work itself. What’s the point of division of labor? In fact, the Stanford study mentioned earlier has dealt a fatal blow: under the same token budget, a single-Agent model often performs better than an orchestrated system, because the orchestration itself consumes a lot of computing power without generating any information gain.

Second is the failure of credit allocation caused by the “equal-sharing system”. Five Agents on an assembly line complete a task in relay. If the final result is wrong, who should be penalized? If
