AI isn't ready to replace your fund manager — and the public experiments testing it are showing why.
Across a series of new trading contests between the world's leading AI models, the verdict so far is unflattering. Most of the systems lose money. They trade too much. They make wildly different decisions when given identical instructions.
And no one yet knows whether these shortcomings will fade with more powerful iterations, or whether they reveal something fundamental about the gap between large language models and how markets actually work.
Take Alpha Arena, run by tech startup Nof1. It pitted eight major frontier AI systems — including Anthropic's Claude, Google's Gemini, OpenAI's ChatGPT and Elon Musk's Grok — against each other in four separate competitions.
Each was handed $10,000 per contest before being turned loose on U.S. tech stocks for two weeks. The challenges involved trading on a variety of signals, acting defensively, reacting to the competition, and using high leverage.
The portfolio as a whole lost about a third of its capital. Across all 32 sets of results, a model finished in profit only six times. Grok 4.20 delivered the best performance during the challenge in which it was aware of its rivals' performance. It placed only 158 trades; under the same prompt, Alibaba's Qwen traded 1,418 times.
Alpha Arena is one of a growing number of experiments testing whether LLMs can do the hardest job in finance: beat the market. While these contests are far from academically rigorous, they're the most public demonstration yet of what happens when the systems try to take on some of the most lucrative and high-stakes work on Wall Street.
The early results matter because trading is one job the financial industry has been cautious about handing entirely to AI. Over the past few years, heavyweights from JPMorgan Chase & Co. to Balyasny Asset Management have put the technology to work nearly everywhere else.
LLMs now parse news at quant shops, draft memos at hedge funds, and detect fraud at big banks, among other tasks. But "human in the loop" remains the motto when it comes to trading real money. Perhaps for good reason.
"LLMs can't really make money by themselves," said Jay Azhang, founder of Nof1. "You need basically a very sophisticated harness and scaffolding and data platform in order to even give them a chance."
LLMs are good at doing research and at finding and deploying the correct tools for certain tasks, he said. But they don't yet know how much each of the many variables that swing stocks (including things like analyst ratings, insider transactions, and sentiment shifts) actually matters. They tend to mistime their trades, size positions incorrectly, and buy and sell too often.
The AI blog Flat Circle tracked 11 markets-related arenas, and all had at least one model that made money. But in only two of the arenas was the median model profitable, showing how most struggled to beat the market.
That outcome mirrors human performance, since a majority of actively managed funds famously also lag the broad market. And just like people, the models can be prone to obvious bias.
The arenas show the AI systems making very different decisions with identical instructions, which has big implications for any firm deploying them. For instance, Azhang said that in Alpha Arena's latest run, Claude mostly wanted to go long, Gemini had no problem being short, and Qwen was comfortable taking risks with big leverage.

"They have personalities that you have to manage almost like a human analyst," said Doug Clinton, who runs Intelligent Alpha, a firm with an LLM-driven fund that publishes its own benchmark for how well AI predicts corporate earnings. Results can be improved by letting the model know it's showing some bias, he said.
Intelligent Alpha's benchmark gives 10 AI models access to financial filings, analyst forecasts, earnings transcripts, macroeconomic data and up to 10 web searches. With its narrower focus, the results are more positive for LLMs.
In the fourth quarter of 2025, OpenAI's ChatGPT correctly predicted the direction of earnings estimates 68% of the time — the best results yet. And the models, Clinton said, tend to improve with every new release.
Hedge Fund Secrets
Evaluating any of this is hard. Design choices in everything from how often the models run to what assets they trade make a big difference. And the default test for a trading strategy, replaying it over historical data to see how it would have performed, doesn't really work for AI.
A model asked in 2026 how it would have traded in March 2020 already knows what March 2020 looked like. That contamination, known as lookahead bias, has challenged the frameworks underlying academic and quantitative finance for decades. LLMs have to be assessed in live markets instead, hence the proliferation of benchmarks and arenas.
Perhaps because they mostly lose money, AI trading arenas tend to run for only short periods of time. With the low barriers to entry, many are set up by individuals or startups using the platforms as a launchpad for other products.
Nof1 is preparing season two of Alpha Arena, which will give each AI model the ability to search the web, ponder for longer, access more data sources and take multiple steps. But ultimately the firm's business is a system enabling retail traders to build AI trading agents for their own strategies.
"Giving an LLM money right now and just having it go — that's not a thing yet," said Azhang.
Most of the public experiments are still too short and too noisy to support firm conclusions, reckons Jim Moran, who writes the Flat Circle blog and who previously co-founded alternative-data provider YipitData. These arenas also have natural disadvantages, including limited access to proprietary stock research and inferior execution.
"If you took one of these agents from one of these arenas and you just moved it over to operate inside of a high-end hedge fund, they should perform better," he said.
Alexander Izydorczyk, formerly head of data science at the hedge fund Coatue Management and now at NX1 Capital, recently wrote that no AI trading bot he tracks has yet shown a lasting edge. He argued the arenas are limited by what they cannot see in their training data: the practical quant techniques used inside secretive trading shops.
He suggested the same secrecy is also a preview of where any AI that does begin to work will eventually go.
"But beginners sometimes see things incumbents cannot," Izydorczyk wrote on his personal blog. "The outsiders, if successful, will also learn quickly that success in liquid, competitive markets pays better than the marginal X follower. When LLM agent trading strategies start working, you will not hear about it for a while."
Copyright 2026 Bloomberg. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.
© Touchpoint Markets, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to TMSalesOperations@arc-network.com.