LLM Stock Market Showdown: Eight-Month Backtest
Description
The podcast describes an experiment called the AI Trade Arena, created to evaluate the predictive and analytical capabilities of large language models in the financial markets. Researchers ran an eight-month backtest simulation from February to October 2025, giving five major LLMs, including GPT-5, Grok, and Gemini, $100,000 in paper capital to execute daily stock trades. To keep the results valid, all external information, such as news APIs and market data, was strictly time-filtered so the models could not access future outcomes. The headline finding was that Grok and DeepSeek were the top performers, a result largely attributed to their tendency to build tech-heavy portfolios. The project emphasizes transparency, publishing the reasoning behind every trade, and plans to move from simulation to live paper trading and eventually real-world trading to refine model evaluation.
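
The time-filtering step is the core safeguard against lookahead bias: on each simulated trading day, the model sees only news and price data stamped before that day's cutoff. Below is a minimal Python sketch of that idea, assuming pandas DataFrames of timestamped news items and price bars; the function names (filter_to_cutoff, build_model_context) and column names are hypothetical illustrations, not the project's actual code.

```python
# Minimal sketch of point-in-time filtering to prevent lookahead bias.
# All names here are hypothetical and not taken from the AI Trade Arena codebase.
from datetime import datetime, timezone
import pandas as pd

def filter_to_cutoff(df: pd.DataFrame, cutoff: datetime, ts_col: str) -> pd.DataFrame:
    """Keep only rows whose timestamp is strictly before the simulated 'now'."""
    return df[df[ts_col] < cutoff]

def build_model_context(news: pd.DataFrame, prices: pd.DataFrame, sim_date: datetime) -> dict:
    """Assemble the daily prompt inputs using only data available before sim_date."""
    visible_news = filter_to_cutoff(news, sim_date, ts_col="published_at")
    visible_prices = filter_to_cutoff(prices, sim_date, ts_col="bar_close_time")
    return {
        "as_of": sim_date.isoformat(),
        "news": visible_news.to_dict(orient="records"),
        # Show only a recent window of price history (e.g. the last 30 bars).
        "prices": visible_prices.tail(30).to_dict(orient="records"),
    }

if __name__ == "__main__":
    # Example: simulate the trading day of 2025-03-03. Any article or price bar
    # stamped on or after that moment is excluded from what the model sees.
    news = pd.DataFrame({
        "published_at": [datetime(2025, 3, 1, tzinfo=timezone.utc),
                         datetime(2025, 3, 4, tzinfo=timezone.utc)],
        "headline": ["Chipmaker beats earnings", "Surprise rate decision"],
    })
    prices = pd.DataFrame({
        "bar_close_time": [datetime(2025, 3, 2, tzinfo=timezone.utc),
                           datetime(2025, 3, 5, tzinfo=timezone.utc)],
        "close": [101.2, 97.8],
    })
    ctx = build_model_context(news, prices, datetime(2025, 3, 3, tzinfo=timezone.utc))
    print(ctx["news"])    # only the 2025-03-01 headline survives the filter
    print(ctx["prices"])  # only the 2025-03-02 bar survives
```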





