MetaStone-S1: Reflective Generative AI for Test-Time Scaling

Update: 2025-09-02

Description

This document introduces MetaStone-S1, a novel reflective generative model designed for Test-Time Scaling (TTS) in large language models (LLMs). Its core innovation is the Reflective Generative Form, which unifies the policy model and a Self-supervised Process Reward Model (SPRM) within a single network. This integration lets MetaStone-S1 efficiently generate and select high-quality reasoning trajectories without relying on expensive, human-annotated process-level data; the SPRM learns from outcome rewards instead. The research shows that MetaStone-S1, with only 32 billion parameters, achieves performance comparable to OpenAI's o3-mini series on benchmarks spanning mathematics, coding, and Chinese reasoning. The paper also explores the scaling law of these models and identifies an "aha moment" during training: the point at which the SPRM begins to reliably distinguish correct reasoning from incorrect reasoning.
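To make the "generate and select" idea concrete, here is a minimal sketch of Best-of-N selection driven by process-level scores. It assumes that the SPRM assigns each reasoning step a score in (0, 1] and that a trajectory's overall score is the geometric mean of its step scores. The aggregation rule, the scores themselves, and the names `Trajectory`, `trajectory_score`, and `select_best` are hypothetical illustrations, not MetaStone-S1's released code.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    answer: str
    # Hypothetical per-step scores, assumed to come from an SPRM-like head, each in (0, 1].
    step_scores: list[float]


def trajectory_score(step_scores: list[float]) -> float:
    """Aggregate step scores with a geometric mean (an assumed aggregation rule)."""
    if not step_scores:
        raise ValueError("a trajectory needs at least one scored step")
    # Work in log space for numerical stability; clamp to avoid log(0).
    log_sum = sum(math.log(max(s, 1e-12)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best(trajectories: list[Trajectory]) -> Trajectory:
    """Best-of-N: keep the candidate whose aggregated process score is highest."""
    return max(trajectories, key=lambda t: trajectory_score(t.step_scores))


# Usage with made-up scores: one weak step drags down the second candidate.
candidates = [
    Trajectory("42", [0.9, 0.8, 0.95]),
    Trajectory("41", [0.99, 0.2, 0.9]),
    Trajectory("40", [0.6, 0.6, 0.6]),
]
print(select_best(candidates).answer)  # prints "42"
```

A geometric mean is used here because a single low-scoring step pulls the whole trajectory down, which matches the intuition that one faulty reasoning step can invalidate an answer. A plain average or the minimum step score would be equally plausible choices in a sketch like this.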

Neuralintel.org