Variational Reasoning for Language Models

Update: 2025-09-30

Description

🤗 Upvotes: 51 | cs.CL, cs.AI, cs.LG

Authors:

Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang

Title:

Variational Reasoning for Language Models

Arxiv:

http://arxiv.org/abs/2509.22637v1

Abstract:

We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes the training of the variational posterior. We further show that rejection sampling finetuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy naturally arises from the derivation and reveals a previously unnoticed bias toward easier questions. We empirically validate our method on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks. Overall, our work provides a principled probabilistic perspective that unifies variational inference with RL-style methods and yields stable objectives for improving the reasoning ability of language models. Our code is available at https://github.com/sail-sg/variational-reasoning.
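The claim that rejection sampling finetuning carries an implicit weighting by model accuracy can be checked on a toy problem. The sketch below is an illustration only, not the authors' code. It assumes a hypothetical softmax "policy" over a handful of enumerable thinking traces and a binary correctness label per trace; the names `rft_gradient` and `forward_kl_gradient` are made up for this example. It compares the gradient of the reward-weighted log-likelihood with the gradient of a forward-KL objective toward the posterior over correct traces, and the two should differ exactly by the factor P(correct | question).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score(pi, z):
    """Gradient of log pi(z) with respect to the logits: one_hot(z) - pi."""
    return [(1.0 if i == z else 0.0) - p for i, p in enumerate(pi)]


def weighted_score(pi, weights):
    """Sum over traces z of weights[z] * grad log pi(z)."""
    grad = [0.0] * len(pi)
    for z, w in enumerate(weights):
        for i, s in enumerate(score(pi, z)):
            grad[i] += w * s
    return grad


def rft_gradient(logits, correct):
    """Gradient of E_{z ~ pi}[ 1[correct(z)] * log pi(z) ], sampling held fixed.

    This is the expected update of rejection sampling finetuning on one
    question in this toy setting (an assumption of the sketch).
    """
    pi = softmax(logits)
    return weighted_score(pi, [p * c for p, c in zip(pi, correct)])


def forward_kl_gradient(logits, correct):
    """Gradient of E_{z ~ posterior}[ log pi(z) ], posterior held fixed.

    The posterior is pi restricted to correct traces and renormalized, so
    ascending this gradient decreases KL(posterior || pi).
    """
    pi = softmax(logits)
    accuracy = sum(p * c for p, c in zip(pi, correct))
    posterior = [p * c / accuracy for p, c in zip(pi, correct)]
    return weighted_score(pi, posterior), accuracy


if __name__ == "__main__":
    correct = [1.0, 0.0, 1.0, 0.0]      # hypothetical per-trace correctness
    easy = [3.0, 0.0, 2.0, 0.0]         # most mass on correct traces
    hard = [0.0, 3.0, -1.0, 2.0]        # most mass on incorrect traces
    for name, logits in [("easy", easy), ("hard", hard)]:
        g_rft = rft_gradient(logits, correct)
        g_kl, acc = forward_kl_gradient(logits, correct)
        gap = max(abs(a - acc * b) for a, b in zip(g_rft, g_kl))
        print(name, "accuracy:", acc, "max |g_rft - acc * g_kl|:", gap)
```

In this toy setting the reward-weighted gradient equals the accuracy times the forward-KL gradient, so a question the model already answers well contributes a proportionally larger update than a hard one. That is consistent with the bias toward easier questions described in the abstract, though the paper's actual derivation and objectives may differ in detail.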

Variational Reasoning for Language Models

Jingwen Liang, Gengyu Wang
