DiscoverDaily Paper CastVariational Reasoning for Language Models
Variational Reasoning for Language Models

Variational Reasoning for Language Models

Update: 2025-09-30
Share

Description

🤗 Upvotes: 51 | cs.CL, cs.AI, cs.LG



Authors:

Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang



Title:

Variational Reasoning for Language Models



Arxiv:

http://arxiv.org/abs/2509.22637v1



Abstract:

We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes the training of the variational posterior. We further show that rejection sampling finetuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy naturally arises from the derivation and reveals a previously unnoticed bias toward easier questions. We empirically validate our method on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks. Overall, our work provides a principled probabilistic perspective that unifies variational inference with RL-style methods and yields stable objectives for improving the reasoning ability of language models. Our code is available at https://github.com/sail-sg/variational-reasoning.

Comments 
In Channel
loading
00:00
00:00
x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

Variational Reasoning for Language Models

Variational Reasoning for Language Models

Jingwen Liang, Gengyu Wang