NeurIPS 2025: Thinkless: LLM Learns When to Think

Update: 2025-11-29
Description

The paper introduces Thinkless, a framework that addresses the computational inefficiency of large language models (LLMs) that apply long chain-of-thought reasoning even to simple queries. The model decides adaptively between a concise mode (<short>) and a detailed reasoning mode (<think>), based on the complexity of the input and on its own capabilities. Central to the approach is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which uses reinforcement learning to jointly optimize the choice of reasoning mode and the accuracy of the final answer. DeGRPO stabilizes training by balancing the gradient signals from the control tokens against those from the response tokens, which prevents the policy collapse observed with conventional reinforcement learning methods. Empirically, Thinkless reduces the use of computationally expensive long-form reasoning by 50% to 90% on mathematical benchmarks while maintaining performance.
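
The balancing idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the weight alpha, and the omission of clipping and KL regularization are all assumptions made for illustration. The sketch contrasts a coupled loss, where the control token shares one length normalization with the response tokens, against a decoupled loss, where the control-token term gets its own fixed weight.

```python
import math


def group_relative_advantages(rewards):
    """Normalize rewards within one group of sampled responses (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # The small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]


def coupled_loss(control_logprob, response_logprobs, advantage):
    """Baseline surrogate: one normalization over control + response tokens.

    The control token's effective weight is 1 / (T + 1), so it shrinks as
    the response gets longer.
    """
    total = control_logprob + sum(response_logprobs)
    return -advantage * total / (len(response_logprobs) + 1)


def decoupled_loss(control_logprob, response_logprobs, advantage, alpha):
    """Decoupled surrogate (illustrative assumption, not the paper's code).

    alpha is a hypothetical fixed weight on the mode-selection term, so its
    gradient no longer depends on response length. The response term is
    averaged over the response tokens only.
    """
    control_term = -alpha * advantage * control_logprob
    response_term = -advantage * sum(response_logprobs) / len(response_logprobs)
    return control_term + response_term


if __name__ == "__main__":
    # Four sampled answers to one prompt: reward 1 if correct, 0 otherwise.
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(advantages)
    print(coupled_loss(-1.0, [-2.0, -4.0], advantages[0]))
    print(decoupled_loss(-1.0, [-2.0, -4.0], advantages[0], alpha=0.5))
```

In the coupled form, a short response gives the control token a much larger share of the gradient than a long one does, which is one way an imbalance between the two modes can arise. The decoupled form removes that dependence on length.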


Source:

https://openreview.net/pdf?id=ariVQf0KZx

mcgrof