Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Update: 2025-01-02

Description

🤗 Upvotes: 13 | cs.CL

Authors:

Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu

Title:

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Arxiv:

http://arxiv.org/abs/2412.21187v1

Abstract:

The remarkable performance of models like OpenAI o1 can be attributed to their ability to emulate human-like long-time thinking during inference. These models employ extended chain-of-thought (CoT) processes, exploring multiple strategies to enhance problem-solving capabilities. However, a critical question remains: how can computational resources be scaled intelligently and efficiently at test time? This paper presents the first comprehensive study on the prevalent issue of overthinking in these models, where excessive computational resources are allocated to simple problems with minimal benefit. We introduce novel efficiency metrics from both outcome and process perspectives to evaluate the rational use of computational resources by o1-like models. Using a self-training paradigm, we propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy. Experimental results show that our approach successfully reduces computational overhead while preserving model performance across a range of test sets with varying difficulty levels, such as GSM8K, MATH500, GPQA, and AIME.
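The abstract names outcome and process efficiency metrics without defining them. The sketch below assumes one plausible reading, not taken from this page: outcome efficiency is the share of a response's tokens spent up to and including its first correct solution round, and process efficiency is the share of tokens spent on rounds that introduce a new strategy, each averaged over responses. The `Round` structure, the split of a response into rounds, and the `correct` and `new_strategy` labels are hypothetical inputs; consult the paper for the exact definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Round:
    """One solution attempt inside a long chain of thought (hypothetical structure)."""
    tokens: int          # number of tokens spent on this round
    correct: bool        # does this round reach the correct answer? (assumed label)
    new_strategy: bool   # does it try a strategy unseen in earlier rounds? (assumed label)


Response = List[Round]  # one sampled response, split into ordered rounds


def outcome_efficiency(responses: List[Response]) -> float:
    """Mean fraction of tokens spent up to and including the first correct round.

    Assumption: a response with no correct round contributes 0 to the average.
    """
    if not responses:
        return 0.0
    total = 0.0
    for rounds in responses:
        all_tokens = sum(r.tokens for r in rounds)
        first = next((i for i, r in enumerate(rounds) if r.correct), None)
        if all_tokens == 0 or first is None:
            continue  # contributes 0 to the average
        useful = sum(r.tokens for r in rounds[: first + 1])
        total += useful / all_tokens
    return total / len(responses)


def process_efficiency(responses: List[Response]) -> float:
    """Mean fraction of tokens spent on rounds that introduce a new strategy."""
    if not responses:
        return 0.0
    total = 0.0
    for rounds in responses:
        all_tokens = sum(r.tokens for r in rounds)
        if all_tokens == 0:
            continue  # an empty response contributes 0
        distinct = sum(r.tokens for r in rounds if r.new_strategy)
        total += distinct / all_tokens
    return total / len(responses)


if __name__ == "__main__":
    # A: answers correctly at once, then spends 3x as many tokens re-checking.
    a = [Round(100, True, True), Round(300, True, False)]
    # B: first attempt fails, second succeeds but reuses the same approach.
    b = [Round(200, False, True), Round(200, True, False)]
    print(outcome_efficiency([a, b]))  # 0.625
    print(process_efficiency([a, b]))  # 0.375
```

Under this reading, response A is the overthinking case the abstract describes: only a quarter of its tokens were needed to reach the answer, so both scores for it are 0.25.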



Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Jingwen Liang, Gengyu Wang
