Smart Enterprises: AI Frontiers

Mastering Reasoning LLMs: Decoding AI's Complex Problem-Solving Strategies

Updated: 2025-07-29

Description

Join us for an insightful exploration into the world of Reasoning LLMs, drawing on the expertise of Sebastian Raschka, PhD. This episode demystifies how Large Language Models (LLMs) are being refined to excel at complex tasks that require intermediate steps, such as solving puzzles, advanced mathematics, and challenging coding problems, moving beyond simple factual question-answering.

We'll uncover the four main approaches currently used to build and improve these specialised reasoning capabilities:

  • Inference-time scaling: Discover how techniques like Chain-of-Thought (CoT) prompting encourage LLMs to generate intermediate reasoning steps, mimicking a 'thought process' and often leading to more accurate results on complex problems. The trade-off is extra compute at inference time, making each query more expensive.
  • Pure Reinforcement Learning (RL): Learn about the surprising emergence of reasoning behaviour from pure reinforcement learning, as demonstrated by DeepSeek-R1-Zero. This model was trained exclusively with RL, without an initial supervised fine-tuning (SFT) stage, using accuracy and format rewards to develop basic reasoning skills.
  • Supervised Fine-tuning (SFT) + Reinforcement Learning (RL): Understand this key approach for building high-performance reasoning models, exemplified by DeepSeek's flagship R1 model. This method builds on the pure-RL model with "cold start" SFT data, additional SFT stages, and further RL training.
  • Pure SFT and Distillation: Explore how smaller, more efficient reasoning models can be created by instruction fine-tuning them on high-quality SFT data generated by larger, stronger LLMs. This approach is particularly attractive for creating models that are cheaper to run and can operate on lower-end hardware.
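To make the reward signals from the pure-RL approach concrete, here is a minimal Python sketch of the accuracy and format rewards described for DeepSeek-R1-Zero. The `<think>...</think>` tag format and the exact-match check are simplifying assumptions for illustration, not the actual training code.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps its reasoning in <think>...</think>
    tags followed by a final answer, else 0.0 (a simplified format reward)."""
    pattern = r"^<think>.*?</think>\s*\S+"
    return 1.0 if re.match(pattern, completion, flags=re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """1.0 if the text after the reasoning block matches the reference
    answer exactly; works only for verifiable tasks such as math."""
    answer = re.sub(r"^<think>.*?</think>\s*", "", completion, flags=re.DOTALL)
    return 1.0 if answer.strip() == gold_answer.strip() else 0.0

completion = "<think>2 + 2 equals 4 because ...</think> 4"
total = format_reward(completion) + accuracy_reward(completion, "4")
```

In the actual pipeline these scalar rewards drive an RL update (DeepSeek used GRPO); the point here is only that both rewards are cheap, rule-based checks requiring no learned reward model.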

We'll also discuss when to use reasoning models – they excel at complex challenges but can be inefficient, more verbose, and expensive for simpler tasks, and are sometimes even prone to errors from 'overthinking'. The episode provides valuable insights from the DeepSeek R1 pipeline as a detailed case study and touches on comparisons with models like OpenAI's o1. Plus, get tips for developing reasoning models on a limited budget, including the promise of distillation and innovative methods like 'journey learning', which incorporates incorrect solution paths to teach models from mistakes. Tune in to navigate the rapidly evolving landscape of reasoning LLMs!
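The distillation recipe mentioned above can be sketched in a few lines: a stronger teacher model generates full reasoning traces, which then become instruction-tuning data for a smaller student. `teacher_generate` and `toy_teacher` below are hypothetical stand-ins for illustration, not any real model API.

```python
def build_distillation_examples(problems, teacher_generate):
    """Collect a teacher model's reasoning traces as SFT examples
    (prompt/completion pairs) for fine-tuning a smaller student."""
    examples = []
    for problem in problems:
        trace = teacher_generate(problem)  # reasoning steps + final answer
        examples.append({"prompt": problem, "completion": trace})
    return examples

def toy_teacher(problem: str) -> str:
    """Toy stand-in for a large reasoning LLM, for illustration only."""
    return f"<think>Working through: {problem}</think> done"

data = build_distillation_examples(["What is 3*7?"], toy_teacher)
```

The student is then fine-tuned on `data` with a standard SFT objective; no RL is needed, which is why this route is attractive on a limited budget.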



Ali Mehedi