Improve Vision Language Model Chain-of-thought Reasoning

Update: 2024-10-28

Description

🖼 Improve Vision Language Model Chain-of-thought Reasoning

This research paper investigates how to improve the chain-of-thought (CoT) reasoning capabilities of vision language models (VLMs). To address the scarcity of high-quality CoT training data, the authors propose two complementary methods. First, they distill rationales from a powerful language model (GPT-4o) to enrich the training data and fine-tune VLMs, yielding significant gains in CoT performance. Second, they apply reinforcement learning via the Direct Preference Optimization (DPO) algorithm to further calibrate reasoning quality, using positive and negative pairs of model-generated reasoning chains. The authors demonstrate that this approach effectively enhances reasoning capabilities, paving the way for more robust and interpretable multimodal models.
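The DPO step described above trains on positive/negative pairs of reasoning chains. A minimal sketch of the standard per-pair DPO objective is shown below; the function and variable names are illustrative, and the beta value is an assumption, not a detail taken from the paper:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen (positive) and
    rejected (negative) reasoning chains under the trained policy and
    a frozen reference model. beta scales the implicit KL penalty that
    keeps the policy close to the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the policy assigns the chosen
    # chain a higher relative log-probability than the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal log-ratios the margin is zero and the loss is log 2; as the policy learns to prefer the chosen chain over the rejected one, the margin grows and the loss falls toward zero.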

📎 Link to paper

Shahriar Shariati