NeurIPS 2025: Large Language Diffusion Models
Description
This research paper introduces LLaDA, an 8-billion-parameter language model built on the masked diffusion model (MDM) architecture, developed to challenge the assumption that core Large Language Model (LLM) capabilities are exclusive to autoregressive models (ARMs). Unlike ARMs, which predict the next token sequentially, LLaDA defines its generative process through a forward procedure that progressively masks tokens and a reverse procedure in which a Transformer network predicts the masked tokens in parallel. Trained from scratch, LLaDA demonstrates strong scalability and achieves performance comparable to advanced ARM baselines such as LLaMA 3 8B across benchmarks covering general knowledge, mathematics, and code generation. Crucially, its non-autoregressive design enables bidirectional modeling, which allows LLaDA to effectively address the reversal curse and outperform contemporary models, including GPT-4o, on complex reversal reasoning tasks. These findings suggest that essential LLM capabilities stem from fundamental generative modeling principles rather than from the autoregressive formulation itself. The work concludes that diffusion models offer a promising alternative paradigm for building robust, large-scale language models.
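The forward masking and parallel reverse prediction described above can be illustrated with a toy simulation. All names here (`MASK`, `forward_mask`, `reverse_step`, the stand-in `predictor`) are illustrative assumptions for exposition, not the paper's actual implementation, which operates on token ids with a Transformer mask predictor.

```python
import random

MASK = "<M>"  # hypothetical mask symbol; the real model uses a reserved mask token id

def forward_mask(tokens, t, seed=0):
    """Forward process: independently replace each token with MASK with
    probability t (t=0 leaves the text intact, t=1 masks all of it)."""
    rng = random.Random(seed)
    return [MASK if rng.random() < t else tok for tok in tokens]

def reverse_step(masked, predictor, s, t, seed=0):
    """One reverse step from noise level t down to s < t: the predictor
    (standing in for the Transformer) fills every masked position at once,
    then a fraction s/t of those positions is re-masked so the result
    matches noise level s. Iterating s toward 0 yields clean text."""
    rng = random.Random(seed)
    filled = [predictor(masked, i) if tok == MASK else tok
              for i, tok in enumerate(masked)]
    return [MASK if masked[i] == MASK and rng.random() < s / t else tok
            for i, tok in enumerate(filled)]
```

For example, sampling starts from a fully masked sequence (noise level t=1) and repeatedly calls `reverse_step` with decreasing noise levels; at s=0 no positions are re-masked and the sequence is fully generated.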
Source:
https://openreview.net/pdf?id=KnqiC0znVF
