CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Update: 2025-12-16
Description

This paper examines how a Retrieval-Augmented Generation (RAG) framework can be designed to overcome the structural issues of separate retrieval and generation modules. The proposed framework, CLaRa, employs a **shared latent space** in which documents are compressed into concise, continuous memory-token representations, addressing the architectural mismatch and efficiency problems of traditional RAG. Key to CLaRa is its **joint optimization** mechanism: the generator's next-token prediction (NTP) loss provides a weak supervision signal that aligns the retriever with the downstream task objective, without requiring explicit relevance labels. The framework is pretrained on a diverse dataset of **Simple QA, Complex QA, and Paraphrase pairs**, and empirical results show that CLaRa, particularly when initialized from this pretraining, achieves **state-of-the-art retrieval performance**, rivaling or surpassing fully supervised baselines across question-answering tasks. Further analyses confirm that the compressed representations **preserve semantic content** while substantially reducing context length, significantly improving overall system efficiency.
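The joint-optimization idea described above can be sketched in a toy form: documents are compressed into a handful of continuous memory-token vectors, a retrieval distribution is computed over these compressed representations in a shared latent space, and a generator loss marginalized under that distribution supplies the weak supervision for the retriever. This is an illustrative sketch only, not CLaRa's actual architecture; the mean-pooling compressor and the placeholder per-document NTP losses are assumptions for demonstration.

```python
import numpy as np

def compress(doc_tokens, num_memory_tokens=4):
    """Compress a (seq_len, dim) token matrix into a few continuous
    'memory token' vectors by mean-pooling contiguous chunks.
    (Toy stand-in for a learned compressor.)"""
    chunks = np.array_split(doc_tokens, num_memory_tokens, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])  # shape (k, dim)

def retrieval_scores(query_vec, memories):
    """Score each compressed document against the query in the shared
    latent space (dot product over pooled memory tokens)."""
    doc_vecs = np.stack([m.mean(axis=0) for m in memories])
    return doc_vecs @ query_vec

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy corpus: 3 documents of 16 tokens each, embedding dim 8.
rng = np.random.default_rng(0)
docs = [rng.normal(size=(16, 8)) for _ in range(3)]
memories = [compress(d, num_memory_tokens=4) for d in docs]  # 16 -> 4 tokens
query = rng.normal(size=8)

# Retriever's distribution over documents.
p_retrieve = softmax(retrieval_scores(query, memories))

# Weak supervision: marginalize per-document generator (NTP) losses
# under the retrieval distribution. Gradients through p_retrieve would
# align the retriever with generation, with no relevance labels needed.
ntp_loss_per_doc = rng.uniform(1.0, 3.0, size=3)  # placeholder losses
joint_loss = float(p_retrieve @ ntp_loss_per_doc)
```

In a real system the compressor, retriever, and generator share gradients through this marginalized loss; here the generator losses are random placeholders, so only the data flow is illustrated.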



Enoch H. Kang