End-to-End Test-Time Training for Long Context
Description
This research introduces TTT-E2E, a method for long-context language modeling that treats long context as a continual learning problem rather than an architectural redesign. Whereas standard Transformers pay a quadratic attention cost over very long sequences, this model **compresses context into its weights** by learning at test time through next-token prediction on the text it is reading. **Meta-learning during training** optimizes the initialization for these **test-time updates**, so the model keeps improving as it reads more of the context. The authors show that traditional RNNs and hybrid models lose effectiveness at very long contexts, whereas **TTT-E2E scales performance** with context length similarly to full-attention Transformers while keeping the **constant per-token inference cost** of an RNN. At a 128K context length, the method runs **2.7 times faster** than a standard Transformer while achieving superior language modeling accuracy.
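To make the test-time mechanism concrete, here is a minimal PyTorch sketch of the inner loop: as the model reads the context chunk by chunk, a small set of fast weights is updated by gradient descent on the next-token prediction loss, so information from earlier chunks is carried forward in the weights at constant cost per chunk. This is an illustrative toy under stated assumptions, not the authors' code; the names (`FastWeightLM`, `ttt_step`), the GRU backbone, the chunk size, and the inner learning rate are all placeholders.

```python
# Minimal sketch of the test-time inner loop: while reading the context,
# a small set of "fast weights" is updated by gradient descent on the
# next-token prediction loss, so later tokens benefit from earlier ones
# without keeping attention over the full history.
# All names and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FastWeightLM(nn.Module):
    def __init__(self, vocab_size=256, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # "Slow" backbone: kept fixed at test time (stands in for the meta-learned init).
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        # "Fast" weights: the only parameters updated while reading the context.
        self.fast = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                  nn.Linear(d_model, d_model))
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.backbone(self.embed(tokens), state)
        return self.head(h + self.fast(h)), state

@torch.enable_grad()  # test-time training needs gradients even during "inference"
def ttt_step(model, chunk, state, inner_lr=1e-2):
    """One test-time update: next-token loss on the current chunk, with
    gradients applied only to the fast weights (constant cost per chunk)."""
    logits, new_state = model(chunk[:, :-1], state)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           chunk[:, 1:].reshape(-1))
    grads = torch.autograd.grad(loss, list(model.fast.parameters()))
    with torch.no_grad():
        for p, g in zip(model.fast.parameters(), grads):
            p -= inner_lr * g
    return loss.item(), new_state.detach()  # detach: constant memory, like an RNN

# Usage: stream a long context through the model in fixed-size chunks.
if __name__ == "__main__":
    torch.manual_seed(0)
    model = FastWeightLM()
    context = torch.randint(0, 256, (1, 1024))   # stand-in for a very long document
    state = None
    for chunk in context.split(128, dim=1):
        loss, state = ttt_step(model, chunk, state)
```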

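The meta-learning component can be sketched in the same spirit: during training, simulate a few test-time updates on an early slice of each sequence, evaluate the updated weights on a later slice, and backpropagate through the inner updates so the initialization learns to benefit from being updated at test time. The MAML-style toy below uses a linear next-token model purely for illustration; the split sizes, number of inner steps, and learning rates are arbitrary assumptions, not the paper's settings.

```python
# Minimal sketch of the meta-learning outer loop: the initialization is trained
# so that, after a few simulated test-time updates on the beginning of a
# sequence, the adapted weights predict the rest of the sequence better.
# This is a MAML-style toy, not the paper's architecture.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, d_model = 256, 64
embed = (0.02 * torch.randn(vocab, d_model)).requires_grad_()   # shared "slow" parameters
W_init = torch.zeros(d_model, vocab, requires_grad=True)        # meta-learned fast-weight init

def nll(W, tokens):
    """Next-token loss of a bigram-style model: logits = embed[token] @ W."""
    logits = embed[tokens[:-1]] @ W
    return F.cross_entropy(logits, tokens[1:])

opt = torch.optim.Adam([embed, W_init], lr=1e-3)
for step in range(100):
    seq = torch.randint(0, vocab, (512,))      # stand-in for one long training sequence
    support, query = seq[:256], seq[256:]      # read the first half, predict the second

    # Inner loop: simulated test-time updates, kept differentiable
    # (create_graph=True) so the outer gradient sees how the init
    # responds to being updated.
    W = W_init
    for _ in range(3):
        g, = torch.autograd.grad(nll(W, support), W, create_graph=True)
        W = W - 1e-1 * g

    # Outer loop: the adapted weights must predict the *later* part of the
    # sequence; backpropagating through the inner steps trains the init.
    outer_loss = nll(W, query)
    opt.zero_grad()
    outer_loss.backward()
    opt.step()
```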






















