Best AI papers explained

Task Descriptors Help Transformers Learn Linear Models In-Context

Update: 2026-03-07
Description

This paper explores how task descriptors, such as a mean value $\mu$, improve in-context learning (ICL) for linear regression within Transformer models. By examining a one-layer linear self-attention (LSA) network, the researchers demonstrate that models can effectively utilize these descriptors to standardize input data and reduce prediction errors. The paper provides a mathematical proof that gradient flow training converges to a global minimum, allowing the Transformer to simulate an optimized version of gradient descent. Through various experiments, the authors confirm that adding task information leads to superior performance compared to models without such context. Furthermore, the study reveals that while large sample sizes simplify the model's strategy, finite sample settings require the Transformer to develop more complex internal representations to manage bias and variance. These findings provide a theoretical foundation for the empirical success of prompts and instructions in large language models.
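The mechanism described above can be illustrated with a minimal NumPy sketch. This is not the paper's trained Transformer; it is a hand-coded stand-in under two assumptions drawn from the description: the task descriptor is the known input mean $\mu$, and the one-layer LSA model is approximated by a single gradient-descent step from zero on the in-context examples. The sketch compares that one-step predictor on raw inputs against the same predictor after standardizing by $\mu$, showing the error reduction the paper attributes to the descriptor.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_tasks = 5, 20, 500
mu = 3.0 * np.ones(d)  # task descriptor: known input mean (assumed setup)

err_raw, err_ctr = [], []
for _ in range(n_tasks):
    w = rng.normal(size=d)               # task-specific linear weights
    X = rng.normal(size=(n, d)) + mu     # in-context inputs with mean mu
    y = X @ w                            # noiseless linear targets
    xq = rng.normal(size=d) + mu         # query input
    yq = xq @ w

    # Without the descriptor: one gradient step from zero on raw data.
    # The nonzero input mean inflates the second-moment matrix and
    # biases the one-step estimate.
    w_raw = (X.T @ y) / n
    err_raw.append((xq @ w_raw - yq) ** 2)

    # With the descriptor: center inputs (and targets) by mu first,
    # then take the same single gradient step.
    Xc, ybar = X - mu, y.mean()
    w_ctr = (Xc.T @ (y - ybar)) / n
    err_ctr.append(((xq - mu) @ w_ctr + ybar - yq) ** 2)

mse_raw, mse_ctr = float(np.mean(err_raw)), float(np.mean(err_ctr))
print(f"raw MSE: {mse_raw:.2f}, centered MSE: {mse_ctr:.2f}")
```

With the descriptor, the one-step estimator targets the (near-identity) covariance of the centered inputs rather than the mean-inflated second moment, so its prediction error is far smaller — a toy version of the bias reduction the paper proves for the trained LSA model.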


Enoch H. Kang