Best AI papers explained

Task Descriptors Help Transformers Learn Linear Models In-Context

Update: 2026-03-07
Description

This paper explores how task descriptors, such as a mean value $\mu$, improve in-context learning (ICL) for linear regression within Transformer models. By examining a one-layer linear self-attention (LSA) network, the researchers demonstrate that models can effectively utilize these descriptors to standardize input data and reduce prediction errors. The paper provides a mathematical proof that gradient flow training converges to a global minimum, allowing the Transformer to simulate an optimized version of gradient descent. Through various experiments, the authors confirm that adding task information leads to superior performance compared to models without such context. Furthermore, the study reveals that while large sample sizes simplify the model's strategy, finite sample settings require the Transformer to develop more complex internal representations to manage bias and variance. These findings provide a theoretical foundation for the empirical success of prompts and instructions in large language models.
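The mechanism described above can be illustrated with a minimal NumPy sketch. This is not the paper's trained Transformer; it is a hand-coded stand-in under two assumptions drawn from the description: the task descriptor is the known input mean $\mu$, and the one-layer LSA model is approximated by a single gradient-descent step from zero on the in-context examples. The sketch compares that one-step predictor on raw inputs against the same predictor after standardizing by $\mu$, showing the error reduction the paper attributes to the descriptor.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_tasks = 5, 20, 500
mu = 3.0 * np.ones(d)  # task descriptor: known input mean (assumed setup)

err_raw, err_ctr = [], []
for _ in range(n_tasks):
    w = rng.normal(size=d)               # task-specific linear weights
    X = rng.normal(size=(n, d)) + mu     # in-context inputs with mean mu
    y = X @ w                            # noiseless linear targets
    xq = rng.normal(size=d) + mu         # query input
    yq = xq @ w

    # Without the descriptor: one gradient step from zero on raw data.
    # The nonzero input mean inflates the second-moment matrix and
    # biases the one-step estimate.
    w_raw = (X.T @ y) / n
    err_raw.append((xq @ w_raw - yq) ** 2)

    # With the descriptor: center inputs (and targets) by mu first,
    # then take the same single gradient step.
    Xc, ybar = X - mu, y.mean()
    w_ctr = (Xc.T @ (y - ybar)) / n
    err_ctr.append(((xq - mu) @ w_ctr + ybar - yq) ** 2)

mse_raw, mse_ctr = float(np.mean(err_raw)), float(np.mean(err_ctr))
print(f"raw MSE: {mse_raw:.2f}, centered MSE: {mse_ctr:.2f}")
```

With the descriptor, the one-step estimator targets the (near-identity) covariance of the centered inputs rather than the mean-inflated second moment, so its prediction error is far smaller — a toy version of the bias reduction the paper proves for the trained LSA model.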


Enoch H. Kang