YaRN: Extending LLM Context Windows Efficiently
Description
This academic paper introduces YaRN (Yet another RoPE extensioN method), an efficient technique for extending the context window of large language models (LLMs) that use Rotary Position Embeddings (RoPE). The authors demonstrate that YaRN sharply reduces the computational resources needed for such extension, requiring substantially fewer tokens and training steps than previous methods such as Position Interpolation (PI) and NTK-aware interpolation. Across experiments in long-sequence language modeling, passkey retrieval, and standardized benchmarks, the paper shows that YaRN-fine-tuned models, including ones based on the LLaMA and Mistral architectures, extrapolate effectively to context lengths far longer than those seen during their original training, matching or surpassing existing context-extension techniques while preserving the original models' capabilities. The research highlights YaRN's efficiency, strong generalization, and potential for transfer learning in resource-constrained environments.
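To make the mechanism concrete, the sketch below illustrates the general shape of a YaRN-style RoPE rescaling: high-frequency RoPE dimensions are kept close to their original values, low-frequency dimensions are interpolated as in PI, and a mild attention-temperature correction grows with the scale factor. This is a minimal illustration, not the authors' reference implementation; the function name, parameter names, and defaults (ramp thresholds alpha/beta, orig_ctx, scale) are assumptions chosen to match commonly cited settings.

```python
import math
import numpy as np

def yarn_scaled_rope_freqs(
    head_dim: int,           # per-head dimension |D| (assumed even)
    base: float = 10000.0,   # RoPE base of the original model (assumed)
    orig_ctx: int = 4096,    # original training context length L (assumed)
    scale: float = 16.0,     # extension factor s = L'/L (assumed)
    alpha: float = 1.0,      # low end of the ramp (assumed default)
    beta: float = 32.0,      # high end of the ramp (assumed default)
):
    """Sketch of a YaRN-style "NTK-by-parts" RoPE frequency rescaling.

    Returns per-dimension inverse frequencies that blend PI-style
    interpolation (theta / s) with the original frequencies, plus an
    attention-temperature factor that grows with the scale factor.
    """
    # Standard RoPE inverse frequencies: theta_d = base^(-2d/|D|).
    dims = np.arange(0, head_dim, 2)
    theta = base ** (-dims / head_dim)

    # Wavelength of each dimension and how many times it wraps
    # within the original context window.
    wavelength = 2 * math.pi / theta
    ratio = orig_ctx / wavelength

    # Ramp gamma(r): 0 -> fully interpolate (like PI),
    #                1 -> keep the original frequency.
    gamma = np.clip((ratio - alpha) / (beta - alpha), 0.0, 1.0)

    # Blend interpolated and original frequencies per dimension.
    scaled_theta = (1.0 - gamma) * (theta / scale) + gamma * theta

    # Attention "temperature" correction, often folded into the
    # cos/sin tables in practice.
    attn_factor = 0.1 * math.log(scale) + 1.0
    return scaled_theta, attn_factor
```

In this sketch, dimensions whose wavelength exceeds the original context are fully interpolated so positions never rotate past angles seen in training, while fast-rotating dimensions that encode local relative offsets are left untouched; intermediate dimensions are blended linearly between the two regimes.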