Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Update: 2024-12-26
Description

🤗 Upvotes: 20 | cs.AI, cs.CL



Authors:

Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xue Kai Zhu, Bowen Zhou



Title:

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization



Arxiv:

http://arxiv.org/abs/2412.17739v1



Abstract:

Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend. While existing works mainly address RoPE's limitations within the attention mechanism, this paper analyzes nearly all parts of LMs, uncovering the adverse effects these components have on the length generalization of RoPE-based attention. Using Discrete Signal Processing theory, we show that RoPE enables periodic attention by implicitly performing a Non-Uniform Discrete Fourier Transform. However, this periodicity is undermined by spectral damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components arising from time-domain truncation. Building on these observations, we propose Fourier Position Embedding (FoPE), which enhances attention's frequency-domain properties to improve both its periodic extension and length generalization. FoPE constructs a Fourier series and zeroes out the destructive frequency components, increasing the model's robustness to spectral damage. Experiments across various model scales show that, within varying context windows, FoPE maintains a more stable perplexity and more consistent accuracy on a needle-in-a-haystack task than RoPE and ALiBi. Several analyses and ablations lend further support to our method and theoretical modeling.
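
To make the idea concrete, here is a minimal, illustrative sketch of a FoPE-style positional signal based only on the abstract: each dimension's single RoPE frequency is replaced by a small Fourier series, and frequencies below a floor are zeroed out as "destructive" under-trained components. The function name, the number of extra terms, their coefficients, and the floor value are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fope_signal(seq_len, head_dim, base=10000.0, num_extra=4,
                floor_freq=1e-3, rng=None):
    """Illustrative FoPE-style positional signal (not the paper's code).

    RoPE assigns dimension pair j a single frequency w_j = base**(-2j/d)
    and rotates features by exp(i * w_j * t). Following the abstract, this
    sketch instead builds a small Fourier series per dimension and zeroes
    out frequencies below `floor_freq`, treating them as "destructive"
    under-trained components.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    t = np.arange(seq_len)[:, None]                      # positions, shape (T, 1)
    w = base ** (-np.arange(0, head_dim, 2) / head_dim)  # dominant RoPE frequencies

    # Extra frequencies and small coefficients per dimension (assumed random here).
    w_extra = rng.uniform(0.0, w.max(), size=(num_extra, w.size))
    a = rng.normal(0.0, 0.1, size=(num_extra, w.size))

    # Zero out destructive components: anything below the floor becomes a
    # zero-frequency (constant) term instead of a slow rotation.
    w = np.where(w < floor_freq, 0.0, w)
    w_extra = np.where(w_extra < floor_freq, 0.0, w_extra)

    # Fourier series per dimension: dominant term plus weighted extra terms.
    series = np.exp(1j * t * w) + (a * np.exp(1j * t[:, None] * w_extra)).sum(axis=1)
    return series                                        # shape (T, head_dim // 2)

# Example: 128-token context, 64-dimensional heads.
signal = fope_signal(seq_len=128, head_dim=64)
print(signal.shape)  # (128, 32)
```

In a real model this complex signal would modulate query/key pairs the same way RoPE rotations do; the sketch only illustrates the frequency-domain structure described in the abstract: a dominant frequency per dimension plus extra Fourier terms, with under-trained low frequencies clipped to zero.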

Hosts: Jingwen Liang, Gengyu Wang