#131: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Update: 2024-04-23
Description
Morita crashed and burned against a PyTorch kernel written in CUDA. Please send comments and feedback via Reddit or the feedback form below. Reviews and star ratings on iTunes are also appreciated.
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- GitHub – Dao-AILab/flash-attention: Fast and memory-efficient exact attention
- GitHub – NVIDIA/apex: A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
- [2307.08691] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- [2112.05682] Self-attention Does Not Need $O(n^2)$ Memory
- GitHub – tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only)
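The episode centers on this CUDA kernel, so here is a minimal sketch (not from the show notes) that calls `flash_attn_func` from the linked Dao-AILab/flash-attention package and checks it against a naive PyTorch attention that materializes the full score matrix. It assumes a CUDA GPU, fp16 inputs, and that the `flash-attn` package is installed.

```python
# A minimal sketch, not from the episode: exercising the FlashAttention kernel
# from the Dao-AILab/flash-attention package linked above. Assumes a CUDA GPU,
# fp16 tensors, and `pip install flash-attn`.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused, IO-aware kernel: computes exact attention without ever materializing
# the (seqlen x seqlen) score matrix in GPU HBM.
out_flash = flash_attn_func(q, k, v, causal=True)

# Naive reference: builds the full score matrix, applies the causal mask,
# then softmax, then the weighted sum over values.
scale = headdim ** -0.5
scores = torch.einsum("bqhd,bkhd->bhqk", q.float(), k.float()) * scale
causal_mask = torch.triu(
    torch.ones(seqlen, seqlen, device="cuda", dtype=torch.bool), diagonal=1
)
scores = scores.masked_fill(causal_mask, float("-inf"))
out_ref = torch.einsum("bhqk,bkhd->bqhd", scores.softmax(dim=-1), v.float())

# The two results agree up to fp16 rounding error.
print((out_flash.float() - out_ref).abs().max())
```

If you only want the speedup rather than the package itself, PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` can also dispatch to a FlashAttention backend on supported GPUs.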
- Feedback form: https://docs.google.com/forms/d/e/1FAIpQLSdBvbhI98yeJQV_QWBsl1Q5vY7iohwFN-lJOY2fIh_pfjwRSQ/viewform