#131: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Update: 2024-04-23
Description
Morita crashed and burned against a PyTorch kernel written in CUDA. Please send comments and feedback via Reddit or the feedback form below. Reviews and star ratings on iTunes are also appreciated.
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- GitHub – Dao-AILab/flash-attention: Fast and memory-efficient exact attention
- GitHub – NVIDIA/apex: A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
- [2307.08691] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- [2112.05682] Self-attention Does Not Need $O(n^2)$ Memory
- GitHub – tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only)
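The episode centers on this CUDA kernel, so here is a minimal sketch (not from the show notes) that calls `flash_attn_func` from the linked Dao-AILab/flash-attention package and checks it against a naive PyTorch attention that materializes the full score matrix. It assumes a CUDA GPU, fp16 inputs, and that the `flash-attn` package is installed.

```python
# A minimal sketch, not from the episode: exercising the FlashAttention kernel
# from the Dao-AILab/flash-attention package linked above. Assumes a CUDA GPU,
# fp16 tensors, and `pip install flash-attn`.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused, IO-aware kernel: computes exact attention without ever materializing
# the (seqlen x seqlen) score matrix in GPU HBM.
out_flash = flash_attn_func(q, k, v, causal=True)

# Naive reference: builds the full score matrix, applies the causal mask,
# then softmax, then the weighted sum over values.
scale = headdim ** -0.5
scores = torch.einsum("bqhd,bkhd->bhqk", q.float(), k.float()) * scale
causal_mask = torch.triu(
    torch.ones(seqlen, seqlen, device="cuda", dtype=torch.bool), diagonal=1
)
scores = scores.masked_fill(causal_mask, float("-inf"))
out_ref = torch.einsum("bhqk,bkhd->bqhd", scores.softmax(dim=-1), v.float())

# The two results agree up to fp16 rounding error.
print((out_flash.float() - out_ref).abs().max())
```

If you only want the speedup rather than the package itself, PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` can also dispatch to a FlashAttention backend on supported GPUs.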
- Feedback form: https://docs.google.com/forms/d/e/1FAIpQLSdBvbhI98yeJQV_QWBsl1Q5vY7iohwFN-lJOY2fIh_pfjwRSQ/viewform