Mamba

Update: 2025-07-10

Description

Mamba is a deep learning architecture whose computation and memory scale linearly with sequence length, addressing the quadratic scaling that limits Transformers. Its selective State Space Model (SSM) layer makes the state-update parameters functions of the input, so the model can dynamically "forget" or "remember" information depending on context. Its optimizations include a hardware-aware parallel algorithm for the recurrent "selective scan", which uses kernel fusion to keep the state in fast GPU memory and recomputation to reduce the memory footprint during training. The result is significantly faster inference (up to 5x the throughput of comparable Transformers) and superior long-context handling.
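
To make the selective-scan recurrence concrete, here is a minimal sequential NumPy sketch (not from the episode; the function name, shapes, and the simplified Euler discretization of B are illustrative assumptions). It shows the core idea: the step size dt and the projections B and C vary per time step with the input, which is what lets the state selectively retain or discard information. The real implementation fuses this loop into a single hardware-aware parallel-scan GPU kernel rather than iterating in Python.

```python
import numpy as np

def selective_scan(x, dt, A, B, C):
    """Naive sequential reference for a selective SSM scan (illustrative).

    x : (L, D)  input sequence (L steps, D channels)
    dt: (L, D)  input-dependent step sizes -- the "selection" mechanism
    A : (D, N)  state matrix (negative real, shared across time steps)
    B : (L, N)  input-dependent input projection
    C : (L, N)  input-dependent output projection
    Returns y : (L, D)
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))   # one N-dimensional hidden state per channel
    y = np.zeros((L, D))
    for t in range(L):
        # Per-step discretization: A_bar = exp(dt * A); B_bar ~ dt * B
        # (a simplified Euler step for B; the paper's ZOH form is similar).
        A_bar = np.exp(dt[t][:, None] * A)        # (D, N), entries in (0, 1)
        B_bar = dt[t][:, None] * B[t][None, :]    # (D, N)
        h = A_bar * h + B_bar * x[t][:, None]     # selective state update
        y[t] = h @ C[t]                           # read out with C_t
    return y

# Smoke test with random inputs (shapes and scales are arbitrary).
rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
x = rng.standard_normal((L, D))
dt = np.exp(0.1 * rng.standard_normal((L, D)) - 1.0)  # positive step sizes
A = -np.exp(rng.standard_normal((D, N)))              # negative real => stable
B = rng.standard_normal((L, N))
C = rng.standard_normal((L, N))
print(selective_scan(x, dt, A, B, C).shape)           # (16, 4)
```

Because the hidden state h stays (D, N) no matter how long the sequence is, each inference step costs constant memory and time, which is where the linear scaling and high decode throughput come from.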

In Channel

Kimi K2 (2025-07-22 15:30)
MeanFlow (2025-07-10 06:47)
Mamba (2025-07-10 08:14)
LLM Alignment (2025-06-14 20:06)
Why We Think (2025-05-20 14:20)
Deep Research (2025-05-12 11:35)
vLLM (2025-05-04 13:06)
DeepSeek-Prover-V2 (2025-05-01 11:04)
DeepSeek-Prover (2025-05-01 08:37)
Agent AI Overview (2025-03-17 21:06)
FlashAttention-3 (2025-03-07 13:43)
FlashAttention-2 (2025-03-05 10:50)
FlashAttention (2025-03-05 10:55)


AI-Talk