Transformers Without Normalization: Dynamic Tanh Achieves Strong Performance

Update: 2025-03-24

Description

This podcast episode delves into the "Transformers without Normalization" paper, which introduces Dynamic Tanh (DyT) as a drop-in replacement for normalization layers in Transformers. DyT is a simple element-wise operation, tanh(αx) with a learnable scaling parameter α, that aims to replicate the effect of Layer Norm without computing activation statistics. Could DyT match or exceed the performance of normalized models while improving efficiency, challenging the assumption that normalization is necessary in modern neural networks?
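As a rough sketch of the operation described above, here is what a DyT layer might look like as a PyTorch-style module. The per-channel weight and bias (mirroring Layer Norm's elementwise affine) and the initial value of α are assumptions for illustration, not details taken from the episode:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: tanh(alpha * x) with a learnable scalar alpha.

    Sketch of a possible drop-in replacement for LayerNorm. The
    per-channel affine (weight/bias) and the alpha init value are
    assumptions modeled on LayerNorm's elementwise affine.
    """
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        # Learnable scalar that controls how far activations are
        # pushed into the saturating region of tanh.
        self.alpha = nn.Parameter(torch.ones(1) * alpha_init)
        # Per-channel affine, analogous to LayerNorm's weight/bias.
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unlike LayerNorm, no mean or variance statistics are
        # computed; the squashing comes purely from tanh.
        return self.weight * torch.tanh(self.alpha * x) + self.bias
```

Under this sketch, swapping `nn.LayerNorm(dim)` for `DyT(dim)` inside a Transformer block would be the intended usage, which is where the potential efficiency gain comes from: no reduction over the feature dimension is needed at inference time.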

Build Wiz AI