Transformers Without Normalization: Dynamic Tanh Achieves Strong Performance

Update: 2025-03-24

Description

This podcast episode delves into the "Transformers without Normalization" paper, which introduces Dynamic Tanh (DyT) as a drop-in replacement for normalization layers in Transformers. DyT is a simple element-wise operation, tanh(αx) with a learnable scaling parameter α, that aims to replicate the effect of Layer Norm without computing activation statistics. Could DyT match or exceed the performance of normalized models while improving efficiency, challenging the assumption that normalization is necessary in modern neural networks?
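As a rough sketch of the operation described above, here is what a DyT layer might look like as a PyTorch-style module. The per-channel weight and bias (mirroring Layer Norm's elementwise affine) and the initial value of α are assumptions for illustration, not details taken from the episode:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: tanh(alpha * x) with a learnable scalar alpha.

    Sketch of a possible drop-in replacement for LayerNorm. The
    per-channel affine (weight/bias) and the alpha init value are
    assumptions modeled on LayerNorm's elementwise affine.
    """
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        # Learnable scalar that controls how far activations are
        # pushed into the saturating region of tanh.
        self.alpha = nn.Parameter(torch.ones(1) * alpha_init)
        # Per-channel affine, analogous to LayerNorm's weight/bias.
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unlike LayerNorm, no mean or variance statistics are
        # computed; the squashing comes purely from tanh.
        return self.weight * torch.tanh(self.alpha * x) + self.bias
```

Under this sketch, swapping `nn.LayerNorm(dim)` for `DyT(dim)` inside a Transformer block would be the intended usage, which is where the potential efficiency gain comes from: no reduction over the feature dimension is needed at inference time.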

Build Wiz AI