Training a 1-trillion-parameter model
Update: 2025-09-04
Description
Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism