120. Liam Fedus and Barrett Zoph - AI scaling with mixture of expert models

Update: 2022-04-20

Description

AI scaling has really taken off. Ever since GPT-3 came out, it’s become clear that one of the things we’ll need to do to move beyond narrow AI and towards more generally intelligent systems is to massively scale up the size of our models, the amount of processing power they consume and the amount of data they’re trained on, all at the same time.

That’s led to a huge wave of highly scaled models that are incredibly expensive to train, largely because of their enormous compute budgets. But what if there were a more flexible way to scale AI, one that allowed us to decouple model size from compute budgets, so that we could track a more compute-efficient course to scale?

That’s the promise of so-called mixture of experts models, or MoEs. Unlike more traditional transformers, MoEs don’t update all of their parameters on every training pass. Instead, they route inputs intelligently to sub-models called experts, which can each specialize in different tasks. On a given training pass, only the experts that an input is routed to are activated and have their parameters updated. The result is a sparse model, a more compute-efficient training process, and a new potential path to scale.
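To make the routing idea concrete, here is a minimal sketch in plain Python. It is not the guests’ implementation: it assumes top-1 routing (each input goes to a single expert), uses random linear maps as stand-ins for the experts, and every name in it (`route_top1`, `moe_layer`, `DIM`, `NUM_EXPERTS`) is hypothetical. It also shows the decoupling described above: the total parameter count grows with the number of experts, while the parameters touched per input stay constant.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def route_top1(token, router_weights):
    """Pick the single expert with the highest router probability."""
    probs = softmax(matvec(router_weights, token))
    expert_index = max(range(len(probs)), key=probs.__getitem__)
    return expert_index, probs[expert_index]


def moe_layer(tokens, router_weights, experts):
    """Send each token through only its selected expert.

    Returns the outputs and the set of experts that were used; in training,
    only those experts would receive parameter updates.
    """
    outputs, used = [], set()
    for token in tokens:
        index, gate = route_top1(token, router_weights)
        used.add(index)
        # Scale the expert output by the router probability (the gate).
        outputs.append([gate * y for y in matvec(experts[index], token)])
    return outputs, used


# Illustrative sizes only (assumptions, not taken from the episode).
DIM, NUM_EXPERTS = 4, 8
rng = random.Random(0)
router_weights = random_matrix(NUM_EXPERTS, DIM, rng)
experts = [random_matrix(DIM, DIM, rng) for _ in range(NUM_EXPERTS)]
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(3)]

outputs, used = moe_layer(tokens, router_weights, experts)

# Model size scales with the number of experts; per-token compute does not.
total_expert_params = NUM_EXPERTS * DIM * DIM
active_params_per_token = DIM * DIM
print(f"experts used: {sorted(used)}")
print(f"total expert params: {total_expert_params}, active per token: {active_params_per_token}")
```

In this toy setup, doubling `NUM_EXPERTS` doubles `total_expert_params` while `active_params_per_token` is unchanged, which is the sense in which a sparse model separates size from compute.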

Google has been pushing the frontier of research on MoEs, and my two guests today in particular have been involved in pioneering work on that strategy (among many others!). Liam Fedus and Barrett Zoph are research scientists at Google Brain, and they joined me to talk about AI scaling, sparsity and the present and future of MoE models on this episode of the TDS podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

2:15 Guests’ backgrounds

8:00 Understanding specialization

13:45 Speculations for the future

21:45 Switch transformer versus dense net

27:30 More interpretable models

33:30 Assumptions and biology

39:15 Wrap-up

The TDS team