Ben Sorscher: Data Pruning for Efficient Machine Learning

Update: 2023-03-02

Description

In this episode, Ben Sorscher, a PhD student at Stanford, sheds light on the challenges posed by the ever-growing data sets used to train machine learning models, particularly large language models. The sheer size of these data sets is pushing the limits of scaling, as the cost of training and the environmental impact of the electricity it consumes become increasingly large.

As a solution, Ben discusses "data pruning," a method of reducing the size of data sets without sacrificing model performance. Data pruning involves selecting the most important or representative data points and removing the rest, resulting in a smaller, more efficient data set that still produces accurate results.
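
As a rough illustration of the idea, the sketch below keeps only the highest-scoring fraction of a data set. The description does not say how "important" data points are identified, so the per-example `scores` here are a hypothetical input, and `prune_dataset` and `keep_fraction` are names invented for this example, not anything taken from the episode.

```python
def prune_dataset(examples, scores, keep_fraction):
    """Keep the highest-scoring fraction of examples, preserving original order.

    `scores` is an assumed per-example importance score (higher = more worth
    keeping); how such a score is computed is outside this sketch.
    """
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    n_keep = max(1, round(len(examples) * keep_fraction))
    # Rank indices by score, highest first, and keep the top n_keep.
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore the original ordering
    return [examples[i] for i in kept]


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    importance = [0.1, 0.9, 0.5, 0.7, 0.3]  # made-up scores for illustration
    print(prune_dataset(data, importance, keep_fraction=0.4))
```

In practice the interesting part is the scoring step, which this sketch deliberately leaves open; the selection step itself is just a sort and a cut.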

Throughout the podcast, Ben delves into the intricacies of data pruning, including the benefits and drawbacks of the technique, the practical considerations for implementing it in machine learning models, and the potential impact it could have on the field of artificial intelligence.

Craig Smith Twitter: https://twitter.com/craigss
Eye on A.I. Twitter: https://twitter.com/EyeOn_AI

In Channel

#125 Pascal Weinberger: Harnessing the Power of Generative AI for Creativity & Productivity

2023-06-15 (57:09)

#282 Chris O'Neill: How GrowthLoop is Using Agentic AI for Real-Time, Personalized Marketing

2025-09-02 (53:13)

Danny Tobey: At the Intersection of Law and Artificial Intelligence

2023-04-28 (55:42)

Yoshua Bengio: Pausing More Powerful AI Models and His Work on World Models

2023-04-13 (40:57)

Edo Liberty: Solving ChatGPT Hallucinations With Vector Embeddings

2023-03-30 (35:26)

Ilya Sutskever: The Mastermind Behind GPT-4 and the Future of AI

2023-03-15 (43:00)

Ben Sorscher: Data Pruning for Efficient Machine Learning

2023-03-02 (34:17)

Yann LeCun: Filling the Gap in Large Language Models

2023-02-16 (55:29)

Terry Sejnowski: NeurIPS and the Future of AI

2023-02-01 (37:05)

Geoffrey Hinton: Unpacking The Forward-Forward Algorithm

2023-01-19 (58:44)

Setting the stage for 2023

2023-01-02 (01:00:21)

AI Supply Chain Optimization

2022-11-09 (40:08)

NO-CODE WITH AKKIO

2022-10-20 (45:30)

MLOps with ClearML

2022-10-05 (42:50)

Amazon's Sagemaker

2022-09-21 (19:37)

AUTOMATED CODE GENERATION

2022-09-08 (29:08)

Michael Kearns on Privacy

2022-08-25 (30:30)

XPRIZE TELEPORTATION

2022-08-10 (36:59)

VITAL & MINT

2022-07-28 (41:08)

Amazon's Rohit Prasad

2022-07-15 (32:08)

Craig Smith