AI Snake Oil—A New Book by 2 Princeton University Computer Scientists
Description
Arvind Narayanan and Sayash Kapoor are well-regarded computer scientists at Princeton University who have just published a book with a provocative title, AI Snake Oil. Here I’ve interviewed Sayash and challenged him on that dismal title, for which he provides solid examples of predictive AI’s failures. Then we get into the promise of generative AI.
Full videos of all Ground Truths podcasts can be seen on YouTube here. The audios are also available on Apple and Spotify.
Transcript with links to audio and external links to key publications
Eric Topol (00:06):
Hello, it's Eric Topol with Ground Truths, and I'm delighted to welcome the co-author of a new book, AI Snake Oil: it's Sayash Kapoor, who has written this book with Arvind Narayanan of Princeton. And so welcome, Sayash. It's wonderful to have you on Ground Truths.
Sayash Kapoor (00:28):
Thank you so much. It's a pleasure to be here.
Eric Topol (00:31):
Well, congratulations on this book. What's interesting is how much you've achieved at such a young age. Here you are, named in the inaugural edition of TIME100 AI as one of the eminent contributors to the field. And you're currently a PhD candidate at Princeton, is that right?
Sayash Kapoor (00:54):
That's correct, yes. I work at the Center for Information Technology Policy, which is a joint program between the computer science department and the School of Public and International Affairs.
Eric Topol (01:05):
So before you started working on your PhD in computer science, you already were doing this stuff, I guess, right?
Sayash Kapoor (01:14):
That's right. So before I started my PhD, I used to work at Facebook as a machine learning engineer.
Eric Topol (01:20):
Yeah, well, you're taking it to a more formal level here. Before I get into the book itself, what was the background? I mean, you did describe in the book why you decided to write one, especially one entitled AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference.
Background to Writing the Book
Sayash Kapoor (01:44):
Yeah, absolutely. So I think for the longest time both Arvind and I had been sort of looking at how AI works and how it doesn't work, at cases where people are somewhat fooled by the potential of this technology and fail to apply it in meaningful ways in their lives. As an engineer at Facebook, I had seen how easy it is to slip up or make mistakes when deploying machine learning and AI tools in the real world. And I had also seen that, especially when it comes to research, it's really easy to make mistakes, even unknowingly, that inflate the accuracy of a machine learning model. So as an example, one of the first research projects I did when I started my PhD was to look at the field of political science, in the subfield of civil war prediction. This is a field which tries to predict where the next civil war will happen, in order to be better prepared for civil conflict.
(02:39):
And what we found was that there were a number of papers that claimed almost perfect accuracy at predicting when a civil war will take place. At first this seemed sort of astounding. If AI can really help us predict when a civil war will start, sometimes years in advance, it could be game changing. But when we dug in, it turned out that every single one of these claims, where people claimed that AI was better than two-decades-old logistic regression models, was not reproducible. And so, that sort of set the alarm bells ringing for the both of us, and we dug in a little bit deeper and found that this is pervasive. So this was a pervasive issue across fields that were quickly adopting AI and machine learning. We found, I think, over 300 papers, and the last time I compiled this list, I think it was over 600 papers, that suffer from data leakage. That is when you in effect train on the data that you're evaluating your models on. It's sort of like teaching to the test. And so, a machine learning model seems like it does much better when you evaluate it on your own data compared to how it would really work out in the real world.
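[Editor's note: to make the data-leakage point concrete, here is a minimal, hypothetical Python sketch, not taken from the interview or from the papers discussed. It shows one common form of leakage: selecting "predictive" features using the full dataset, including the rows that later become the test set, makes a model trained on pure noise look far better than chance, while the split-first pipeline stays near 50%. The libraries, sizes, and numbers are illustrative assumptions.]

```python
# Illustrative sketch of data leakage via feature selection (assumed example).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))   # pure noise features
y = rng.integers(0, 2, size=200)   # random labels: there is nothing real to learn

# Leaky pipeline: pick the 20 "most predictive" features using ALL rows,
# including the rows that will later serve as the test set.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)
leaky_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Correct pipeline: split first, then select features on the training rows only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
selector = SelectKBest(f_classif, k=20).fit(X_tr, y_tr)
clean_acc = (LogisticRegression(max_iter=1000)
             .fit(selector.transform(X_tr), y_tr)
             .score(selector.transform(X_te), y_te))

print(f"leaky accuracy:   {leaky_acc:.2f}")   # usually noticeably above chance
print(f"correct accuracy: {clean_acc:.2f}")   # usually close to 0.5
```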
Eric Topol (03:48):
Right. You say in the book, “the goal of this book is to identify AI snake oil - and to distinguish it from AI that can work well if used in the right ways.” Now I have to tell you, it's kind of a downer book if you're an AI enthusiast because there's not a whole lot of positive here. We'll get to that in a minute. But you break down the types of AI, which I'm going to challenge a bit, into three discrete areas: predictive AI, on which you take a really harsh stance and say it will never work; then generative AI, obviously the large language models that took the world by storm, although they were incubating for several years before ChatGPT came along; and then content moderation AI. So maybe you could tell us about your breakdown into these three different domains of AI.
Three Types of AI: Predictive, Generative, Content Moderation
Sayash Kapoor (04:49 ):
Absolutely. I think one of our main messages across the book is that when we are talking about AI, often what we are really interested in are deeper questions about society. And so, our breakdown of predictive, generative, and content moderation AI sort of reflects how these tools are being used in the real world today. So for predictive AI, one of the motivations for including this in the book as a separate category was that we found that it often has nothing to do with modern machine learning methods. In some cases it can be as simple as decades old linear regression tools or logistic regression tools. And yet these tools are sold under the package of AI. Advances that are being made in generative AI are sold as if they apply to predictive AI as well. Perhaps as a result, what we are seeing is across dozens of different domains, including insurance, healthcare, education, criminal justice, you name it, companies have been selling predictive AI with the promise that we can use it to replace human decision making.
(05:51):
And I think that last part is where a lot of our issues really come down to, because these tools are being sold as far more than they're actually capable of. These tools are being sold as if they can enable better decision making for criminal justice. And at the same time, when people have tried to interrogate these tools, what we found is that they often work no better than random, especially when it comes to consequential decisions such as job automation. So basically deciding who gets called to the next round of a job interview, or who is rejected right as soon as they submit their CV. And so, these are very, very consequential decisions, and we felt like there is a lot of snake oil, in part because people don't distinguish between applications that have worked really well or where we have seen tremendous advances, such as generative AI, and applications where essentially we've stalled for a num