#267: Regression? It Can be Extraordinary! (OLS FTW. IYKYK.) with Chelsea Parlett-Pelleriti
Description
Why? Or… y? What is y? Why, it’s mx + b! It’s the formula for a line, which is just a hop, a skip, and an error term away from the formula for a linear regression! On the one hand, it couldn’t be simpler. On the other hand, it’s a broad and deep topic. You’ve got your parameters, your feature engineering, your regularization, the risks of flawed assumptions and multicollinearity and overfitting, the distinction between inference and prediction… and that’s just a warm-up! What variables would you expect to be significant in a model aimed at predicting how engaging an episode will be? Presumably, guest quality would top your list! It topped ours, which is why we asked past guest Chelsea Parlett-Pelleriti from Recast to return for an exploration of the topic! Our model crushed it.
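For reference, the "hop, a skip, and an error term" can be written out. The left side is the familiar line; the right side is a simple linear regression for observation i. The symbols β₀, β₁, and ε are conventional textbook notation assumed here, not notation taken from the episode.

```latex
y = mx + b
\qquad\longrightarrow\qquad
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
```

Here β₀ plays the role of b (the intercept), β₁ plays the role of m (the slope), and ε_i is the error term that absorbs whatever the line does not explain.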
The image that Moe referenced in the show:
Items of Interest Mentioned in the Show
- (YouTube Channel) Ritvikmath
- The Recast Blog
- (Article) Unfair but Valid Feedback; a Seeming Contradiction by Deb Liu
- Save This Life pet microchipping company shuts down (Save This Life chips start with either 991 or 900164)
- ^^^ Re-registering can be done at Free Pet Chip Registry
Photo by Nick Baker on Unsplash
Episode Transcript
0:00:05 .8 Announcer: Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.
0:00:18 .0 Tim Wilson: Hi, everyone. Welcome to the Analytics Power Hour. This is episode number 267. I’m Tim Wilson from Facts & Feelings, and according to the logistic regression that I ran personally on the last 40 episodes of this show, there is a 72.3% chance that I’m joined for this episode by Julie Hoyer from Further. Julie, is that you?
0:00:39 .2 Julie Hoyer: Hey, look at that. Yes, it is. Here I am.
0:00:42 .2 TW: Look at my model kicking ass. Sweet. So it also says there’s a 61.4% chance that Michael Helbling from Stacked Analytics will be another co-host. Michael? Michael. Michael. Okay. No Michael. Let’s see. Next up, the model said there was a 41.7% chance that Moe Kiss from Canva would be co-hosting. Moe, are you there?
0:01:09 .2 Moe Kiss: I am. But is your model any good?
0:01:13 .0 TW: Well, I plugged in everything I had when I created the regression, but it still couldn’t perfectly predict who would be co-hosting. Does that mean my model was wrong? Did a model even really exist? Well, maybe the answer is I don’t think so for either one. But I have questions. And when we have questions and we have a podcast, we get to find someone to answer them. In this case, we reached back into our archives for one of our favorite past guests. Chelsea Parlett-Pelleriti, also known as the Chartistician, is a statistician and data scientist who was our guest way back on episode number 149. By day, she is a consulting statistician with our friends at Recast, but she also has a passion for teaching, bringing interest and excitement about math and statistics to the masses in fun and engaging and even endearing ways. She has a PhD in Computational and Data Sciences, which the last time she was on, she was still working towards. So she has since completed that and she was an assistant professor at Chapman University up until last year, teaching computer science, data science, and machine learning, which made for some pretty awesome social media content.
0:02:24 .5 TW: She’s still keeping her foot in teaching. She’s actually currently teaching a math through video games seminar as an adjunct professor. And she just likes teaching stuff. And maybe I botched my intro. I was doing so well. But today she is our guest. Welcome back to the show, Chelsea.
0:02:43 .7 Chelsea Parlett-Pelleriti: Thank you. It is a pleasure to be here. But it makes me feel very old thinking how long ago it was that I was last on the show.
0:02:52 .7 TW: Okay. Well…
0:02:54 .3 CP: So now you’re making me feel super, super old.
0:02:56 .3 TW: Yeah, I believe you’re right. I was in my 40s, and that was a long time ago.
0:03:04 .0 CP: I was in my 20s.
0:03:05 .7 TW: Oh, okay. Okay. Well, the passage of time. So this show is… It’s actually a direct result of the listener survey we did last year, which we had a bonus episode that came out a little while back that talked about that. And we had multiple respondents who requested in one way or another that we cover specific statistical methods on the show. And this is really kind of our first attempt at doing that. So we’ll see how it goes. I’m not ashamed to say I got pretty excited as I was thinking about this show because I realized how much I’ve been faking various things for a while. And this is my opportunity to ask questions as though I know the answer when I don’t. And then I will know.
0:03:50 .1 CP: And I can ask the questions like, I don’t know the answer when I really don’t know the answer.
0:03:54 .0 TW: That’s good.
0:03:55 .4 CP: Good complement.
0:03:56 .7 TW: And Julie will be the only one who understands the answers. So there we go. We’ve got the full. The full set.
0:04:01 .7 JH: It’ll be a refresh for me, too. I don’t get to do as many regressions in my day-to-day as I would like.
0:04:07 .5 TW: Well, what seemed like a great place to start would be with that kind of absolute workhorse of prediction, which is plain old regression. And Chelsea, you’re pretty deep and if I understand all the content I read from Recast pretty well, then you’re pretty deep in the world of kind of Bayesian statistics and causal inference when it comes to doing media mix modeling work. So does regression come up in your day-to-day at all? Or is that too basic? You’ve moved on to fancier things?
0:04:40 .9 CP: I mean, there’s definitely a time and place for fancier methods. But linear regression is probably the first thing I try in any problem that it might be a good fit for and definitely has a place in my day-to-day still. And I think there’s a sense in which you can think of even really complicated MMM models, like you can build with Recast or other tools, as just an extension of ideas that are present in linear regression. So even if you’re not actually using a linear regression, you’re really capitalizing on the ideas that using linear regression teaches you. So in that sense, it never goes away.
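As an aside for readers following along, here is a minimal sketch of what "trying a linear regression first" can look like for a single predictor, using the closed-form ordinary least squares solution. The function name `fit_ols` and the synthetic data are made up for illustration; nothing here comes from Recast or from the episode.

```python
import random


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope is the sum of cross-deviations of x and y divided by
    # the sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Synthetic data (an assumption for illustration): a true line plus random noise.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in xs]

intercept, slope = fit_ols(xs, ys)
print(f"estimated intercept: {intercept:.2f}, estimated slope: {slope:.2f}")
```

The estimates should land near the true values of 2.0 and 0.5 without matching them exactly, since the error term is part of the model. With more than one predictor the same idea is usually handed to a library rather than computed by hand.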
0:05:18 .5 TW: That checks. I feel like I’ve watched my… I’ll count myself as one of the people who, when they finally understood kind of what MMM was and then decided to try to explain it. You always wind up with the slide that shows the formula for regression and says, look, so your dependent variable is. And your independent variable. And t