Linear Digressions

Author: Ben Jaffe and Katie Malone

Subscribed: 5,062
Played: 75,973


Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
236 Episodes
Federated Learning


This is a re-release of an episode first released in May 2017. As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm? In other words, how do we do machine learning when the training dataset is distributed across many devices, imbalanced, and the usage associated with any one user needs to be obscured somewhat to protect the privacy of that user? Enter Federated Learning, a set of related algorithms from Google that are designed to help out in exactly this scenario. If you've used keyboard shortcuts or autocomplete on an Android phone, chances are you've encountered Federated Learning even if you didn't know it.
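To make the idea concrete, here is a minimal sketch of one commonly described approach in this family, federated averaging: each device trains on its own data, and only the resulting model weights are sent back and averaged. The blurb does not name a specific algorithm, so the choice of federated averaging, the one-parameter linear model, the toy data, and the function names `local_update` and `federated_round` are all assumptions made for illustration. Real systems add safeguards, such as secure aggregation, that are left out here.

```python
def local_update(w, data, lr=0.1, steps=5):
    """Run a few gradient-descent steps on one device's private data.

    The model is y = w * x with squared-error loss (an illustrative choice).
    """
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, clients):
    """Average the clients' locally trained weights, weighted by data size.

    Only the weights leave each client; the raw (x, y) pairs never do.
    """
    total = sum(len(data) for data in clients)
    return sum(
        len(data) / total * local_update(w_global, data) for data in clients
    )


# Toy, imbalanced data spread over three hypothetical devices (true w = 2).
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0)],
    [(1.5, 3.0), (1.0, 2.0), (0.2, 0.4)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

After the 20 rounds, `w` ends up very close to 2 even though no device ever shared its raw data with the server.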
This is a re-release of an episode first released in February 2017. Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers? It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly distributed. In other words, protest size is endogenous to legislative response, and understanding cause and effect is very challenging.

So, what to do? Well, at least in the case of studying Tea Party protest effectiveness, researchers have used rainfall, of all things, to understand the impact of a big protest. In other words, rainfall is the instrumental variable in this analysis that cracks the scientific case open. What does rainfall have to do with protests? Do protests actually matter? What do we mean when we talk about endogenous and instrumental variables? We wouldn't be very good podcasters if we answered all those questions here--you gotta listen to this episode to find out.
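As a rough illustration of why an instrument helps, the sketch below builds a tiny made-up dataset in which a hidden confounder drives both protest size and legislative response. A naive regression slope overstates the effect, while the simple instrumental-variable (Wald) estimate, cov(z, y) / cov(z, x), recovers it. The numbers and the variable names (`good_weather`, `enthusiasm`, `attendance`, `response`) are invented for this example and are not taken from the study discussed in the episode.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Instrument z: weather on protest day (1 = dry, 0 = rainy).
good_weather = [0, 0, 1, 1]
# Hidden confounder: local political enthusiasm, balanced across weather.
enthusiasm = [-1, 1, -1, 1]

# Attendance depends on both the weather and the confounder.
attendance = [z + u for z, u in zip(good_weather, enthusiasm)]
# The true causal effect of attendance on response is 2 by construction,
# but the confounder also pushes the response directly.
response = [2 * x + 3 * u for x, u in zip(attendance, enthusiasm)]

# Naive slope: biased, because attendance is endogenous.
naive_slope = cov(attendance, response) / cov(attendance, attendance)
# IV (Wald) estimate: uses only the variation in attendance caused by weather.
iv_slope = cov(good_weather, response) / cov(good_weather, attendance)
```

Here `naive_slope` comes out around 4.4, while `iv_slope` is 2, the effect that was built into the toy data. The estimate is only trustworthy if the instrument affects the outcome solely through attendance, which is the key assumption in any analysis of this kind.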



Generative adversarial networks (GANs) are producing some of the most realistic artificial videos we’ve ever seen. These videos are usually called “deepfakes”. Even to an experienced eye, it can be hard to distinguish a fabricated video from a real one, which is a serious problem in an era when the truth of what you see on the news, or especially on social media, is worthy of skepticism. And just in case that wasn’t unsettling enough, the algorithms just keep getting better and more accessible, which means it just keeps getting easier to make completely fake, but real-looking, videos of celebrities, politicians, and perhaps even just regular people.
The topic of bias in word embeddings gets yet another pass this week. It all started a few years ago, when an analogy task performed on Word2Vec embeddings showed some indications of gender bias around professions (as well as other forms of social bias getting reproduced in the algorithm’s embeddings). We returned to the topic a while later to cover methods for de-biasing embeddings to counteract this effect. And now we’re back, with a second pass on the original Word2Vec analogy task, but where the researchers deconstructed the “rules” of the analogies themselves and came to an interesting discovery: the bias seems to be, at least in part, an artifact of the analogy construction method. Intrigued? So were we…
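For readers who have not seen the analogy task, here is a small sketch of how "a is to b as c is to ?" is often answered with vector arithmetic, and how one construction rule can change the result. Many implementations refuse to return any of the three query words; whether that is the exact rule examined in the episode is an assumption here. The three-dimensional vectors are hand-made toys, not real Word2Vec embeddings, and the `analogy` helper is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(vectors, a, b, c, exclude_inputs=True):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors
                  if not (exclude_inputs and w in (a, b, c))]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors invented for illustration only.
vectors = {
    "man": [0.1, 0.0, 1.0],
    "woman": [-0.1, 0.0, 1.0],
    "doctor": [0.0, 1.0, 0.0],
    "nurse": [-1.0, 1.0, 0.0],
    "teacher": [0.0, 0.5, 0.5],
}

unconstrained = analogy(vectors, "man", "doctor", "woman", exclude_inputs=False)
constrained = analogy(vectors, "man", "doctor", "woman", exclude_inputs=True)
```

With these toy vectors the unconstrained query returns "doctor", while the version that bans the query words is forced to pick something else and returns "nurse". The embeddings are identical in both runs, so the difference comes entirely from how the analogy is constructed.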
Attention in Neural Nets


There’s been a lot of interest lately in the attention mechanism in neural nets—it’s got a colloquial name (who’s not familiar with the idea of “attention”?) but it’s more like a technical trick that’s been pivotal to some recent advances in computer vision and especially word embeddings. It’s an interesting example of trying out human-cognitive-ish ideas (like focusing consideration more on some inputs than others) in neural nets, and one of the more high-profile recent successes in playing around with neural net architectures for fun and profit.
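The idea of "focusing more on some inputs than others" can be sketched in a few lines. The example below uses scaled dot-product attention, one common formulation; the blurb does not say which variant the episode covers, so that choice, the toy vectors, and the helper names `softmax` and `attend` are assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Score each input by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return weights, output


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([2.0, 0.0], keys, values)
```

The query points along the first axis, so the first and third inputs (whose keys share that direction) receive larger weights than the second, and the output leans toward their values.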
Interview with Joel Grus


This week’s episode is a special one, as we’re welcoming a guest: Joel Grus is a data scientist with a strong software engineering streak, and he does an impressive amount of speaking, writing, and podcasting as well. Whether you’re a new data scientist just getting started, or a seasoned hand looking to improve your skill set, there’s something for you in Joel’s repertoire.
What do you get when you cross a support vector machine with matrix factorization? You get a factorization machine, and a darn fine algorithm for recommendation engines.
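As a rough sketch of what that cross looks like, a second-order factorization machine scores an input with a bias, a linear term, and pairwise feature interactions whose strengths are dot products of learned latent vectors (the matrix-factorization part). The code below shows only the prediction step, with made-up parameters; the function names and the numbers are assumptions for illustration, and training is omitted.

```python
def fm_predict_naive(x, w0, w, V):
    """Bias + linear term + every pairwise interaction <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same prediction, using the standard identity that avoids the double loop."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Made-up parameters: 3 features, 2 latent factors per feature.
w0 = 0.1
w = [0.2, -0.1, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]

# A sparse input, e.g. "user 0 with item 1" in a one-hot style encoding.
x = [1.0, 1.0, 0.0]

score_naive = fm_predict_naive(x, w0, w, V)
score_fast = fm_predict_fast(x, w0, w, V)
```

Both functions give the same score (about 0.7 for these parameters). Because interactions are built from latent vectors rather than one free weight per feature pair, the model can estimate interactions for pairs it rarely or never saw together, which is part of why it suits sparse recommendation data.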
We've already talked about neural nets in some detail (links below), and in particular we've been blown away by the way that image recognition from convolutional neural nets can be fed into recurrent neural nets that generate descriptions and captions of the images. Our episode today tells a similar tale, except today we're talking about a blog post where the author fed in wireframes of a website design and asked the neural net to generate the HTML and CSS that would actually build a website that looks like the wireframes. If you're a programmer who thinks your job is challenging enough that you're automation-proof, guess again...
We often hear from folks wondering what advice we can give them as they search for their first job in data science. What does a hiring manager look for? Should someone focus on taking classes online, doing a bootcamp, reading books, something else? How can they stand out in a crowd? There’s no single answer, because so much depends on the person asking in the first place, but that doesn’t stop us from giving some perspective. So in this episode we’re sharing that advice out more widely, so hopefully more of you can benefit from it.
This week, we've got a fun paper by our friends at Google about the hidden costs of maintaining machine learning workflows. If you've worked in software before, you're probably familiar with the idea of technical debt: the inefficiencies that crop up in the code when you're trying to go fast. You take shortcuts, hard-code variable values, skimp on the documentation, and generally write not-that-great code in order to get something done quickly, and then end up paying for it later on. Technical debt is particularly easy to accrue with machine learning workflows, and that's the premise of this episode's paper.
Comments (4)

Li Lae

Thx so much, both. Please keep up the good work!

Mar 28th

Nisarg Shah

Katie and Ben, you both have transformed my journey to learn about machine learning, which seemed impossible before. thanks for taking the time to share your knowledge and providing a fun path to beginners (can only speak for myself :))! I hope you continue this endeavor! we truly appreciate it!

Jul 2nd

Vikram Kulkarni

Katie should do it by herself, the stupid co host is annoying.

Mar 27th

m a

Vikram Kulkarni the idea is them having a dialogue and asking all questions a novice listener might have.

Apr 18th