Author: DataCamp



Data science is one of the fastest-growing industries, and data scientist has been called the ‘Sexiest Job of the 21st Century’. But what exactly is data science? In this podcast, brought to you by DataCamp, Hugo Bowne-Anderson approaches the question by exploring what problems data science can solve rather than defining what data science is. From automated medical diagnosis and self-driving cars to recommendation systems and climate change, come on a journey with experts from industry and academia to explore the industry that will change the course of the 21st century.
51 Episodes
#50 Weapons of Math Destruction
In episode 50, our Season 1, 2018 finale of DataFramed, the DataCamp podcast, Hugo speaks with Cathy O’Neil, data scientist, investigative journalist, consultant, algorithmic auditor and author of the critically acclaimed book Weapons of Math Destruction. Cathy and Hugo discuss the ingredients that make up weapons of math destruction: algorithms and models that are important in society, secret and harmful, from models that decide whether you keep your job, a credit card or insurance to algorithms that decide how we’re policed, sentenced to prison or given parole. They also discuss the current lack of fairness in artificial intelligence, how societal biases are perpetuated by algorithms and how both transparency and auditability of algorithms will be necessary for a fairer future. What does this mean in practice? Tune in to find out. As Cathy says, “Fairness is a statistical concept. It's a notion that we need to understand at an aggregate level.” And, moreover, “data science doesn't just predict the future. It causes the future.”

LINKS FROM THE SHOW

DATAFRAMED SURVEY
DataFramed Survey (take it so that we can make an even better podcast for you)

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on Season 2?)

FROM THE INTERVIEW
Cathy on Twitter
Cathy's Blog Mathbabe
Weapons of Math Destruction: How big data increases inequality and threatens democracy by Cathy O'Neil
Cathy's Opinion Column, Bloomberg
Doing Data Science (By Cathy O'Neil and Rachel Schutt)
Cathy O'Neil & Hanna Gunn's "Ethical Matrix" paper coming soon.

FROM THE SEGMENTS
Data Science Best Practices (with Heather Nolis ~20:30)
Using docker to deploy an R plumber API (By Jonathan Nolis and Heather Nolis)
Enterprise Web Services with Neural Networks Using R and TensorFlow (By Jonathan Nolis and Heather Nolis)
Data Science Best Practices (with Ben Skrainka ~39:35)
The Clean Coder Blog (By Robert C. Martin)
James Shore’s blog post on Red, Green, Refactor
Jeff Knupp’s Python Unittesting tutorial (general unit tests in Python)
John Myles White’s Intro to Unit Testing in R

Original music and sounds by The Sticks.
#49 Data Science Tool Building
Hugo speaks with Wes McKinney, creator of the pandas project for data analysis tools in Python and author of Python for Data Analysis, among many other things. Wes and Hugo talk about data science tool building, what it took to get pandas off the ground and how he approaches building “human interfaces to data” to make individuals more productive. On top of this, they’ll talk about the future of data science tooling, including the Apache Arrow project and how it can facilitate this future, the importance of DataFrames that are portable between programming languages and building tools that facilitate data analysis work in the big data limit. Pandas initially arose from Wes noticing that people were nowhere near as productive as they could be due to a lack of tooling. The projects he’s working on today, which they’ll discuss, arise from the same place and present a bold vision for the future.

LINKS FROM THE SHOW

DATAFRAMED SURVEY
DataFramed Survey (take it so that we can make an even better podcast for you)

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on Season 2?)

FROM THE INTERVIEW
Wes on Twitter
Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure by Nadia Eghbal
pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Ursa Labs

FROM THE SEGMENTS
Data Science Best Practices (with Ben Skrainka ~17:10)
To Explain or To Predict? (By Galit Shmueli)
Statistical Modeling: The Two Cultures (By Leo Breiman)
The Book of Why (By Judea Pearl & Dana Mackenzie)
Studies in Interpretability (with Peadar Coyle at ~39:00)
Modelling Loss Curves in Insurance with RStan (By Mick Cooney)
Lime: Explaining the predictions of any machine learning classifier
Probabilistic Programming Primer

Original music and sounds by The Sticks.
#48 Managing Data Science Teams
In this episode of DataFramed, the DataCamp podcast, Hugo speaks with Angela Bassa about managing data science teams. Angela is Director of Data Science at iRobot, where she leads the team through development of machine learning algorithms, sentiment analysis, and anomaly detection processes. iRobot makes consumer robots that we all know and love, like the Roomba and the Braava, which are, respectively, a robotic vacuum cleaner and a robotic mop. Angela will talk about how to get into data science management, the most important strategies to ensure that your data science team delivers value to the organization, how to hire data scientists and key points to consider as your data science team grows over time. She will also cover the types of trade-offs you need to make as a data science manager and how to make the right ones. Along the way, you’ll see why a former marine biologist has the skills and ways of thinking to be a super data scientist at a company like iRobot and you’ll also see the importance of throwing data analysis parties.

LINKS FROM THE SHOW

FROM THE INTERVIEW
Angela on Twitter
HBR Newsletters
iRobot Careers
Data Science Internship

FROM THE SEGMENTS
Correcting Data Science Misconceptions (w/ Heather Nolis ~18:45)
Using docker to deploy an R plumber API (By Jonathan Nolis)
Enterprise Web Services with Neural Networks Using R and TensorFlow (By Jonathan Nolis and Heather Nolis)
Project of the Month (w/ David Venturi ~38:45)
Rise and Fall of Programming Languages (R Project by David Robinson)
Learn, Practice, Apply! (By Ramnath Vaidyanathan)
Apply to create a DataCamp project!

Original music and sounds by The Sticks.
#47 Human-centered Design in Data Science
Hugo speaks with Peter Bull about the importance of human-centered design in data science. Peter is a data scientist for social good and co-founder of DrivenData, a company that brings cutting-edge practices in data science and crowdsourcing to some of the world's biggest social challenges and the organizations taking them on, including machine learning competitions for social good. They’ll speak about the practice of considering how humans interact with data and data products and how important it is to consider them while designing your data projects. They’ll see how human-centered design provides a robust and reproducible framework for involving the end-user all through the data work, illuminated by examples such as DrivenData’s work in financial services and Mobile Money in Tanzania. Along the way, they’ll discuss the role of empathy in data science, the increasingly important conversation around data ethics and much, much more.

LINKS FROM THE SHOW

FROM THE INTERVIEW
Peter on Twitter
DrivenData
Deon (Ethics Checklist)
Cookiecutter Data Science
If you liked this interview, you might be interested in working with DrivenData! Currently, the team is looking for a software engineer who loves the idea of building Python applications for social impact. Apply Here!

FROM THE SEGMENTS
Probability Distributions and their Stories (with Justin Bois at ~24:00)
Justin's Website at Caltech
Probability distributions and their stories (By Justin Bois)
Studies in Interpretability (with Peadar Coyle at ~38:10)
Interpretable ML Symposium
How will the GDPR impact machine learning? (By Andrew Burt)
How to use Bayesian Stats in your daily job (Gates, Perry, Zorn (2002))
Fairness in Machine Learning (By Moritz Hardt)

Original music and sounds by The Sticks.
#46 AI in Healthcare, an Insider's Account
In this episode of DataFramed, a DataCamp podcast, Hugo speaks with Arnaub Chatterjee. Arnaub is a Senior Expert and Associate Partner in the Pharmaceutical and Medical Products group at McKinsey & Company. They’ll discuss cutting through the hype about artificial intelligence (AI) and machine learning (ML) in healthcare by looking at practical applications and how McKinsey & Company is helping the industry evolve. Tune in for an insider’s account of what has worked in healthcare, from ML models being used to predict nearly everything in clinical settings, to imaging analytics for disease diagnosis, to wound therapeutics. Will robots and AI replace disciplines such as radiology, ophthalmology, and dermatology? How have the moving parts of data science work evolved in healthcare? What does the future of data science, ML and AI in healthcare hold? Stick around to find out.

LINKS FROM THE SHOW

FROM THE INTERVIEW
McKinsey Analytics on Twitter
Hot off the press article for HBR’s Future of Healthcare online forum (By Arnaub Chatterjee)
Our latest piece on the promise & challenge of AI (By James Manyika and Jacques Bughin)
Are robots coming for our jobs?
Careers page
we help clients in healthcare analytics
analysis of 400+ use cases, including ones in healthcare (By Michael Chui et al.)

FROM THE SEGMENTS
Machines that Multi-task (with Manny Moss)
Part 1 at ~21:05
Responsible AI in Consumer Enterprise
Hilary Mason, DJ Patil and Mike Loukides on Data Ethics
EthicalOS Toolkit
Part 2 at ~40:00
21 Definitions of Fairness Tutorial from FAT* (Arvind Narayanan)
Kate Crawford's keynote address "The Trouble with Bias" from NIPS 2017
The (im)possibility of Fairness (Sorelle et al.)
from disparate data sources (Li Y et al.)
Multi-task Learning (Liyang Xie et al.)
Cost of Fairness in Binary Classification (Aditya Krishna Menon et al.)

Original music and sounds by The Sticks.
#45 Decision Intelligence and Data Science
In this episode of DataFramed, Hugo speaks with Cassie Kozyrkov, Chief Decision Scientist at Google Cloud. Cassie and Hugo will be talking about data science, decision making and decision intelligence, which Cassie thinks of as data science plus plus, augmented with the social and managerial sciences. They’ll talk about the different and evolving models for how the fruits of data science work can be used to inform robust decision making, along with pros and cons of all the models for embedding data scientists in organizations relative to the decision function. They’ll tackle head-on why so many organizations fail at using data to robustly inform decision making, along with best practices for working with data, such as not verifying your results on the data that inspired your models. As Cassie says, “Split your damn data”.

LINKS FROM THE SHOW

FROM THE INTERVIEW
Cassie on Twitter
Is data science a bubble? (By Cassie Kozyrkov, Hackernoon)
Incompetence, delegation, and population (By Cassie Kozyrkov, Hackernoon)
Populations — You’re doing it wrong (By Cassie Kozyrkov, Hackernoon)
What on earth is data science? (By Cassie Kozyrkov, Hackernoon)

FROM THE SEGMENTS
Probability Distributions and their Stories (with Justin Bois at ~19:45)
Justin's Website at Caltech
Probability distributions and their stories (By Justin Bois)
Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs ~43:45)
Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks
Multi-Task Learning for NLP, also by Sebastian Ruder
GANs for Fake Celebrity Images (Karras et al, Nvidia)
Adversarial Multi-Task Learning for Text Classification (Liu et al.)

Original music and sounds by The Sticks.
#44 Project Jupyter and Interactive Computing
In this episode of DataFramed, Hugo speaks with Brian Granger, co-founder and co-lead of Project Jupyter, physicist and co-creator of the Altair package for statistical visualization in Python. They’ll speak about data science, interactive computing, open source software and Project Jupyter. With over 2.5 million public Jupyter notebooks on GitHub alone, Project Jupyter is a force to be reckoned with. What is interactive computing and why is it important for data science work? What are all the moving parts of the Jupyter ecosystem, from notebooks to JupyterLab to JupyterHub and Binder, and why are they so relevant as more and more institutions adopt open source software for interactive computing and data science? From Netflix running around 100,000 Jupyter notebook batch jobs a day to LIGO’s Nobel Prize-winning discovery of gravitational waves publishing all their results reproducibly using Notebooks, Project Jupyter is everywhere.

LINKS FROM THE SHOW

FROM THE INTERVIEW
Brian on Twitter
Project Jupyter
Beyond Interactive: Notebook Innovation at Netflix (Ufford, Pacer, Seal, Kelley, Netflix Tech Blog)
Gravitational Wave Open Science Center (Tutorials)
JupyterCon YouTube Playlist
jupyterstream Github Repository

FROM THE SEGMENTS
Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs)
Part 1 at ~24:40
Brief Introduction to Multi-Task Learning (By Friederike Schüür)
Overview of Multi-Task Learning Use Cases (By Manny Moss)
Multi-Task Learning for the Segmentation of Building Footprints (Bischke et al.)
as Question Answering (McCann et al.)
Salesforce Natural Language Decathlon: A Multitask Challenge for NLP
Part 2 at ~44:00
Rich Caruana’s Awesome Overview of Multi-Task Learning and Why It Works
Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks
Massively Multi-Task Network for Drug Discovery, 259 Tasks (!) (Ramsundar et al.)
Overview of Multi-Task Learning with Video of Newsie, the Prototype (By Friederike Schüür)

Original music and sounds by The Sticks.
#43 Election Forecasting and Polling
Hugo speaks with Andrew Gelman about statistics, data science, polling, and election forecasting. Andy is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. This week we’ll be talking about the ins and outs of general polling and election forecasting, the biggest challenges in gauging public opinion, the ever-present challenge of getting representative samples in order to model the world and the types of corrections statisticians can and do perform.

"Chatting with Andy was an absolute delight and I cannot wait to share it with you!" - Hugo

LINKS FROM THE SHOW

FROM THE INTERVIEW
Andrew's Blog
Andrew on Twitter
We Need to Move Beyond Election-Focused Polling (Gelman and Rothschild, Slate)
We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results (Cohn, The New York Times).
19 things we learned from the 2016 election (Gelman and Azari, Science, 2017)
The best books on How Americans Vote (Gelman, Five Books)
The best books on Statistics (Gelman, Five Books)
Andrew's Research

FROM THE SEGMENTS
Statistical Lesson of the Week (with Emily Robinson at ~13:30)
The five Cs (Loukides, Mason, and Patil, O'Reilly)
Data Science Best Practices (with Ben Skrainka ~40:40)
Oberkampf & Roy’s Verification and Validation in Scientific Computing provides a thorough yet very readable treatment
A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing (Roy and Oberkampf, Science Direct)

Original music and sounds by The Sticks.
Comments (3)

Paolo Eusebi

Amazing episode! How many listeners have worked with Stan in R? What are their impressions compared with other Bayesian software?

Oct 9th

Rafael Anjos

The content is very good. Thank you for the good work.

Sep 18th

Anthony Giancursio


Jul 19th