DiscoverSuper Data Science: ML & AI Podcast with Jon Krohn
Super Data Science: ML & AI Podcast with Jon Krohn
Claim Ownership

Super Data Science: ML & AI Podcast with Jon Krohn

Author: Jon Krohn

Subscribed: 12,847Played: 478,743
Share

Description

The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast. As the quantity of data on our planet doubles every couple of years and with this trend set to continue for decades to come, there's an unprecedented opportunity for you to make a meaningful impact in your lifetime. In conversation with the biggest names in the data science industry, Jon cuts through hype to fuel that professional impact.


Whether you're curious about getting started in a data career or you're a deep technical expert, whether you'd like to understand what A.I. is or you'd like to integrate more data-driven processes into your business, we have inspiring guests and lighthearted conversation for you to enjoy.


We cover tools, techniques, and implementation tricks across data collection, databases, analytics, predictive modeling, visualization, software engineering, real-world applications, commercialization, and entrepreneurship − everything you need to crush it with data science.

924 Episodes
Reverse
Graphs, but not as you would expect them: Graph analytics guru Amy Hodler speaks to Jon Krohn about the graph data structure and graph applications, graph algorithms, graph RAG, and graphs as memory systems for AI agents. We can use graphs in a surprising number of ways. Money laundering and fraud, as well as supply-chain crime, leave breadcrumbs at multiple “touch-points” over time, behaviors that graphs are better suited to reveal than rows and tables. Amy sees that most interest in graphs has been in the cybersecurity space. But this work isn’t only restricted to fighting crime! Listen to the episode to hear more case examples and how to get into graph work.  This episode is brought to you by the Dell, by the Intel, by ODSC, the Open Data Science Conference and by Gurobi. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/923⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: 01:49) A brief history of graphs (10:08) Uncovering fraud with graphs (28:31) Where graphs are most commonly applied, to date (34:49) Retrieval augmented generation graphs (48:04) The future of graphs
Hugo Dozois-Caouette speaks to Jon Krohn about his startup MaintainX and how he secured $254 million in venture capital, reaching a $2.5 billion valuation. MaintainX builds computerized maintenance management systems (CMMS) and enterprise asset management (EAM) software for industrial and manufacturing companies. This "digital clipboard" delivered through web and mobile apps connects machines, work orders, and frontline teams to boost productivity, reduce downtime, and prevent costly breakdowns. The platform captures knowledge from experienced workers and delivers AI-powered insights, with features like MaintainX CoPilot helping teams troubleshoot issues and make faster decisions. Listen to the episode to hear Hugo's perspective on manufacturing gaps that technology can fill, MaintainX's tech stack, and how CMMS platforms address information disconnects that slow down frontline teams. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/922⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
Using Windows for AI development and the bleeding edge of NPUs: Shirish Gupta and Ish Shah from Dell Technologies speak to Jon Krohn about the latest products from Dell, the future of neural-processing units (NPUs), and how AI developers can make sound hardware investments.  This episode is brought to you by the Trainium2, the latest AI chip from AWS, by ODSC, the Open Data Science Conference and by Gurobi. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/921⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (04:18) Why Windows still outranks other operating systems (20:58) The difference between GPUs and NPUs (32:44) How to access and use Dell’s NPUs and GPUs (49:08) Using processing units on the cloud versus locally (57:43) About the Dell Pro Max
This month’s episode of In Case You Missed It gives us reasons to be cautiously optimistic about the future of large language models (LLMs), with guests discussing what to do about recent reports that found AI agents blackmailed human users when threatened, the importance of post-training LLMs, and the training we have available for data and AI engineers to create robust, secure, and useful AI. Jon Krohn includes clips from his interviews with Akshay Agrawal (Episode 911), Julien Launay (Episode 913), Michelle Yi (Episode 915), and Kirill Eremenko (Episode 917).  Additional materials: ⁠⁠⁠⁠⁠www.superdatascience.com/920⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
PyTorch, AGI, and the future of alignment research: Aurélien Géron joins Jon Krohn in this live interview to talk about the fourth edition of his bestselling Hands-On Machine Learning as well as what superintelligence makes him hopeful for, as well as what concerns him about machines surpassing human intelligence. This episode is brought to you by Gurobi and by the Dell AI Factory with NVIDIA⁠ Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/919⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (02:04) Why Aurélien wrote Hands-On Machine Learning        (20:54) How Aurélien came to decide on material for the new edition  (28:53) Aurélien’s predictions for AGI                                       (51:21) How to support alignment research                    (1:13:42) Does superintelligence mean super-capability
In this Five-Minute Friday, Jon Krohn introduces listeners to CrewAI, an open-source Python framework that can create and manage multi-agent teams. The clue is in the title: CrewAI assembles specialized agents into single “crews” that achieve complex goals between them. CrewAI’s agent teams can also learn and iterate, meaning that after the crew has achieved its goals for the first time, they can refine and tailor their approach to  future goals.  Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/918⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
Founder of SuperDataScience, Kirill Eremenko, talks to Jon Krohn about how he found the best tools and approaches to help launch his 8-week AI engineering bootcamp. He breaks down the topics participants cover each week, and he also shares his tips with listeners who might want to start their own tech bootcamp or sign up for SuperDataScience’s September 2025 cohort. This episode is brought to you by the Dell AI Factory with NVIDIA and by ODSC, the Open Data Science Conference Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/917⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (10:58) Weeks 1-4 of the SuperDataScience bootcamp             (37:52) How to use AI to drive the bottom line in business                    (47:50) Weeks 5-8 of the SuperDataScience bootcamp (54:50) How to convert LLMs to agents (1:09:33) Jon’s feedback on the SuperDataSciencebootcamp
GPT-5 has just been released, but with not very much fanfare. In this Five-Minute Friday, Jon Krohn asks if GPT-5 deserves the community’s underwhelmed response to its release. He outlines five features of the model and explains why people might be feeling less than enthusiastic in the broader context of LLM development. Which LLMs are leading the way, and which are still playing the game of catch-up? Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/916⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
Tech leader, investor, and Generationship cofounder Michelle Yi talks to Jon Krohn about finding ways to trust and secure AI systems, the methods that hackers use to jailbreak code, and what users can do to build their own trustworthy AI systems. Learn all about “red teaming” and how tech teams can handle other key technical terms like data poisoning, prompt stealing, jailbreaking and slop squatting.  This episode is brought to you by ⁠Trainium2, the latest AI chip from AWS⁠ and by the ⁠Dell AI Factory with NVIDIA⁠. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/915⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (03:31) What “trustworthy AI” means      (31:15) How to build trustworthy AI systems  (46:55) About Michelle’s “sorry bench”   (48:13) How LLMs help construct causal graphs   (51:45) About Generationship
In this Five-Minute Friday, Cofounder and CTO of lakeFS Oz Katz talks to Jon Krohn about data warehouses, data lakes, and how companies can handle increasingly complex data infrastructures and formats. Hear about lakeFS’s collaboration with Legofest, lakeFS’s approach to helping users collaborate on data lakes, and how to overcome the challenges of working with multimodal data. Additional materials: ⁠www.superdatascience.com/914⁠ This episode is brought to you by the ⁠Dell AI Factory with NVIDIA⁠.
Julien Launay launched Adaptive to give data science teams in business enterprises their “RLOps tooling” to make reinforcement learning easier. Talking to Jon Krohn, Julien says, “Most of our users are data scientists who write Python codes to interface with the system”. Adaptive is also able to work with companies without data science teams, collaborating with partners like Deloitte to add the necessary personnel. Julien is currently working on making his platform more widely available. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/913⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this episode of In Case You Missed It, we look back on five great interview episodes from July. Hear from Lilith Bat-Leah (Episode 901), Sinan Ozdemir (Episode 903), Sebastian Gehrmann (Episode 905), Zohar Bronfman (Episode 907) and Robert Ness (Episode 909). They’ll tell you why data-centric machine learning is so important across disciplines, starting with law, and how we can use AI benchmarks and “red teaming” to refine our search for the best AI models.  Additional materials: ⁠⁠⁠⁠www.superdatascience.com/912 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
Reproducibility, Python notebooks, and data science communities: Software developer Akshay Agrawal speaks to Jon Krohn about Marimo, the next-generation computational notebook for Python, how he built and fostered a thriving community around the product, and what makes this notebook so versatile and accessible for users.  Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/911⁠⁠⁠⁠⁠ This episode is brought to you by ⁠Trainium2, the latest AI chip from AWS ⁠and by the ⁠Dell AI Factory with NVIDIA⁠. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this Five-Minute Friday, Jon Krohn looks into AI’s disruption of the journalism industry and how it has fundamentally reshaped news production. Multiple news outlets’ suing of ChatGPT over its use of copyrighted materials may have taken the most headlines to date, but this isn’t to say news media is rebuffing AI entirely. On the contrary, several outlets have launched summarization and analysis tools for both internal and external use, such as The New York Times’s Echo and The Washington Post’s Haystacker. This episode looks into the ways major news outlets are utilising AI, and what this means for journalists. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/910⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
Researcher at Microsoft Robert Usazuwa Ness talks to Jon Krohn about how to achieve causality in AI with correlation-based learning, the right libraries, and handling statistical inference. When dealing with causal AI, Robert notes how important it is to keep aware of variables in the data that may mislead us and force inaccurate assumptions. Not all variables will be useful. It is essential, then, that any assumptions are grounded in a deeper understanding of how the data were gathered, and not what appears in the dataset. Listen to the episode to hear how you can apply causal AI to your projects. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/907⁠⁠⁠⁠ This episode is brought to you by Trainium2, the latest AI chip from AWS and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
The moral and ethical implications of letting AI take the wheel in business, as revealed by Anthropic: Jon Krohn looks into Anthropic’s latest research on how to use and deploy LLMs safely, specifically in business environments. The team designed scenarios to test the behavior of AI agents when given a goal and a set of obstacles to reach it. Those obstacles included 1) threats to the AI’s continued operation, and 2) conflict between the AI’s goals and the goals of the company. Hear Jon break down the results of this research in this Five-Minute Friday. Additional materials: ⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/908⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
“Intelligence has many forms,” says Zohar Bronfman, who speaks with Jon Krohn about the fascinating intersection between computational neuroscience and philosophy, and how it has brought him closer to understanding what is necessary to develop human-like intelligence in machines, as well as his motivations for launching Pecan AI and why predictive models outstrip generative models in business.  Additional materials: ⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/907⁠⁠⁠ This episode is brought to you⁠⁠⁠ by, ⁠⁠⁠⁠Adverity, the conversational analytics platform⁠⁠⁠⁠ and by the ⁠⁠⁠⁠Dell AI Factory with NVIDIA⁠⁠⁠⁠. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (03:47) Why LLMs aren’t bringing us closer to AGI (33:44) About Pecan AI (51:03) Why data modeling is so challenging (1:01:25) How Pecan AI makes its tools widely accessible
Jason Corso speaks to Jon Krohn in this Five-Minute Friday all about Voxel51’s latest tool, Verified Auto-Labelling, and the company’s incredible success in developing popular tools for computer vision. Additional materials: ⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/906⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
RAG LLMs are not safer: Sebastian Gehrmann speaks to Jon Krohn about his latest research into how retrieval-augmented generation (RAG) actually makes LLMs less safe, the three ‘H’s for gauging the effectivity and value of a RAG, and the custom guardrails and procedures we need to use to ensure our RAG is fit-for-purpose and secure. This is a great episode for anyone who wants to know how to work with RAG in the context of LLMs, as you’ll hear how to select the best model for purpose, useful approaches and taxonomies to keep your projects secure, and which models he finds safest when RAG is applied. Additional materials: ⁠⁠⁠⁠⁠⁠www.superdatascience.com/905⁠⁠ This episode is brought to you⁠ by, ⁠⁠⁠Adverity, the conversational analytics platform⁠⁠⁠ and by the ⁠⁠⁠Dell AI Factory with NVIDIA⁠⁠⁠. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (03:28) Findings from the paper “RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models” (09:35) What attack surfaces are in the context of AI (38:51) Small versus large models with RAG (46:27) How to select an LLM with safety in mind
In this Five-Minute Friday, Jon Krohn reveals how AI is taking on the glitzy world of advertising. Bold claims from Meta and OpenAI contend that users will soon be able to plug in what they want and have AI churn out an ad campaign for little to no cost are shaking the advertising industry to its core. The fact that the four biggest sellers of ads (Google, Meta, Amazon, and ByteDance) are digital companies and accounted for over half of the global market in 2024 adds salt to the wound. Hear the three ways that AI is disrupting the industry, and who (or what) has the most influence on digital consumers to date. Additional materials: ⁠⁠⁠⁠⁠⁠www.superdatascience.com/904 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
loading
Comments (31)

Dena Salamond

12:06 recap begins

Aug 20th
Reply

atefeh

professional 💖

Jan 27th
Reply

atefeh

thank you for this episode.

Mar 5th
Reply (1)

mrs rime

🔴💚Really Amazing ️You Can Try This💚WATCH💚ᗪOᗯᑎᒪOᗩᗪ👉https://co.fastmovies.org

Jan 16th
Reply

Priya Dharshini

🔴WATCH>>ᗪOᗯᑎᒪOᗩᗪ>>👉https://co.fastmovies.org

Jan 16th
Reply

Andrew Miller

I found this podcast really helpful for anyone who wants to better their knowledge of machine learning. I am especially interested in the data processing. If you want to deepen your knowledge of this topic, check this article https://techlogitic.net/categorization-and-data-labeling-for-supervised-machine-learning/. It has some pretty useful information and professional tips from experts in data annotation and tagging.

Apr 21st
Reply

Toben Nelson

a really nice and quick overview with just the right amount of detail.

Mar 3rd
Reply

Maryam Alizadeh

great thanks to you and your endeavors for this pod. I learnt a lot. welcome to Jon , wish you the best 👏👍

Jan 4th
Reply

Maryam Alizadeh

😢

Jan 4th
Reply

Masoud Fard

you are the best

Nov 19th
Reply

Nikhil Parmar

nice summarisation, Data Analyts looks at the past and data scientist looks at past and future

Oct 27th
Reply

Tough Nut

Great talk, very inspiring. thanks.

Aug 6th
Reply

Venkat M

Sleeps 3 hrs a day, not a good example for healthy person. sleep well and keep the brain more refreshed and healthy. #health

Mar 13th
Reply

Mehrdad Salimi

a lot of extra, unrelated stuff. Dude I appreciate your effort but you need to be specific and respect audiences' time.

Jan 19th
Reply

Maria Lacerda

Eu não conhecia Gabriela de Queiroz mas agora ouvindo esse podcast (já ouvi umas 5x) estou completamente encantada. Muito legal descobrir esse nível de profissional pelo mundo e ainda saber que trata-se de uma brasileira.

Dec 2nd
Reply

Natalia Zawadzka

Great job!👍 It's so interesting to listen your podcasts! thanks for sharing your knowledge and helping people to get into data business 🙌👍

Sep 10th
Reply

SriLatha K

Hi thanks for doing this podcast. Being a data engineer and who commutes a lot, I gain a lot from your podcasts. One suggestion that I would like to give is, it would be better if you do not interrupt the speaker until they complete their flow.

May 10th
Reply

Alberto Andrade

What amazing episode! Adrian rocks!! Congratulations!

Apr 25th
Reply

Simon SOUVANNARAT

Thanks for this advice !

Feb 6th
Reply

Troy Kirin

Great episode! I wish he touched on how to connect Sparklyr to data viz like Tableau!

Dec 10th
Reply
loading