Data Skeptic

Author: Kyle Polich
Subscribed: 31,445 · Played: 563,850
© Creative Commons Attribution License 3.0
Description
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.
509 Episodes
We are excited to be joined by Aaron Reich and Priyanka Shah. Aaron is the CTO at Avanade, while Priyanka leads their AI/IoT offering for the SEA Region. Priyanka is also a Microsoft MVP for AI. They join us to discuss how LLMs are deployed in organizations.
In this episode, we are joined by Jenny Liang, a PhD student at Carnegie Mellon University who studies the usability of code-generation tools. She discusses her recent survey on the usability of AI programming assistants: how she recruited participants, some of the survey questions, and the key takeaways. She shared the major reasons developers give for not wanting to use code-generation tools, stressing the concern that these tools might access software developers' in-house code, which is intellectual property. Learn more about Jenny Liang via https://jennyliang.me/
We are joined by Aman Madaan and Shuyan Zhou, both PhD students at the Language Technologies Institute at Carnegie Mellon University. They join us to discuss their recently published paper, PAL: Program-aided Language Models. Aman and Shuyan started by sharing how the application of LLMs has evolved, and talked about the performance of LLMs on arithmetic tasks in contrast to coding tasks. Aman introduced their PAL model, explained how it helps LLMs improve at arithmetic tasks, and shared examples of the tasks PAL was tested on. Shuyan discussed how PAL's performance was evaluated using BIG-Bench Hard tasks. They discussed the kinds of mistakes LLMs tend to make and how PAL circumvents these limitations, as well as how these developments in LLMs can improve children's learning. Rounding up, Aman discussed CoCoGen, a project that enables NLP tasks to be converted to graphs. Shuyan and Aman shared their next research steps. Follow Shuyan on Twitter @shuyanzhxyc and Aman @aman_madaan.
In this episode, we have Alessio Buscemi, a software engineer at Lifeware SA and formerly a post-doctoral researcher at the University of Luxembourg. He joins us to discuss his paper, A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages. Alessio shared his thoughts on whether ChatGPT is a threat to software engineers, and discussed how LLMs can help software engineers become more efficient.
On the show today, we are joined by Jianan Zhao, a Computer Science student at Mila and the University of Montreal. His research focus is on graph databases and natural language processing. He joins us to discuss how to use graphs with LLMs efficiently.
Today, we are joined by Rajiv Movva, a PhD student in Computer Science at Cornell Tech. His research interest lies at the intersection of responsible AI and computational social science. He joins us to discuss the findings of his work analyzing LLM publication patterns. He shared the dataset he used for the survey and the criteria for selecting the papers to analyze. Rajiv shared some of the trends he observed from his analysis: for one, research on LLMs has increased sharply. He also broke down the proportions of papers published by universities, organizations, and industry leaders in LLMs such as OpenAI and Google, noting that the majority of the papers are centered on the social impact of LLMs. He also discussed other exciting applications of LLMs, such as in education.
We are excited to be joined by Josh Albrecht, the CTO of Imbue. Imbue is a research company whose mission is to create AI agents that are more robust, safer, and easier to use. He joins us to share the findings of his work, Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety.
On today’s show, we are joined by Thilo Hagendorff, a Research Group Leader of Ethics of Generative AI at the University of Stuttgart. He joins us to discuss his research, Deception Abilities Emerged in Large Language Models. Thilo discussed how machine psychology is useful in machine learning tasks. He shared examples of cognitive tasks that LLMs have improved at solving. He shared his thoughts on whether there’s a ceiling to the tasks ML can solve.
Nieves Montes, a Ph.D. student at the Artificial Intelligence Research Institute in Barcelona, Spain, joins us. Her PhD research revolves around value-based reasoning in relation to norms. She shares her latest study, Combining theory of mind and abductive reasoning in agent‑oriented programming.
We are joined by Maximilian Mozes, a PhD student at University College London. His PhD research focuses on Natural Language Processing (NLP), particularly the intersection of adversarial machine learning and NLP. He joins us to discuss his latest research, Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities.
Our guest today is Vid Kocijan, a Machine Learning Engineer at Kumo AI. Vid holds a Ph.D. in Computer Science from the University of Oxford. His research focused on common-sense reasoning, pre-training in LLMs, pre-training in knowledge-base completion, and how these pre-trainings impact societal bias. He joins us to discuss how he built a BERT model that solved the Winograd Schema Challenge.
Today, we are joined by Petter Törnberg, an Assistant Professor in Computational Social Science at the University of Amsterdam and a Senior Researcher at the University of Neuchâtel. His research centers on computational methods and their applications in the social sciences. He joins us to discuss findings from his research papers, ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning, and How to Use LLMs for Text Analysis.
In this episode, we are joined by Carlos Hernández Oliván, a Ph.D. student at the University of Zaragoza. Carlos’s interest focuses on building new models for symbolic music generation. Carlos shared his thoughts on whether these models are genuinely creative. He revealed situations where AI-generated music can pass the Turing test. He also shared some essential considerations when constructing models for music composition.
Hongyi Wang, a Senior Researcher in the Machine Learning Department at Carnegie Mellon University, joins us. His research sits at the intersection of systems and machine learning. He discussed his research paper, Cuttlefish: Low-Rank Model Training without All the Tuning, on today's show. Hongyi started by sharing his thoughts on whether developers need to learn how to fine-tune models. He then spoke about the need to optimize the training of ML models, especially as these models grow bigger, and about how data centers have the hardware to train these large models while the broader community does not. He then spoke about the Low-Rank Adaptation (LoRA) technique and where it is used. Hongyi discussed the Cuttlefish model and how it improves on LoRA, and shared the use cases of Cuttlefish and who should use it. Rounding up, he gave his advice on how people can get into the machine learning field and shared his future research ideas.
On today's episode, we have Daniel Rock, an Assistant Professor of Operations, Information and Decisions at the Wharton School of the University of Pennsylvania. Daniel's research focuses on the economics of AI and ML, specifically how digital technologies are changing the economy. Daniel discussed how AI has disrupted the job market in past years, and explained that it has created more winners than losers. Daniel spoke about the empirical study he and his coauthors did to quantify the threat LLMs pose to professionals. He shared how they used the O-NET dataset and the BLS occupational employment survey to measure the impact of LLMs on different professions. Using the radiology profession as an example, he listed tasks that LLMs could assume. Daniel broadly highlighted professions that are most and least exposed to LLM proliferation. He also spoke about the risks of LLMs and his thoughts on implementing policies for regulating LLMs.
We are excited to be joined by J.D. Zamfirescu-Pereira, a Ph.D. student at UC Berkeley. He focuses on the intersection of human-computer interaction (HCI) and artificial intelligence (AI). He joins us to share his work in his paper, Why Johnny can't prompt: how non-AI experts try (and fail) to design LLM prompts. The discussion also explores lessons learned and achievements related to BotDesigner, a tool for creating chatbots.
In this episode, we are joined by Ryan Liu, a Computer Science graduate of Carnegie Mellon University. Ryan will begin his Ph.D. program at Princeton University this fall, focusing on the intersection of large language models and how humans think. Ryan joins us to discuss his research titled "ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing".
The creators of large language models impose restrictions on some of the types of requests one might make of them. LLMs commonly refuse to give advice on committing crimes, to produce adult content, or to respond with any details about a variety of sensitive subjects. As with any content filtering system, there are false positives and false negatives. Today's interview with Max Reuter and William Schulze discusses their paper "I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models". In this work, they explore what types of prompts get refused and build a machine learning classifier adept at predicting whether a particular prompt will be refused.
Our guest today is Maciej Świechowski. Maciej is affiliated with QED Software and QED Games. He has a Ph.D. in Systems Research from the Polish Academy of Sciences. Maciej joins us to discuss findings from his study, Deep Learning and Artificial General Intelligence: Still a Long Way to Go.
Today on the show, we are joined by Lin Zhao and Lu Zhang. Lin is a Senior Research Scientist at United Imaging Intelligence, while Lu is a Ph.D. candidate at the Department of Computer Science and Engineering at the University of Texas. They both shared findings from their work When Brain-inspired AI Meets AGI. Lin and Lu began by discussing the connections between the brain and neural networks. They mentioned the similarities as well as the differences. They also shared whether there is a possibility for solid advancements in neural networks to the point of AGI. They shared how understanding the brain more can help drive robust artificial intelligence systems. Lin and Lu shared how the brain inspired popular machine learning algorithms like transformers. They also shared how AI models can learn alignment from the human brain. They juxtaposed the low energy usage of the brain compared to high-end computers and whether computers can become more energy efficient.
incredibly awful audio for some reason
Wow this is fascinating
the guest has classic psychological and groupthink issues in his research; he needs to get out more
Hi, Data Skeptic team. Thanks for sharing your valuable content. Is a transcript of the podcast available? Best of luck!
great episode, can't wait to hear the next one. Thanks!
Description?
@6:00: The threshold for statistical significance does not "depend on the outcome." It raises a red flag even to hear someone say that, especially the host of a "data science" podcast. (Of course, if he knew what he was talking about, he'd be a "statistician" instead.) He might more accurately have said that any such estimate of the minimum sample size depends on the number of planned comparisons and the assumed effect size for each measured effect. Confusion about this should disqualify someone from hosting such a podcast.
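The commenter's point — that the minimum sample size is driven by the number of planned comparisons and the assumed effect size, not by the outcome — can be illustrated with a standard power calculation. This is a sketch added for illustration, not from the podcast; the function name and the d = 0.5, alpha = 0.05, power = 0.8 figures are assumptions.

```python
from statistics import NormalDist

def min_n_per_group(effect_size, alpha, power=0.8):
    """Approximate per-group n for a two-sided two-sample test
    (normal approximation): n = 2 * ((z_{1-a/2} + z_power) / d) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

# One comparison, assuming a medium effect (Cohen's d = 0.5).
n_single = min_n_per_group(0.5, 0.05)

# Three planned comparisons with a Bonferroni-corrected alpha:
# the required n grows, and the observed outcome plays no role.
n_corrected = min_n_per_group(0.5, 0.05 / 3)

print(round(n_single), round(n_corrected))
```

Both inputs are fixed before any data are collected, which is exactly why "depends on the outcome" is the wrong framing.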
@2:19: Too much interpretation as if respondents were randomly sampled. Respondents self-selected.
thanks so much for sharing the results
@1:03: It doesn't "beg the question"; it "raises the question." To "beg the question" is to commit a logical fallacy in which one assumes the conclusion.
Is the spin-off / journal club podcast on castbox?
KILLER intro, awesome work!
"I find it stunning that people don't do that. The only thing I can think of is that there's just the lack of focused time. There's so many things we could spend our time on now we spend a little on all of them and we don't have depth that we need. A lot of people will come to a conference or something like that just to be away from work and only focus on one thing. Unfortunately they also bring their phone and completely break that paradigm. " ouch
Brilliantly put!