Disseminate: The Computer Science Research Podcast

This podcast features interviews with Computer Science researchers. Hosted by Dr. Jack Waudby (https://jackwaudby.github.io/), each episode highlights the problem(s) a researcher tackled, the solutions they developed, and how their findings can be applied in practice. The podcast is for industry practitioners, researchers, and students; it aims to further narrow the gap between research and practice and to make awesome Computer Science research more accessible. We have two types of episode: (i) Cutting Edge (red/blue logo), where we talk to researchers about their latest work, and (ii) High Impact (gold/silver logo), where we talk to researchers about their influential work.

You can support the show through Buy Me a Coffee (https://www.buymeacoffee.com/disseminate). A donation of $3 will help us keep making you awesome Computer Science research podcasts.

Haralampos Gavriilidis | SheetReader: Efficient spreadsheet parsing

In this episode of the DuckDB in Research series, Harry Gavriilidis (PhD student at TU Berlin) joins us to discuss SheetReader, a high-performance spreadsheet parser that dramatically outpaces traditional tools in both speed and memory efficiency. By taking advantage of the standardized structure of spreadsheet files and bypassing generic XML parsers, SheetReader delivers fast and lightweight parsing, even on large files. Now available as a DuckDB extension, it enables users to query spreadsheets directly with SQL and integrate them seamlessly into broader analytical workflows.

Harry shares insights into the development process, performance benchmarks, and the surprisingly complex world of spreadsheet parsing. He also discusses community feedback, feature requests (like detecting multiple tables or parsing colored rows), and future plans, including tighter integration with DuckDB and support for Arrow. The conversation wraps up with a look at Harry’s broader research on composable database systems and data interoperability, highlighting how tools like DuckDB are reshaping modern data analysis.
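As a taster of what querying a spreadsheet with SQL can look like, here is a minimal sketch in Python using DuckDB's community extension mechanism. The extension name (sheetreader), the table function of the same name, and the file sales_2024.xlsx are assumptions drawn from the episode description rather than a checked API reference.

```python
# Minimal sketch: querying a spreadsheet through the SheetReader community
# extension from Python. Extension and function names are assumptions based
# on the episode description, not a verified API reference.
import duckdb

con = duckdb.connect()
con.sql("INSTALL sheetreader FROM community;")  # one-time install
con.sql("LOAD sheetreader;")

# Query a (hypothetical) local workbook directly with SQL.
result = con.sql("""
    SELECT *
    FROM sheetreader('sales_2024.xlsx')
    LIMIT 10
""")
print(result)
```

Because the result is an ordinary DuckDB relation, it can then be joined with other tables or handed off to Pandas/Arrow like any other query result.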

04-17
40:53

Arjen P. de Vries | faiss: An extension for vector data & search

In this episode of the DuckDB in Research series, we’re joined by Arjen de Vries, Professor of Data Science at Radboud University. Arjen dives into his team’s development of a DuckDB extension for FAISS, a library originally developed at Facebook for efficient similarity search and vector operations.

We explore the growing importance of embeddings and dense retrieval in modern information retrieval systems, and how DuckDB’s zero-copy architecture and tight integration with the Python ecosystem make it a compelling choice for managing large-scale vector data. Arjen shares insights into the technical challenges and architectural decisions behind the extension, comparisons with DuckDB’s native VSS (vector search) solution, and the broader vision of integrating vector search more deeply into relational databases.

Along the way, we also touch on DuckDB's extension ecosystem, its potential for future research, and why tools like this are reshaping how we build and query modern AI-enabled systems.
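The episode notes do not spell out the FAISS extension's API, so rather than guess at it, here is a hedged sketch of the kind of vector-similarity query under discussion, written against DuckDB's native vss extension (HNSW index plus array_distance), which the episode compares against. The table, column, and example vectors are made up.

```python
# Hedged sketch of a vector-similarity query in DuckDB. This uses the
# built-in `vss` extension (HNSW index + array_distance), NOT the FAISS
# extension discussed in the episode; table/column names are made up.
import duckdb

con = duckdb.connect()
con.sql("INSTALL vss;")
con.sql("LOAD vss;")

con.sql("CREATE TABLE docs (id INTEGER, embedding FLOAT[3]);")
con.sql("""
    INSERT INTO docs VALUES
        (1, [0.1, 0.9, 0.0]),
        (2, [0.8, 0.1, 0.1]),
        (3, [0.2, 0.7, 0.1]);
""")
con.sql("CREATE INDEX docs_hnsw ON docs USING HNSW (embedding);")

# Top-2 nearest neighbours of a query vector.
print(con.sql("""
    SELECT id, array_distance(embedding, [0.2, 0.8, 0.0]::FLOAT[3]) AS dist
    FROM docs
    ORDER BY dist
    LIMIT 2;
"""))
```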

04-10
46:14

David Justen | POLAR: Adaptive and non-invasive join order selection via plans of least resistance

In this episode, we sit down with David Justen to discuss his work on POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance, which was implemented in DuckDB. David shares his journey in the database space, insights into performance optimization, and the challenges of working with modern analytical workloads. We dive into the intricacies of query compilation, vectorized execution, and how DuckDB is shaping the future of in-memory databases. Tune in for a deep dive into database internals, industry trends, and what’s next for high-performance data processing!

Links: VLDB 2024 Paper, David's Homepage

04-03
51:08

Daniël ten Wolde | DuckPGQ: A graph extension supporting SQL/PGQ

In this episode, we sit down with Daniël ten Wolde, a PhD researcher at CWI’s Database Architectures Group, to explore DuckPGQ, an extension to DuckDB that brings powerful graph querying capabilities to relational databases. Daniël shares his journey into database research, the motivations behind DuckPGQ, and how it simplifies working with graph data. We also dive into the technical challenges of implementing SQL Property Graph Queries (SQL/PGQ) in DuckDB, discuss performance benchmarks, and explore the future of DuckPGQ in graph analytics and machine learning. Tune in to learn how this cutting-edge extension is bridging the gap between research and industry!

Links: DuckPGQ homepage, Community extension, Daniël's homepage
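To make the SQL/PGQ idea concrete, here is a hedged sketch of the query surface DuckPGQ adds to DuckDB, written against the standard SQL/PGQ MATCH/COLUMNS syntax. The schema (Person/Knows) and column names are hypothetical, so treat this as illustrative rather than as the extension's documented examples.

```python
# Hedged sketch of the SQL/PGQ surface DuckPGQ adds to DuckDB. Table and
# column names are hypothetical; the CREATE PROPERTY GRAPH and
# MATCH/COLUMNS syntax follows the SQL/PGQ standard the extension implements.
import duckdb

con = duckdb.connect()
con.sql("INSTALL duckpgq FROM community;")
con.sql("LOAD duckpgq;")

con.sql("CREATE TABLE Person (id BIGINT, name VARCHAR);")
con.sql("CREATE TABLE Knows (src BIGINT, dst BIGINT);")
con.sql("INSERT INTO Person VALUES (1, 'Ada'), (2, 'Grace');")
con.sql("INSERT INTO Knows VALUES (1, 2);")

# Declare a property graph view over the relational tables.
con.sql("""
    CREATE PROPERTY GRAPH social
    VERTEX TABLES (Person)
    EDGE TABLES (
        Knows SOURCE KEY (src) REFERENCES Person (id)
              DESTINATION KEY (dst) REFERENCES Person (id)
    );
""")

# Pattern matching over the same data, expressed as a graph query.
print(con.sql("""
    SELECT *
    FROM GRAPH_TABLE (social
        MATCH (a:Person)-[k:Knows]->(b:Person)
        COLUMNS (a.name AS person, b.name AS friend)
    );
"""))
```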

03-20
48:38

Till Döhmen | DuckDQ: A Python library for data quality checks in ML pipelines

In this episode we kick off our DuckDB in Research series with Till Döhmen, a software engineer at MotherDuck, where he leads AI efforts. Till shares insights into DuckDQ, a Python library designed for efficient data quality validation in machine learning pipelines, leveraging DuckDB’s high-performance querying capabilities.

We discuss the challenges of ensuring data integrity in ML workflows, the inefficiencies of existing solutions, and how DuckDQ provides a lightweight, drop-in replacement that seamlessly integrates with scikit-learn. Till also reflects on his research journey, the impact of DuckDB’s optimizations, and the future potential of data quality tooling. Plus, we explore how AI tools like ChatGPT are reshaping research and productivity. Tune in for a deep dive into the intersection of databases, machine learning, and data validation!

Resources: GitHub, Paper, Slides, Till's Homepage, datasketches extension (released by a DuckDB community member 2 weeks after we recorded!)
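To give a flavour of what a Deequ-style, DuckDB-backed quality check might look like, here is a hypothetical sketch. The import path, class names, and method names below are assumptions inferred from the episode description, not DuckDQ's verified API; consult the linked GitHub repository for the real interface.

```python
# Hypothetical sketch of a Deequ-style data-quality check run with DuckDQ.
# The import path and the class/method names below are assumptions drawn
# from the episode description, not a verified DuckDQ API.
import pandas as pd
from duckdq import Check, CheckLevel, VerificationSuite  # assumed import path

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [10.0, 25.5, 7.9],
})

result = (
    VerificationSuite()
    .on_data(orders)                      # validation is pushed into DuckDB
    .add_check(
        Check(CheckLevel.ERROR, "orders sanity checks")
        .is_complete("order_id")          # no NULLs
        .is_unique("order_id")            # primary-key property
        .is_non_negative("amount")
    )
    .run()
)
print(result)
```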

03-13
58:12

Disseminate x DuckDB Coming Soon...

Hey folks! We have been collaborating with everyone's favourite in-process SQL OLAP database management system, DuckDB, to bring you a new podcast series: the DuckDB in Research series!

At Disseminate our mission is to bridge the gap between research and industry by exploring research that has a real-world impact. DuckDB embodies this synergy: decades of research underpin its design, and now it is making waves in the research community as a platform for others to build on. That is what this series will focus on! Join us as we kick off the series with:

📌 Daniël ten Wolde – DuckPGQ, a graph workload extension for DuckDB supporting SQL/PGQ
📌 David Justen – POLAR: Adaptive, non-invasive join order selection
📌 Till Döhmen – DuckDQ: A Python library for data quality checks in ML pipelines
📌 Arjen de Vries – FAISS extension for vector similarity search in DuckDB
📌 Harry Gavriilidis – SheetReader: Efficient spreadsheet parsing

Whether you're a researcher, an engineer, or just curious about the intersection of databases and innovation, we are sure you will love this series. Subscribe now and stay tuned for our first episode! 🚀

03-06
02:40

High Impact in Databases with... Anastasia Ailamaki

In this High Impact in Databases episode we talk to Anastasia Ailamaki. Anastasia is a Professor of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL). Tune in to hear Anastasia's story!

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

You can find Anastasia on: Homepage, Google Scholar, LinkedIn

03-03
46:17

High Impact in Databases with... David Maier

In this High Impact episode we talk to David Maier. David is the Maseeh Professor Emeritus of Emerging Technologies at Portland State University. Tune in to hear David's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

You can find David on: Homepage, Google Scholar

11-04
01:02:24

High Impact in Databases with... Aditya Parameswaran

In this High Impact episode we talk to Aditya Parameswaran about some of his most impactful work. Aditya is an Associate Professor at the University of California, Berkeley. Tune in to hear Aditya's story!

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

Links:
- EPIC Data Lab
- Answering Queries using Humans, Algorithms and Databases (CIDR'11)
- Potter’s Wheel: An Interactive Data Cleaning System (VLDB'01)
- Online Aggregation (SIGMOD'97)
- Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (INFOVIS'00)
- Coping with Rejection
- Ponder

You can find Aditya on: Twitter, LinkedIn, Google Scholar

10-21
58:57

High Impact in Databases with... Ali Dasdan

In this High Impact episode we talk to Ali Dasdan, CTO at ZoomInfo. Tune in to hear Ali's story and learn about some of his most impactful work, such as his work on "Map-Reduce-Merge".

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

Materials mentioned in this episode:
- Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters (SIGMOD'07)
- The Art of Doing Science and Engineering: Learning to Learn, Richard Hamming
- How to Solve It, George Pólya
- Systems Architecting: Creating & Building Complex Systems, Eberhardt Rechtin

You can find Ali on: Twitter, LinkedIn

10-08
01:03:02

High Impact in Databases with... Andreas Kipf

In this High Impact episode we talk to Andreas Kipf about his work on "Learned Cardinalities". Andreas is the Professor of Data Systems at Technische Universität Nürnberg (UTN). Tune in to hear Andreas's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

Papers mentioned in this episode:
- Learned Cardinalities: Estimating Correlated Joins with Deep Learning (CIDR'19)
- The Case for Learned Index Structures (SIGMOD'18)
- Adaptive Optimization of Very Large Join Queries (SIGMOD'18)

You can find Andreas on: Twitter, LinkedIn, Google Scholar, Data Systems Lab @ UTN

07-15
53:06

High Impact in Databases with... Joe Hellerstein

In this High Impact episode we talk to Joe Hellerstein. Joe is the Jim Gray Professor of Computer Science at UC Berkeley. Tune in to hear Joe's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

07-01
52:56

High Impact in Databases with... Raghu Ramakrishnan

In this High Impact episode we talk to Raghu Ramakrishnan. Raghu is CTO for Data and a Technical Fellow at Microsoft. Tune in to hear Raghu's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

06-17
23:56

High Impact in Databases with... Moshe Vardi

Welcome to another episode of the High Impact series - today we talk with Moshe Vardi! Moshe is the Karen George Distinguished Service Professor in Computational Engineering at Rice University, where his research focuses on automated reasoning. Tune in to hear Moshe's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

You can find Moshe on X, LinkedIn, and Mastodon (@vardi). Links to all his work can be found on his website.

06-03
47:39

High Impact in Databases with... Ryan Marcus

Welcome to the first episode of the High Impact series!

The High Impact series is inspired by the blog post "Most Influential Database Papers" by Ryan Marcus, and today we talk to Ryan! Tune in to hear about Ryan's story so far. We chat about his current work before moving on to discuss his most impactful work. We also dig into what motivates him and how he handles setbacks, as well as getting his take on current trends.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.

Links:
- Most influential database papers
- Ryan's website
- Ryan's twitter/X
- Bao: Making Learned Query Optimization Practical
- Neo: A Learned Query Optimizer

05-20
59:52

Introducing the High Impact Series...

Introducing the High Impact series! Hey folks, we have a new series coming soon, inspired by the blog post "Most Influential Database Papers" by Ryan Marcus. The series will feature interviews with the authors of some of the most impactful work in the field of databases. We will talk about the story behind some of their most impactful work, get them to reflect on the impact it has had over the years, and get their take on current trends in the field.

Proudly sponsored by Pometry.

05-06
02:40

Rohan Padhye & Ao Li | Fray: An Efficient General-Purpose Concurrency Testing Platform for the JVM | #66

In this episode of Disseminate: The Computer Science Research Podcast, guest host Bogdan Stoica sits down with Ao Li and Rohan Padhye (Carnegie Mellon University) to discuss their OOPSLA 2025 paper "Fray: An Efficient General-Purpose Concurrency Testing Platform for the JVM".

We dive into:
- Why concurrency bugs remain so hard to catch, even in "well-tested" Java projects.
- The design of Fray, a new concurrency testing platform that outperforms prior tools like JPF and rr.
- Real-world bugs discovered in Apache Kafka, Lucene, and Google Guava.
- The gap between academic research and industrial practice, and how Fray bridges it.
- What’s next for concurrency testing: debugging tools, distributed systems, and beyond.

If you’re a Java developer, systems researcher, or just curious about how to make software more reliable, this conversation is packed with insights on the future of software testing.

Links & Resources:
- The Fray paper (OOPSLA 2025)
- Fray on GitHub
- Ao Li’s research
- Rohan Padhye’s research

Don’t forget to like, subscribe, and hit the 🔔 to stay updated on the latest episodes about cutting-edge computer science research.

#Java #Concurrency #SoftwareTesting #Fray #OOPSLA2025 #Programming #Debugging #JVM #ComputerScience #ResearchPodcast

10-06
58:45

Shrey Tiwari | It's About Time: A Study of Date and Time Bugs in Python Software | #65

In this episode, Bogdan Stoica, Postdoctoral Research Associate in the SysNet group at the University of Illinois Urbana-Champaign (UIUC), steps in to guest host. Bogdan sits down with Shrey Tiwari, a PhD student in the Software and Societal Systems Department at Carnegie Mellon University and a member of the PASTA Lab, advised by Prof. Rohan Padhye. Together, they dive into Shrey’s award-winning research on date and time bugs in open-source Python software, exploring why these issues are so deceptively tricky and how they continue to affect systems we rely on every day.

The conversation traces Shrey’s journey from industry to research, including formative experiences at Citrix and Microsoft Research, and how those shaped his passion for software reliability. Shrey and Bogdan discuss the surprising complexity of date and time handling, the methodology behind Shrey’s empirical study, and the practical lessons developers can take away to build more robust systems. Along the way, they highlight broader questions about testing, bug detection, and the future role of AI in ensuring software correctness. This episode is a must-listen for anyone interested in debugging, reliability, and the hidden challenges that underpin modern software.

Links:
- It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software (🏆 ACM SIGSOFT Distinguished Paper Award)
- Shrey's homepage
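As a small illustration of why date and time handling is deceptively tricky in Python (this example is ours, not taken from the paper), mixing naive and timezone-aware datetimes, one well-known class of such bugs, only fails at the point of arithmetic:

```python
# Illustrative example (not from the paper) of a classic Python pitfall:
# mixing naive and timezone-aware datetimes.
from datetime import datetime, timezone

naive = datetime(2024, 3, 10, 12, 0)                  # no tzinfo attached
aware = datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)

try:
    print(aware - naive)                              # raises TypeError
except TypeError as e:
    print("Bug class: naive/aware mix ->", e)

# A safer pattern: normalise everything to UTC-aware datetimes up front.
normalised = naive.replace(tzinfo=timezone.utc)
print(aware - normalised)                             # 0:00:00
```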

09-23
01:05:29

Lessons Learned from Five Years of Artifact Evaluations at EuroSys | #64

In this episode we are joined by Thaleia Doudali, Miguel Matos, and Anjo Vahldiek-Oberwagner to delve into five years of experience managing artifact evaluation at the EuroSys conference. They explain the goals and mechanics of artifact evaluation, a voluntary process that encourages reproducibility and reusability in computer systems research by assessing the supporting code, data, and documentation of accepted papers. The conversation outlines the three-tiered badge system, the multi-phase review process, and the importance of open-source practices. The guests present data showing increasing participation, sustained artifact availability, and varying levels of community engagement, underscoring the growing relevance of artifacts in validating and extending research.

The discussion also highlights recurring challenges such as tight timelines between paper acceptance and camera-ready deadlines, disparities in expectations between main program and artifact committees, difficulties with specialized hardware requirements, and a lack of institutional continuity among evaluators. To address these, the guests propose early artifact preparation, stronger integration across committees, formalization of evaluation guidelines, and possibly making artifact submission mandatory. They advocate for broader standardization across CS subfields and suggest introducing a “Test of Time” award for artifacts. Looking to the future, they envision a more scalable, consistent, and impactful artifact evaluation process, but caution that continued growth in paper volume will demand innovation to maintain quality and reviewer sustainability.

Links:
- Lessons Learned from Five Years of Artifact Evaluations at EuroSys [DOI]
- Thaleia's Homepage
- Anjo's Homepage
- Miguel's Homepage

07-30
43:48

Dominik Winterer | Validating SMT Solvers for Correctness and Performance via Grammar-based Enumeration | #63

In this episode of the Disseminate podcast, Dominik Winterer discusses his research on SMT (Satisfiability Modulo Theories) solvers and his recent OOPSLA paper, "Validating SMT Solvers for Correctness and Performance via Grammar-based Enumeration". Dominik shares his academic journey from the University of Freiburg to ETH Zurich, and now to a lectureship at the University of Manchester. He introduces ET, a tool he developed for exhaustive grammar-based testing of SMT solvers. Unlike traditional fuzzers that use random input generation, ET systematically enumerates small, syntactically valid inputs using context-free grammars to expose bugs more effectively. This approach simplifies bug triage and has revealed over 100 bugs, many of them soundness and performance related, with a striking number having already been fixed. Dominik emphasizes the tool’s surprising ability to identify deep bugs using minimal inputs and to track solver evolution over time, highlighting ET's potential for integration into CI pipelines.

The conversation then expands into broader reflections on formal methods and the future of software reliability. Dominik advocates for a new discipline, Formal Methods Engineering, to bridge the gap between software engineering and formal verification tools. He stresses the importance of building trustworthy verification tools, since the reliability of software increasingly depends on them. Dominik also discusses adapting ET to other domains, such as JavaScript engines, and suggests that grammar-based enumeration can be applied widely to any system with a context-free grammar. Addressing the rise of AI, he envisions validation portfolios that integrate formal methods into LLM-based tooling, offering certified assessments of model outputs. He closes with a call for the community to embrace pragmatic, systematic, and scalable approaches to formal methods to ensure these tools can live up to their promises in real-world development settings.

Links:
- Dominik's Homepage
- Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration
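To illustrate the general idea of grammar-based enumeration (a generic toy sketch, not Dominik's ET tool), the following Python snippet exhaustively derives all sentences of a tiny context-free grammar up to a bounded derivation depth; each generated string is a small, syntactically valid input of the kind that would be fed to the system under test.

```python
# Generic illustration of grammar-based enumeration (not the ET tool):
# exhaustively produce all sentences of a toy context-free grammar up to a
# given derivation depth, smallest inputs first.
from itertools import product

GRAMMAR = {
    "expr": [["term"], ["(", "expr", "op", "expr", ")"]],
    "term": [["x"], ["0"], ["1"]],
    "op":   [["+"], ["*"]],
}

def expand(symbol, depth):
    """Return all strings derivable from `symbol` within `depth` expansions."""
    if symbol not in GRAMMAR:          # terminal symbol
        return [symbol]
    if depth == 0:
        return []
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production, then take the cross product.
        parts = [expand(s, depth - 1) for s in production]
        if all(parts):
            results.extend("".join(combo) for combo in product(*parts))
    return results

# Small, syntactically valid inputs that the system under test would be fed.
for sentence in expand("expr", 3):
    print(sentence)
```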

07-25
43:38
