The Data Engineering Show

The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2 Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io


The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft

What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.

12-16

25:46

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu

What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: The move from "Information" to "Agency."

11-19

19:55

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, shifts data quality checks to before the data is written to a final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail jobs, drop bad records, or alert teams - ultimately saving huge costs on recompute and engineering effort in mission-critical data pipelines.

10-07

20:20

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal

Modernizing Search Infrastructure: How Instacart Transitioned from Elasticsearch to PostgreSQL for Enhanced Performance and Simplicity. In this episode of The Data Engineering Show, host Benjamin Wagner speaks with Ankit Mittal, former senior engineer at Instacart, about the company's innovative approach to modernizing their search infrastructure by transitioning from Elasticsearch to PostgreSQL for single-retailer search functionality.

09-17

21:38

Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So

AI is reshaping business intelligence by enabling true self-service analytics and transforming how organizations interact with their data through natural language processing. In this episode of The Data Engineering Show, host Benjamin interviews Lei, Co-founder and CTO of Fabi.ai, to explore how AI-native BI platforms are reshaping data analytics and empowering non-technical users to derive meaningful insights from complex datasets.

08-28

21:07

Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber

In this episode of The Data Engineering Show, the bros speak with Paarth, a Staff Engineer at Uber, about his work on Genie - an innovative AI assistant that revolutionizes on-call support by combining RAG (Retrieval Augmented Generation) with agent-based automation to help engineers find solutions faster.

07-22

25:31

From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta

Dive into the future of data engineering with Sumit Gupta, Lead BI Engineer at Notion, as he shares insights with the bros on navigating the AI revolution in modern data stacks. From leveraging tools like Snowflake and dbt to automating content creation with AI, discover how traditional technical skills are evolving alongside the rise of AI. Whether you're a seasoned data professional or just starting your journey, learn why embracing AI isn't optional and how to balance technical expertise with crucial soft skills in this rapidly changing landscape. Get an insider's perspective on working at tech giants like Notion, Snowflake, and Dropbox, while exploring practical applications of AI in both professional and personal contexts.

06-10

22:13

How RisingWave Is Redefining Real-Time Data with Postgres Power

In this episode of The Data Engineering Show, the bros sit with Yingjun Wu, founder and CEO of RisingWave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how RisingWave's S3-based architecture and Postgres compatibility provide advantages over traditional systems like Flink, and explore the increasing role of Apache Iceberg in data pipelines.

05-07

31:35

Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach

In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. They discuss data catalogs and how they refine the data management process.

04-08

23:36

Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen

In this episode of The Data Engineering Show, the bros welcome the CEO of DuckDB Labs and co-creator of DuckDB, Hannes Mühleisen. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.

03-19

30:52

AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma

In this episode of The Data Engineering Show, the bros sit with Daniel Pálma, Head of Marketing at Estuary, to delve into the intriguing world of data engineering and marketing. Daniel shares his transition from data engineering into marketing and how he uses his technical proficiency to market to engineers. The conversation covers the importance of AI in data movement, the future of data engineering, real-time data integration challenges, and the evolution of data integration.

02-11

30:33

AI and Data Change Management with Chad Sanderson, CEO Gable AI

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI, to discuss the revolution of data quality and governance, the importance of understanding data flow, and the processes that help organizations manage their data more effectively.

01-07

36:43

Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success

Wouter Trappers, the founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. For Wouter, BI is much more than just a software solution: it is all about change management and aligning leadership with data projects.

11-26

24:56

Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan

This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they’re revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.

10-31

28:02

The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn

In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they’re simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.

09-24

32:57

Vector Databases Won’t Replace SQL - Andy Pavlo

SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.

06-04

42:59

How ZoomInfo transitioned from data graveyards to ROI-driven data projects

Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.

04-16

39:46

Matthew Weingarten from Disney Streaming about Data Quality Best Practices

Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.

03-26

27:21

Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices

Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods.

02-29

25:59

Professors Joe Hellerstein and Joseph Gonzalez on LLMs

Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.

01-24

46:07


