Data Engineering Central Podcast

Author: Data Engineering in Real Life


Description

Long Live the Data Engineer. No holds barred. Talking about Data Engineering news, topics, and general mayhem.

dataengineeringcentral.substack.com
21 Episodes
In this episode of Data Engineering Central, I sit down with the founder of DataFlint, Daniel Aronovich, to talk about the realities of working with Apache Spark, distributed data systems, and the future of data engineering. We start with his early journey into tech: how he first discovered large-scale data systems and the lessons he learned from working with real-world Spark workloads. The conversation then turns toward the future of data engineering, particularly the growing role of AI in software development and data infrastructure. We discuss why generic AI coding assistants often struggle with complex distributed systems, whether AI will eventually be able to automatically optimize data pipelines, and how the role of the data engineer may evolve in the coming years. We cover a lot of career advice for new and upcoming data professionals. We also discuss the origin of DataFlint, a tool designed to help engineers better understand and optimize Spark workloads by analyzing execution plans, logs, and runtime context. If you work with Spark, large-scale data pipelines, or modern data platforms, this conversation will give you a deeper look into how the data engineering landscape is evolving. Thanks for reading Data Engineering Central! This post is public so feel free to share it. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit dataengineeringcentral.substack.com/subscribe
In this episode, I sit down with Matt Martin, Staff Engineer, data architect, ETL practitioner, and author of a new book on DuckDB coming soon, to talk about the past, present, and future of data engineering. Matt has spent decades building and architecting data platforms across technologies such as SQL Server, Oracle, DB2, Hadoop, Redshift, and BigQuery, and now focuses on modern tools such as DuckDB and single-node analytics. We discuss how the data industry has evolved, what actually makes data platforms succeed, and where tools like DuckDB, Polars, Databricks, and Snowflake fit into the future of analytics. We also dive into the impact of AI on coding and data engineering, and whether distributed compute clusters will remain dominant or more workloads will move toward high-performance single-node systems.
Topics Covered
* Matt’s early career and journey into data engineering
* The evolution of data warehousing and ETL frameworks
* Traditional enterprise data systems vs modern cloud platforms
* DuckDB and the rise of single-node analytics
* Polars vs DuckDB: where each tool shines
* Databricks vs Snowflake
* AI-assisted coding and its impact on engineers
* The current data engineering job market
* Lessons learned from decades of building data systems
* Writing a book on DuckDB
In this episode of Data Engineering Central, I sit down with veteran software engineer John Crickett, who has decades of experience in the industry, to unpack what really matters in building a long and successful engineering career. We talk about how he first got into software, the early jobs and tools that shaped his thinking, and the massive technology shifts he’s witnessed across decades of engineering, from early stacks and tools to today’s AI-assisted workflows. We also dive into the difference between coding and real-world software engineering, what separates junior, senior, and principal engineers, and why many developers misunderstand what it takes to grow in this field. We discuss leadership vs individual contributor paths, the origin of his Coding Challenges platform, why algorithm puzzles dominate developer culture, and what actually makes engineers improve quickly. Finally, we tackle the big question everyone is asking right now: how AI is reshaping software engineering, and what skills will matter most over the next decade.
In this episode of the Data Engineering Central Podcast, I sit down with Yuki (Yuki Kakegawa) to talk about his journey into tech, the tools and platforms he’s worked with, and where he thinks data engineering and AI are headed next.
We cover:
• How Yuki got into tech
• Early career lessons and pivots
• Tools and technologies he’s worked with over the years
• How data engineering has evolved
• The impact of AI on software development
• What engineers should focus on right now
• Advice for those building their careers in data
Yuki shares practical insights on navigating the industry, staying adaptable, and thinking long-term about your technical growth. If you’re a data engineer, aspiring engineer, or just interested in where AI and modern software are going, this one’s for you.
Yuki writes on …
LinkedIn - https://www.linkedin.com/in/yukikakegawa/
https://yukikakegawa.me/#blog
🔔 Subscribe for more interviews with leaders in data engineering, AI, and modern data platforms.
In this episode of Data Engineering Central, I sit down with Bart Konieczny, a data engineer, distributed systems expert, and well-known author in the Data and Spark ecosystem, for a deep technical conversation about modern data engineering.
We cover:
* How Bart got into tech and distributed systems
* His journey through different engineering roles
* Spark internals and why they still matter
* The realities of lakehouse architecture
* Streaming vs batch systems
* AI’s impact on data engineering
* What engineers should focus on in 2026
In a world obsessed with abstractions and AI tooling, we explore whether understanding the internals is still worth it, or if the game has fundamentally changed. If you’re a data engineer, architect, or platform leader trying to navigate the next phase of the lakehouse era, this one’s for you.
🎙️ Data Engineering Central Podcast
Hosted by Daniel Beach
If you’re a CTO or data leader looking for help building or optimizing your data platform, reach out; consulting inquiries welcome. Data Engineering Central is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
In this episode of the Data Engineering Central Podcast, I sit down with Maxine Meurer, DevOps engineer, author, and educator behind I Love DevOps, for a wide-ranging conversation about careers, infrastructure, automation, and what it actually means to build systems that last. This isn’t a buzzword-heavy DevOps chat. It’s a grounded, honest discussion between two engineers about how people really get into tech, how careers evolve over time, and why modern infrastructure is as much about systems thinking and human judgment as it is about tools. We talk through Maxine’s journey from early technical curiosity to hands-on DevOps work, the move from “ClickOps” to automation-first infrastructure, and how writing and teaching reshaped the way she thinks about engineering.
What we cover in this episode:
* 🛠️ From ClickOps to DevOps: what that transition actually looks like in the real world
* 🧠 Why DevOps is fundamentally about systems and people, not just pipelines and YAML
* 📚 How Maxine went from self-teaching to authoring practical guides like LLMs for Humans and The DevOps Career Switch Blueprint
* 🤯 Common mistakes engineers make when learning DevOps, cloud, and distributed systems
* 🔍 Testing failures, production realities, and where modern infrastructure still breaks down
* 🤖 What AI and LLMs actually change for engineers, and what’s mostly hype
* 🧭 Career advice for engineers without a traditional background
* 🔮 Where DevOps and platform engineering are heading over the next 3–5 years
Throughout the conversation, Maxine brings a refreshing, human-centered perspective to topics that are often over-abstracted or oversold. We dig into the tradeoffs behind tooling choices, the reality of production systems, and the importance of learning how to think, not just what to deploy. If you’re navigating a DevOps or infrastructure career, wrestling with modern stacks, or trying to make sense of AI’s role in engineering, this episode offers clarity, context, and hard-won insight.
Learn more about Maxine’s work:
* Writing & guides:
* LinkedIn: https://www.linkedin.com/in/maxinemeurer/
* Gumroad resources: https://mameurer.gumroad.com
In this episode, I sit down with industry veteran Robin Moffatt, Sr. Principal Advisor in Streaming Data Technologies (Kafka, etc.) and a longtime voice in the data engineering community, to unpack the journey from old-school data architectures to today’s real-time streaming ecosystems. From early mainframe data processing and COBOL through the rise of Apache Kafka, streaming ETL, and event-driven systems, Robin shares lived experience from across decades of building, scaling, and evolving data platforms.
We dive into:
* 🧠 How the role of software engineering has shifted with the rise of distributed, real-time systems
* 📊 Why event streaming and platforms like Kafka aren’t just messaging systems, but the backbone of modern data architectures
* 🚀 How the community’s tooling and mental models have had to evolve, from static databases and nightly jobs to continuous, always-on streaming applications
* 🤖 A candid look at how AI and real-time data are intersecting, shaping both tooling and expectations for the next decade
* 🔮 Robin’s perspective on where the industry is headed: beyond buzzwords, toward real engineering maturity
Along the way, we get historical context, real-world lessons from conference stages and community forums, and a perspective on building resilient, scalable systems that power today’s data-rich applications. If you’ve ever wondered how we got from batch jobs to continuous event streams, or what it really takes to build modern pipelines that support AI workflows, this conversation with Robin is a must-listen.
For more from Robin:
* 📍 His personal blog & talks: https://rmoff.net/
* 🔗 LinkedIn profile: https://www.linkedin.com/in/robinmoffatt
In this episode of the Data Engineering Central Podcast, I sit down with R. Tyler Croy for a wide-ranging conversation on the present and future of modern data platforms. Tyler is a long-time open-source contributor to projects such as delta-rs. You can watch him on YouTube, read his blog, or work directly with him through his consultancy, Buoyant Data. Tyler has spent years deep in the open-source data ecosystem, contributing to projects such as Delta Lake and thinking critically about how real-world data systems are built and maintained. This isn’t a hype-driven conversation; it’s a grounded discussion about what’s working, what’s breaking, and what’s coming next.
We dig into:
* What the Lakehouse architecture gets right, and where it still falls short
* Why multimodal data (text, images, audio, video, embeddings) changes everything
* How open table formats like Delta Lake fit into the next generation of data platforms
* The growing gap between data tooling hype and day-to-day data engineering reality
* What skills and architectural thinking will matter most for data engineers over the next decade
If you’re building or operating modern data platforms, and trying to separate real signal from noise, this episode is for you.
In this episode of the Data Engineering Central Podcast, I sit down with Hoyt Emerson, founder of The Full Data Stack and Early Signal, for a wide-ranging conversation on data, analytics, and creating content in the tech world.
We talk candidly about:
* What actually matters in modern data and analytics
* Why so much “data content” misses the mark
* The difference between noise and real signal
* What works (and doesn’t) when building a technical audience
* Writing, consistency, and credibility in the data space
* Why opinions + experience beat trends and buzzwords
If you’re a data engineer, analyst, or technologist who’s curious about both building better data systems and communicating ideas that resonate, this episode goes deep on the lessons learned from doing both. This is less about hacks and more about craft, judgment, and long-term thinking.
In this episode of the Data Engineering Central Podcast, I sit down with Andy Leonard, someone who’s been building systems long before “data engineering” was even a job title. Andy’s career didn’t start in software at all. It started with physical circuits, literally wiring systems as an electrician, before moving into programming, databases, and eventually decades of hands-on data engineering work. This conversation isn’t about trends or hype cycles. It’s about how the fundamentals of data work have evolved, what hasn’t changed, and what you only learn after years of building, breaking, fixing, and rebuilding real systems. We talk about how the industry got here, how tools have changed, where they haven’t helped as much as advertised, and what newer data engineers can learn from a long, practical career spent close to the metal. If you’re interested in perspective, experience, and lessons earned the hard way, this one’s for you.
From DBA to Data Everything

2026-01-14 01:06:14

In this episode of the Data Engineering Central Podcast, I interview a Data OG, someone who’s been around the data space forever, and we talk about all things data: past, present, and future. I’m joined by Thomas Horton, a longtime friend and one of the most well-rounded data professionals I know. Over the course of his career, Tom has worn just about every hat in data: developer, DBA, analyst, and everything in between. He’s lived through the era of on-prem databases, the rise of analytics, and the constant reinvention that defines modern data engineering today. We talk about what’s changed, what hasn’t, and why many of the “new” problems in data feel oddly familiar. We also dig into lessons learned the hard way, lessons that are just as relevant for early-career data engineers as they are for seasoned practitioners navigating today’s ever-expanding stacks. On a personal note, a huge portion of what I know about relational databases and analytics can be traced back to Tom. This conversation is part reflection, part history lesson, and part reality check on where the data industry is headed next. If you’re interested in the past, present, and future of data, and what really matters beneath all the tooling, this is an episode you won’t want to miss.
In this episode, I sit down with Scott Haines (O’Reilly author, Databricks MVP, and veteran of Yahoo, Nike, and Twilio) for a wide-ranging conversation on the real state of modern data engineering. We dig into open-source ecosystems, Lakehouse architectures, the evolution of Spark, streaming, what’s broken and what’s working in today’s data tooling, and the lessons Scott has learned scaling platforms at some of the biggest companies in the world. If you care about data engineering, architecture, OSS, or the future of the modern data stack, you’ll love this one. Make sure to follow Scott here on Substack, and over on GitHub.
Hello! A new episode of the Data Engineering Central Podcast is dropping today. We will be covering a few hot topics!
* Cluster Fatigue
* The Death of Open Source
Going to be a great show, come along for the ride!
This is a free preview of a paid episode. To hear more, visit dataengineeringcentral.substack.com
Hello! A new episode of the Data Engineering Central Podcast is dropping today. We will be covering a few hot topics!
* Apache Iceberg Catalogs
* new Boring Catalog
* new full Iceberg support from Databricks/Unity Catalog
* Databricks SQL Scripting
* DuckDB coming to a Lake House near you
* Lakebase from Databricks
Going to be a great show, come along for the ride! Thanks …
Apache Iceberg Rant.

2025-05-26 11:00

Hello, my fair-weathered friends and readers! I am gone on vacation this week with my family, probably at this moment lying in the sand on a beach (Lord willing the creek don’t rise), not thinking of you all. Anywho, be that as it may, I didn’t want you to miss my pretty face, so here is a video of me ranting about Apache Iceberg, something I’ve had a lot of practice doing and enjoy quite thoroughly. For all you free-loaders out there, you can get 20% off to celebrate Memorial Day.
https://dataengineeringcentral.substack.com/Merica
This is a free preview of a paid episode. To hear more, visit dataengineeringcentral.substack.com
It’s time for another episode of the Data Engineering Central Podcast. In this episode, we cover …
* A Rust-based tool called uv to replace pip, Poetry, etc.
* Apache XTable and the Future of the Lake House
* How is AI going to affect you?
Thanks for being a consumer of Data Engineering Central; your support means a lot. Please share this podcast with your friend…
It’s time for another episode of the Data Engineering Central Podcast. In this episode, we cover …
* AWS Lambda + DuckDB and Delta Lake (Polars, Daft, etc.)
* IaC - Long Live Terraform
* Databricks Data Quality with DQX
* Unity Catalog releases for DuckDB and Polars
* Bespoke vs Managed Data Platforms
* Delta Lake vs. Iceberg and UniForm for a single table
Thanks for b…
In today’s episode of the Data Engineering Central Podcast, we talk about a few hot topics: AWS S3 Tables, Databricks raising money, whether Data Contracts are dead, and the Lake House storage format battle! It's a good one, buckle up!
It’s time for another episode of the Data Engineering Central Podcast. In this episode, we cover …
* Apache Airflow vs Databricks Workflows
* End-of-Year Engineering Planning for 2025
* 10 Billion Row Challenge with DuckDB vs Daft vs Polars
* Raw Data Ingestion
As usual, the full episode is available to paid subscribers, and a shortened version to you freeloaders out there. Don’t worry, I still love you though.
It’s time for another episode of the Data Engineering Central Podcast, our third one! Topics in this episode …
* Should you use DuckDB or Polars?
* Small Engineering Changes (PR Reviews)
* Daft vs Spark on Databricks with Unity Catalog (Delta Lake)
* Primary and Foreign keys in the Lake House
Enjoy!