How AI Is Built
Navigating the Modern Data Stack, Choosing the Right OSS Tools, From Problem to Requirements to Architecture | ep 7

Update: 2024-05-17
Description

From Problem to Requirements to Architecture.


In this episode, Nicolay Gerold and Jon Erik Kemi Warghed discuss the landscape of data engineering, sharing insights on selecting the right tools, implementing effective data governance, and leveraging concepts like software-defined assets. They also cover the challenges of keeping up with a fast-changing tech landscape and offer practical advice for building sustainable data platforms. Tune in to discover how to simplify complex data pipelines, get more out of orchestration tools, and ultimately create more value from your data.



  • "Don't overcomplicate what you're actually doing."

  • "Getting your basic programming software development skills down is super important to becoming a good data engineer."

  • "Who has time to learn 500 new tools? It's like, this is not humanly possible anymore."


Key Takeaways:



  • Data Governance: Data governance is about transparency and understanding the data you have. It's crucial for organizations as they scale and data becomes more complex. Tools like dbt and Dagster can help achieve this.

  • Open Source Tooling: When choosing open source tools, assess their backing, commit frequency, community support, and ease of use.

  • Agile Data Platforms: Focus on the capabilities you want to enable and prioritize solving the core problems of your data engineers and analysts.

  • Software-Defined Assets: This concept, exemplified by Dagster, shifts the focus from how data is processed to what data should exist. This change in mindset can greatly simplify data orchestration and management.

  • The Importance of Fundamentals: Strong programming and software development skills are crucial for data engineers, and understanding the basics of data management and orchestration is essential for success.

  • The Importance of Versioning Data: Data has to be versioned so you can easily track changes, revert to previous states if needed, and ensure reproducibility in your data pipelines. lakeFS applies the concepts of Git to your data lake. This gives you the ability to create branches for different development environments, commit changes to specific versions, and merge branches together once changes have been tested and validated.
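
To make the branch, commit, and merge workflow from the versioning takeaway concrete, here is a minimal in-memory sketch. It is a toy model only: `TinyDataRepo` and all of its methods are hypothetical names invented for this illustration and are not the lakeFS API. The merge is deliberately naive (the source branch wins on conflicts), which a real system would not do.

```python
import hashlib
import json
from typing import Dict, Optional


class TinyDataRepo:
    """Toy Git-style versioning for data files (hypothetical, not the lakeFS API)."""

    def __init__(self) -> None:
        # commit id -> {"parent": ..., "message": ..., "files": {path: content}}
        self.commits: Dict[str, dict] = {}
        # branch name -> id of the latest commit on that branch (None if empty)
        self.branches: Dict[str, Optional[str]] = {"main": None}
        # branch name -> uncommitted writes
        self.staged: Dict[str, Dict[str, str]] = {"main": {}}

    def _snapshot(self, branch: str) -> Dict[str, str]:
        """Return a copy of the committed files at the head of a branch."""
        head = self.branches[branch]
        return dict(self.commits[head]["files"]) if head else {}

    def create_branch(self, name: str, source: str = "main") -> None:
        """A new branch starts at the same commit as its source branch."""
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def write(self, branch: str, path: str, content: str) -> None:
        """Stage a change; it is not visible to readers until committed."""
        self.staged[branch][path] = content

    def commit(self, branch: str, message: str) -> str:
        """Record staged changes as a new immutable version on the branch."""
        files = self._snapshot(branch)
        files.update(self.staged[branch])
        record = {"parent": self.branches[branch], "message": message, "files": files}
        payload = json.dumps(record, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = record
        self.branches[branch] = commit_id
        self.staged[branch] = {}
        return commit_id

    def read(self, branch: str, path: str) -> str:
        return self._snapshot(branch)[path]

    def merge(self, source: str, target: str) -> str:
        """Naive merge: files from the source branch overwrite the target's."""
        self.staged[target].update(self._snapshot(source))
        return self.commit(target, f"merge {source} into {target}")

    def reset(self, branch: str, commit_id: str) -> None:
        """Point the branch back at an earlier commit to restore that state."""
        self.branches[branch] = commit_id


repo = TinyDataRepo()
repo.write("main", "orders.csv", "id,amount\n1,40\n")
v1 = repo.commit("main", "initial load")

# Develop and validate a change on an isolated branch.
repo.create_branch("dev")
repo.write("dev", "orders.csv", "id,amount\n1,40\n2,60\n")
repo.commit("dev", "add order 2")
main_before_merge = repo.read("main", "orders.csv")

# Promote the tested change, then show that the old state is still reachable.
repo.merge("dev", "main")
main_after_merge = repo.read("main", "orders.csv")
repo.reset("main", v1)
main_after_reset = repo.read("main", "orders.csv")
```

The point of the sketch is the isolation: work on `dev` never changes what readers of `main` see until the merge, and every committed state stays addressable by its id, which is what makes rollbacks and reproducible pipeline runs possible.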


Jon Erik Kemi Warghed:



Nicolay Gerold:



Chapters


00:00 The Problem with the Modern Data Stack: Too many tools and buzzwords


00:57 How to Choose the Right Tools: Considerations for startups and large companies


03:13 Evaluating Open Source Tools: Background checks and due diligence


07:52 Defining Data Governance: Transparency and understanding of data


10:15 The Importance of Data Governance: Challenges and solutions


12:21 Data Governance Tools: dbt and Dagster


17:05 The Impact of Dagster: Software-defined assets and declarative thinking


19:31 The Power of Software-Defined Assets: How Dagster differs from Airflow and Mage


21:52 State Management and Orchestration in Dagster: Real-time updates and dependency management


26:24 Why Use Orchestration Tools?: The role of orchestration in complex data pipelines


28:47 The Importance of Tool Selection: Thinking about long-term sustainability


31:10 When to Adopt Orchestration: Identifying the need for orchestration tools
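
The software-defined assets idea covered in the takeaways and chapters above (declare what data should exist and let the tool work out how to produce it) can be illustrated with a short sketch. This is plain Python and not Dagster's API: the `asset` decorator, the `ASSETS` registry, and `materialize` are hypothetical names made up for this example, as are the sample assets. It borrows one convention Dagster also uses, which is that a function's parameter names identify its upstream assets.

```python
import inspect
from typing import Callable, Dict, List, Optional

# Hypothetical registry of asset definitions, keyed by asset name.
ASSETS: Dict[str, Callable] = {}


def asset(fn: Callable) -> Callable:
    """Declare that a data asset named after the function should exist."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders() -> List[dict]:
    # Stand-in for reading from a source system.
    return [{"id": 1, "amount": 40.0}, {"id": 2, "amount": 60.0}]


@asset
def cleaned_orders(raw_orders: List[dict]) -> List[dict]:
    # Depends on raw_orders simply by naming it as a parameter.
    return [order for order in raw_orders if order["amount"] > 50]


@asset
def revenue(cleaned_orders: List[dict]) -> float:
    return sum(order["amount"] for order in cleaned_orders)


def dependencies(name: str) -> List[str]:
    """Upstream assets are read off the function signature."""
    return list(inspect.signature(ASSETS[name]).parameters)


def materialize(name: str, cache: Optional[Dict[str, object]] = None) -> object:
    """Build the requested asset, building each upstream asset at most once."""
    cache = {} if cache is None else cache
    if name not in cache:
        inputs = {dep: materialize(dep, cache) for dep in dependencies(name)}
        cache[name] = ASSETS[name](**inputs)
    return cache[name]


total = materialize("revenue")
```

Nothing in the asset definitions says in which order to run things. The caller asks for `revenue`, and the execution order falls out of the declared dependencies, which is the shift from "how data is processed" to "what data should exist".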
