How AI Is Built
Serverless Data Orchestration, AI in the Data Stack, AI Pipelines | ep 12

Update: 2024-06-14
Description

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater.


Orchestra is a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.


The core architecture involves users building pipelines as code, which then run on Orchestra's serverless infrastructure. It can orchestrate tasks like data ingestion, transformation, and AI calls, as well as monitoring and analytics on data products, all with end-to-end visibility, data lineage, and governance, even when an organization's data architecture is scattered and modular across teams and tools.


Key Quotes:



  • Find the right level of abstraction when building data orchestration tasks/workflows:
    "I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff."

  • Modularize data pipeline components:
    "It's just around understanding what that dev workflow should look like. I think it should be a bit more modular."
    Having a modular architecture where different components like data ingestion, transformation, model training are decoupled allows better flexibility and scalability.

  • Adopt a streaming/event-driven architecture for low-latency AI use cases:
    "If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for...if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

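The "decorator on a function" idea from the first quote can be sketched in plain Python. This is a minimal illustration using only the standard library, not Prefect's or Orchestra's actual API: the `task` decorator, the `ingest` and `transform` functions, and the sample rows are all hypothetical names invented for this example.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def task(func):
    """Hypothetical decorator: adds logging, timing, and error reporting to a function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logger.info("task %s started", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Log the traceback once, then re-raise so the caller still sees the failure.
            logger.exception("task %s failed", func.__name__)
            raise
        elapsed = time.perf_counter() - start
        logger.info("task %s finished in %.3fs", func.__name__, elapsed)
        return result

    return wrapper


@task
def ingest():
    # Stand-in for a real ingestion step; the rows are made-up sample data.
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]


@task
def transform(rows):
    # Stand-in for a transformation step: total the amounts.
    return sum(row["amount"] for row in rows)


def run_pipeline():
    # Each step stays a plain function; the decorator supplies the boilerplate.
    return transform(ingest())
```

Calling `run_pipeline()` runs both steps and logs a start and finish line for each. The point of the pattern is that the business logic in `ingest` and `transform` contains no logging or error-handling code of its own, which keeps tasks modular and easy to swap.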

Hugo Lu:



Nicolay Gerold:



00:00 Introduction to Orchestra and its Focus on Data Products


08:03 Unified Control Plane for Data Stack and End-to-End Control


14:42 Use Cases and Unique Applications of Orchestra


19:31 Retaining Existing Dev Workflows and Best Practices in Orchestra


22:23 Event-Driven Architectures and Monitoring in Orchestra


23:49 Putting Data Products First and Monitoring Health and Usage


25:40 The Future of Data Orchestration: Stream-Based and Cost-Effective


data orchestration, Orchestra, serverless architecture, versatility, use cases, maturity levels, challenges, AI workloads
