Serverless Data Orchestration, AI in the Data Stack, AI Pipelines | ep 12

Updated: 2024-06-14 | Duration: 28:06

Description

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater.

Orchestra is a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.

In the core architecture, users build pipelines as code, which then run on Orchestra's serverless infrastructure. Orchestra can orchestrate tasks such as data ingestion, transformation, and AI calls, and it can also monitor data products and provide analytics on them. All of this comes with end-to-end visibility, data lineage, and governance, even when an organization's data architecture is scattered and modular across teams and tools.
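To make "pipelines as code" concrete, here is a minimal local sketch in Python. It is not Orchestra's API or pipeline format (this page describes neither); the task names, the `run_pipeline` helper and the tuple-based pipeline definition are hypothetical. It runs ingestion, transformation, an AI call and a monitoring step in dependency order, and records which task fed which as a very small stand-in for lineage. Everything executes in-process, not on serverless infrastructure.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical pipeline definition: task name -> (function, names of upstream tasks).
# This is NOT Orchestra's format; it only illustrates "pipelines as code".
Pipeline = dict[str, tuple[Callable[..., Any], list[str]]]


def run_pipeline(pipeline: Pipeline) -> tuple[dict[str, Any], list[tuple[str, str]]]:
    """Run tasks in dependency order; return task outputs and lineage edges."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    outputs: dict[str, Any] = {}
    lineage: list[tuple[str, str]] = []
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        # Each task receives the outputs of its upstream tasks as positional args.
        outputs[name] = func(*(outputs[d] for d in deps))
        # Record which upstream task fed which downstream task (simple lineage).
        lineage.extend((d, name) for d in deps)
    return outputs, lineage


def ingest() -> list[dict]:
    return [{"id": 1, "text": "hello"}, {"id": 2, "text": ""}]


def transform(rows: list[dict]) -> list[dict]:
    # Drop rows with empty text.
    return [r for r in rows if r["text"]]


def ai_call(rows: list[dict]) -> list[str]:
    # Placeholder for a model call; uppercasing stands in for an LLM response.
    return [r["text"].upper() for r in rows]


def monitor(rows: list[dict], results: list[str]) -> dict:
    return {"rows_in": len(rows), "results_out": len(results)}


pipeline: Pipeline = {
    "ingest": (ingest, []),
    "transform": (transform, ["ingest"]),
    "ai_call": (ai_call, ["transform"]),
    "monitor": (monitor, ["transform", "ai_call"]),
}

outputs, lineage = run_pipeline(pipeline)
print(outputs["monitor"])  # {'rows_in': 1, 'results_out': 1}
```

Keeping the dependency graph in code is what makes it reviewable and versionable like any other code; a real orchestrator adds scheduling, remote execution and persistent run metadata on top.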

Key Quotes:

Find the right level of abstraction when building data orchestration tasks and workflows:
"I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff."

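The decorator idea credited to Prefect in the quote above can be sketched in plain Python. The `task` decorator below is hypothetical and is not Prefect's implementation or signature; it only shows how wrapping an ordinary function can centralize boilerplate such as error logging, timing and retries, so the function body stays focused on the actual work.

```python
import functools
import logging
import time
from typing import Any, Callable

# Demo-only logging setup so the INFO messages are visible.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tasks")


def task(retries: int = 0) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: wraps a plain function with logging, timing and retries."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(retries + 1):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                except Exception:
                    # Log the traceback, then retry unless attempts are exhausted.
                    logger.exception("task %s failed (attempt %d/%d)",
                                     func.__name__, attempt + 1, retries + 1)
                    if attempt == retries:
                        raise
                else:
                    logger.info("task %s succeeded in %.3fs",
                                func.__name__, time.perf_counter() - start)
                    return result
        return wrapper
    return decorator


calls = {"n": 0}


@task(retries=2)
def flaky_load() -> str:
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "loaded"


print(flaky_load())  # prints "loaded" after two logged failures
```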
Modularize data pipeline components:
"It's just around understanding what that dev workflow should look like. I think it should be a bit more modular."
Having a modular architecture, where components such as data ingestion, transformation, and model training are decoupled, allows for greater flexibility and scalability.
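One way to read this advice in code is to give every component the same small interface so that stages can be swapped independently. This is a minimal sketch assuming a hypothetical `Stage` protocol with a single `run` method; `CsvIngest`, `CastNumeric` and the toy `MeanModel` "training" step are illustrative stand-ins, not components of Orchestra or any other tool.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


class Stage(Protocol):
    """Hypothetical interface: any component with run() can join a pipeline."""
    def run(self, records: list[dict]) -> list[dict]: ...


@dataclass
class CsvIngest:
    lines: list[str]  # stands in for a file or API source

    def run(self, records: list[dict]) -> list[dict]:
        header, *rows = [line.split(",") for line in self.lines]
        return records + [dict(zip(header, row)) for row in rows]


@dataclass
class CastNumeric:
    field: str

    def run(self, records: list[dict]) -> list[dict]:
        return [{**r, self.field: float(r[self.field])} for r in records]


@dataclass
class MeanModel:
    """Toy 'training' step: learns the mean of one field."""
    field: str
    mean: Optional[float] = None

    def run(self, records: list[dict]) -> list[dict]:
        values = [r[self.field] for r in records]
        self.mean = sum(values) / len(values)
        return records


def run_stages(stages: Sequence[Stage]) -> list[dict]:
    # Each stage only sees the records produced by the previous one.
    records: list[dict] = []
    for stage in stages:
        records = stage.run(records)
    return records


model = MeanModel("price")
rows = run_stages([CsvIngest(["name,price", "a,2", "b,4"]), CastNumeric("price"), model])
print(model.mean)  # 3.0
```

Replacing `CsvIngest` with, for example, an API-backed ingester only requires another class with the same `run` method; the transformation and training stages stay unchanged.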

Adopt a streaming/event-driven architecture for low-latency AI use cases:
"If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for...if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

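The distinction drawn in the quote above can be sketched as a long-running service that reacts to each message as it arrives, instead of a pipeline run triggered by an orchestrator. In this minimal sketch an in-process `asyncio.Queue` stands in for a real message broker, `ChatMessage`, `generate_reply` and `chat_service` are hypothetical names, and the reply is a placeholder rather than a model call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ChatMessage:
    session_id: str
    text: str


def generate_reply(message: ChatMessage) -> str:
    # Placeholder for a model call; a real service would call an LLM here.
    return f"echo: {message.text}"


async def chat_service(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Long-running consumer: handles each event as soon as it arrives."""
    while True:
        message = await inbox.get()
        if message is None:  # sentinel used only to stop this demo
            break
        await outbox.put((message.session_id, generate_reply(message)))


async def main() -> list[tuple[str, str]]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(chat_service(inbox, outbox))
    for text in ["hi", "what is Orchestra?"]:
        await inbox.put(ChatMessage("s1", text))
    await inbox.put(None)
    await consumer
    # Drain all replies in arrival order.
    return [outbox.get_nowait() for _ in range(outbox.qsize())]


replies = asyncio.run(main())
print(replies)  # [('s1', 'echo: hi'), ('s1', 'echo: what is Orchestra?')]
```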
Hugo Lu:

Newsletter

Orchestra

Orchestra Docs

Nicolay Gerold:

LinkedIn

X (Twitter)

00:00 Introduction to Orchestra and its Focus on Data Products

08:03 Unified Control Plane for Data Stack and End-to-End Control

14:42 Use Cases and Unique Applications of Orchestra

19:31 Retaining Existing Dev Workflows and Best Practices in Orchestra

22:23 Event-Driven Architectures and Monitoring in Orchestra

23:49 Putting Data Products First and Monitoring Health and Usage

25:40 The Future of Data Orchestration: Stream-Based and Cost-Effective

data orchestration, Orchestra, serverless architecture, versatility, use cases, maturity levels, challenges, AI workloads

