Ep168: Scaling Agentic Workloads: Why Reliable Infrastructure is Non-Negotiable for Enterprise AI by Anyscale
Update: 2025-11-07
Description
** AWS re:Invent 2025 Dec 1-5, Las Vegas - Register Here! **
Learn how Anyscale's Ray platform enables companies like Instacart to supercharge their model training, and how Amazon achieves significant savings by shifting multimodal workloads to Ray.
Topics Include:
- Ray originated at UC Berkeley when PhD students spent more time building clusters than ML models
- Anyscale now launches 1 million clusters monthly, with Ray contributions from OpenAI, Uber, Google, and Coinbase
- Instacart achieved a 10-100x increase in model training data using Ray's scaling capabilities
- ML evolved from single-node Pandas/NumPy to distributed Spark, now Ray for multimodal data
- Ray Core transforms simple Python functions into distributed tasks across massive compute clusters (see the Ray Core sketch after this list)
- Higher-level Ray libraries simplify data processing, model training, hyperparameter tuning, and model serving
- Anyscale platform adds production features: auto-restart, logging, observability, and zone-aware scheduling
- Unlike Spark's CPU-only approach, Ray handles both CPUs and GPUs for multimodal workloads
- Ray enables LLM post-training and fine-tuning using reinforcement learning on enterprise data
- Multi-agent systems can scale automatically, with Ray Serve handling thousands of requests per second (see the Ray Serve sketch after this list)
- Anyscale leverages AWS infrastructure while keeping customer data within customers' own VPCs
- Ray supports EC2, EKS, and HyperPod with features like fractional GPU usage and auto-scaling
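A minimal sketch of the Ray Core pattern described above, assuming a toy workload (the preprocess and embed functions are illustrative, not from the episode); the num_gpus=0.5 request also shows the fractional GPU usage noted in the last bullet:

    import ray

    ray.init()  # connect to an existing Ray cluster, or start a local one

    @ray.remote(num_cpus=1)
    def preprocess(batch):
        # an ordinary Python function, now schedulable anywhere in the cluster
        return [x * 2 for x in batch]

    @ray.remote(num_gpus=0.5)
    def embed(batch):
        # fractional GPU request: two of these tasks can share a single GPU
        return batch

    futures = [preprocess.remote(list(range(8))) for _ in range(4)]
    print(ray.get(futures))  # blocks until the distributed tasks complete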
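Similarly, a minimal Ray Serve sketch for the multi-agent bullet (the AgentEndpoint class and autoscaling bounds are illustrative assumptions): the deployment is replicated within the configured bounds and requests are load-balanced across replicas.

    from ray import serve

    @serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 16})
    class AgentEndpoint:
        # each replica can hold its own agent state or model client
        async def __call__(self, request):
            payload = await request.json()
            return {"reply": f"agent handled: {payload}"}

    serve.run(AgentEndpoint.bind())  # serves HTTP traffic across autoscaled replicas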
Participants:
- Sharath Cholleti – Member of Technical Staff, Anyscale
See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/