AWS for Software Companies Podcast
Ep168: Scaling Agentic Workloads: Why Reliable Infrastructure is Non-Negotiable for Enterprise AI by Anyscale

Update: 2025-11-07

Description

** AWS re:Invent 2025 Dec 1-5, Las Vegas - Register Here! **

Learn how Anyscale's Ray platform lets companies like Instacart train models on 10-100x more data, and how Amazon saves heavily by shifting to Ray's multimodal capabilities.

Topics Include:

  • Ray originated at UC Berkeley when PhD students spent more time building clusters than ML models
  • Anyscale now launches 1 million clusters monthly with contributions from OpenAI, Uber, Google, Coinbase
  • Instacart achieved a 10-100x increase in model training data using Ray's scaling capabilities
  • ML evolved from single-node pandas/NumPy to distributed Spark, and now to Ray for multimodal data
  • Ray Core transforms simple Python functions into distributed tasks across massive compute clusters
  • Higher-level Ray libraries simplify data processing, model training, hyperparameter tuning, and model serving
  • Anyscale platform adds production features: auto-restart, logging, observability, and zone-aware scheduling
  • Unlike Spark's CPU-centric approach, Ray handles both CPUs and GPUs for multimodal workloads
  • Ray enables LLM post-training and fine-tuning using reinforcement learning on enterprise data
  • Multi-agent systems can scale automatically with Ray Serve handling thousands of requests per second
  • Anyscale leverages AWS infrastructure while keeping customer data within their own VPCs
  • Ray supports EC2, EKS, and HyperPod with features like fractional GPU usage and auto-scaling


Participants:


See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
