DiscoverODSC's Ai X PodcastFrom LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki
From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki

Update: 2024-10-09

DESCRIPTION

In this episode of ODSC’s Ai X Podcast, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations.

Arize AI is a startup that is one of the leaders in AI observability and LLM evaluation, and the company behind the popular open-source evaluation project Phoenix.

Prior to Arize, Jason was co-founder and chief innovation officer at TubeMogul, where he scaled the business into a public company that was eventually acquired by Adobe.


SHOW TOPICS:

  • Jason’s background and key moments in his career
  • Arize AI’s founding journey and its focus on observability and evaluation
  • Primary challenges of evaluating GenAI and foundation models
  • Using an LLM / AI as a judge (a minimal sketch follows this list)
  • Common mistakes to avoid when evaluating LLMs
  • Evaluation-driven development
  • AI agents, agentic AI, and the challenges they pose for evaluation
  • Breaking down AI agents into manageable components
  • Agent control flow, and assessing whether agents make correct decisions at each step
  • Evaluating individual actions performed by AI agents
  • Retrieval-Augmented Generation (RAG) evaluation
  • Ensuring RAG-retrieved information is accurate and relevant
  • Risks and benefits of open-source models vs. proprietary models
  • Large language model evaluation metrics
  • The drawbacks of public benchmarks
  • Practical considerations for building an effective evaluation pipeline, and how it differs between experimentation and production
  • The advantages of SLMs (small language models)
  • Building an LLM task evaluation from scratch, and the steps involved
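
For a taste of what LLM-as-a-judge evaluation looks like in code, here is a minimal Python sketch that judges RAG retrieval relevance. It illustrates the general pattern discussed in the episode, a templated prompt plus a constrained label set ("rails"), and is not Arize’s or Phoenix’s actual API; the `judge_relevance` and `fake_llm` helpers and the prompt wording are hypothetical.

```python
# Minimal LLM-as-a-judge sketch. The judge model is any chat-completion
# function; `fake_llm` below is a stand-in so the example runs offline.
from typing import Callable

JUDGE_TEMPLATE = """You are grading whether a retrieved document is relevant
to a user's question.

Question: {question}
Document: {document}

Answer with exactly one word: relevant or irrelevant."""

RAILS = {"relevant", "irrelevant"}  # the only labels the judge may emit


def judge_relevance(question: str, document: str,
                    call_llm: Callable[[str], str]) -> str:
    """Classify one (question, document) pair with a judge model."""
    raw = call_llm(JUDGE_TEMPLATE.format(question=question, document=document))
    label = raw.strip().lower().rstrip(".")
    # Snap the output to the rails; anything else is flagged rather than
    # silently trusted, so parsing failures show up in your metrics.
    return label if label in RAILS else "unparseable"


def fake_llm(prompt: str) -> str:
    """Stand-in judge: swap in a real model client (OpenAI, Mistral, ...)."""
    return "relevant"


if __name__ == "__main__":
    print(judge_relevance("What is RAG?",
                          "RAG augments an LLM with retrieved context.",
                          fake_llm))  # -> relevant
```

In a real pipeline you would typically run a judge like this over sampled production traces and periodically spot-check its labels against human judgments.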

SHOW NOTES

- Jason Lopatecki, CEO and Co-Founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941 / https://twitter.com/jason_lopatecki

- Arize AI: https://twitter.com/arizeai

- Arize AI blog: https://arize.com/blog/

- Jason’s talk at ODSC West, "Demystifying LLM Evaluation": https://odsc.com/speakers/demystifying-llm-evaluation/

- Foundation models: https://en.wikipedia.org/wiki/Foundation_model

- AI agents: https://en.wikipedia.org/wiki/Intelligent_agent

- Agentic AI: https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/

- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models: https://arxiv.org/abs/2310.08491

- Open LLM Leaderboard: https://huggingface.co/open-llm-leaderboard

- OpenAI o1: https://openai.com/o1/

- Mistral LLMs: https://docs.mistral.ai/getting-started/models/models_overview/

- Llama 3.2: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/

- Evaluation prompts: https://arize.com/blog-course/evaluating-prompt-playground/

- Phoenix, open-source AI observability and evaluation: https://github.com/Arize-ai/phoenix

This episode was sponsored by:

Ai+ Training https://aiplus.training/

Home to 600+ hours of on-demand, self-paced AI training, live virtual training, and certifications in in-demand skills like LLMs and prompt engineering.

And created in partnership with ODSC https://odsc.com/

The leading AI training conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI to LLMOps.

Join us at our upcoming and highly anticipated conference, ODSC West, in South San Francisco, October 29-31.
