DiscoverODSC's Ai X PodcastFrom LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki
From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki

Update: 2024-10-09

DESCRIPTION

In this episode of ODSC’s Ai X Podcast, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations.

Arize AI is a startup that is one of the leaders in AI observability and LLM evaluation, and the company behind the popular open-source evaluation project Phoenix.

Prior to Arize, Jason was co-founder and chief innovation officer at TubeMogul, where he scaled the business into a public company that was eventually acquired by Adobe.


SHOW TOPICS:

  • Jason’s background and key moments in his career
  • Arize AI’s founding journey and its focus on observability and evaluation
  • Primary challenges of evaluating GenAI and foundation models
  • Using an LLM / AI as a judge (a minimal sketch follows this list)
  • Common mistakes to avoid when evaluating LLMs
  • Evaluation-driven development
  • AI agents, agentic AI, and the challenges they pose for evaluation
  • Breaking down AI agents into manageable components
  • Agent control flow, and assessing whether agents make correct decisions at each step
  • Evaluating individual actions performed by AI agents
  • Retrieval-Augmented Generation (RAG) evaluation
  • Ensuring RAG-retrieved information is accurate and relevant
  • Risks and benefits of open-source models vs. proprietary models
  • Large language model evaluation metrics
  • The drawbacks of public benchmarks
  • Practical considerations for building an effective evaluation pipeline, and how it differs between experimentation and production
  • The advantages of SLMs (small language models)
  • Building an LLM task evaluation from scratch, and the steps involved
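
For a taste of what LLM-as-a-judge evaluation looks like in code, here is a minimal Python sketch that judges RAG retrieval relevance. It illustrates the general pattern discussed in the episode, a templated prompt plus a constrained label set ("rails"), and is not Arize’s or Phoenix’s actual API; the `judge_relevance` and `fake_llm` helpers and the prompt wording are hypothetical.

```python
# Minimal LLM-as-a-judge sketch. The judge model is any chat-completion
# function; `fake_llm` below is a stand-in so the example runs offline.
from typing import Callable

JUDGE_TEMPLATE = """You are grading whether a retrieved document is relevant
to a user's question.

Question: {question}
Document: {document}

Answer with exactly one word: relevant or irrelevant."""

RAILS = {"relevant", "irrelevant"}  # the only labels the judge may emit


def judge_relevance(question: str, document: str,
                    call_llm: Callable[[str], str]) -> str:
    """Classify one (question, document) pair with a judge model."""
    raw = call_llm(JUDGE_TEMPLATE.format(question=question, document=document))
    label = raw.strip().lower().rstrip(".")
    # Snap the output to the rails; anything else is flagged rather than
    # silently trusted, so parsing failures show up in your metrics.
    return label if label in RAILS else "unparseable"


def fake_llm(prompt: str) -> str:
    """Stand-in judge: swap in a real model client (OpenAI, Mistral, ...)."""
    return "relevant"


if __name__ == "__main__":
    print(judge_relevance("What is RAG?",
                          "RAG augments an LLM with retrieved context.",
                          fake_llm))  # -> relevant
```

In a real pipeline you would typically run a judge like this over sampled production traces and periodically spot-check its labels against human judgments.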

SHOW NOTES

- Jason Lopatecki, CEO and Co-Founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941 / https://twitter.com/jason_lopatecki

- Arize AI: https://twitter.com/arizeai

- Arize AI blog: https://arize.com/blog/

- Jason’s talk at ODSC West, "Demystifying LLM Evaluation": https://odsc.com/speakers/demystifying-llm-evaluation/

- Foundation models: https://en.wikipedia.org/wiki/Foundation_model

- AI agents: https://en.wikipedia.org/wiki/Intelligent_agent

- Agentic AI: https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/

- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models: https://arxiv.org/abs/2310.08491

- Open LLM Leaderboard: https://huggingface.co/open-llm-leaderboard

- OpenAI o1: https://openai.com/o1/

- Mistral LLMs: https://docs.mistral.ai/getting-started/models/models_overview/

- Llama 3.2: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/

- Evaluation prompts: https://arize.com/blog-course/evaluating-prompt-playground/

- Phoenix, open-source AI observability and evaluation: https://github.com/Arize-ai/phoenix

This episode was sponsored by:

Ai+ Training https://aiplus.training/

Home to 600+ hours of on-demand, self-paced AI training, live virtual training, and certifications in in-demand skills like LLMs and prompt engineering.

And created in partnership with ODSC https://odsc.com/

The leading AI training conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI to LLMOps.

Join us at our upcoming and highly anticipated conference, ODSC West, in South San Francisco, October 29-31.
