Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

Update: 2024-12-17

Description

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube

Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3
