#1: Building a domain-specific evaluation dataset from Chatbot Arena data
Update: 2024-09-08
Description
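In this episode we discussed the paper Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, on the theme of using Chatbot Arena data to build a domain-specific evaluation dataset.
The episode is available on the podcast transcription service LISTEN here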
Shownotes:
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Chat with Open Large Language Models
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline | LMSYS Org
Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge
https://x.com/karpathy/status/1737544497016578453
https://github.com/lm-sys/arena-hard-auto/tree/main/BenchBuilder
Speakers:
seya(@sekikazu01)
kagaya(@ry0_kaga)