Interviewing Louis Castricato of Synth Labs and Eleuther AI on RLHF, Gemini Drama, DPO, founding Carper AI, preference data, reward models, and everything in between

Update: 2024-03-04

Description

Louis has recently been founding Synth Labs, a new startup focused on synthetic data for alignment, and is a researcher at Eleuther AI. This interview should speak for itself, and it’ll need re-listens, even for me. The topics we cover touch on pretty much every major and minor issue facing model fine-tuning. Please reach out or comment if there’s a paper we mention that I didn’t link, and I’ll be happy to dig it up for you. This post is very technical. If you’re having a hard time with it, I suggest you listen to my RLHF 201 post on Latent Space first.

Full transcript available here: https://www.interconnects.ai/p/rlhf-interview-1-louis

  • 00:00:00 : Introduction
  • 00:01:24 : Gemini News and RLHF’s Part in it
  • 00:09:05 : Long Context, In-Context, and Multimodal RLHF
  • 00:21:20 : What are people missing about RLHF these days?
  • 00:30:30 : OpenAI's Influence and the Need for Alternatives
  • 00:39:20 : Synth Labs and the Future of Alignment
  • 00:55:00 : Evaluation Talk p2: Open-ended Evaluation and Data Diversity
  • 00:59:20 : Algorithm Roundup: PPO, DPO, KTO, IPO
  • 01:18:38 : CarperAI, Early Days of RLHF, Reflecting on ChatGPT

Nathan Lambert