HACKATHON: Evals November 2023 (1)
Update: 2024-01-08
Description
This episode kicks off our first subseries, consisting of recordings from my team's meetings during the AlignmentJams Evals Hackathon in November 2023. Our team won first place, so you'll be listening in on a process that, in the end, turned out to work pretty well.
Check out Apart Research, the group that runs the AlignmentJams Hackathons.
Links to all articles/papers which are mentioned throughout the episode can be found below, in order of their appearance.
- Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
- Discovering Language Model Behaviors with Model-Written Evaluations
- OpenAI Evals GitHub
- METR (previously ARC Evals)
- Goodharting on Wikipedia
- From Instructions to Intrinsic Human Values, a Survey of Alignment Goals for Big Models
- Fine Tuning Aligned Language Models Compromises Safety Even When Users Do Not Intend
- Shadow Alignment: The Ease of Subverting Safely Aligned Language Models
- Will Releasing the Weights of Future Large Language Models Grant Widespread Access to Pandemic Agents?
- Building Less Flawed Metrics, Understanding and Creating Better Measurement and Incentive Systems
- EleutherAI's Model Evaluation Harness
- Evalugator Library