AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs
Description
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.
Measuring and Reducing Hazardous Knowledge
The recent White House Executive Order on Artificial Intelligence highlights risks of LLMs in facilitating the development of bioweapons, chemical weapons, and cyberweapons.
To help measure these dangerous capabilities, CAIS has partnered with Scale AI to create WMDP: the Weapons of Mass Destruction Proxy, an open-source benchmark with more than 4,000 multiple-choice questions that serve as proxies for hazardous knowledge across biology, chemistry, and cyber.
This benchmark not only helps the world understand the relative dual-use capabilities of different LLMs; it also creates a path for model builders to remove hazardous information from their models through machine unlearning techniques.
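Because WMDP items are multiple-choice, a model's proxy score reduces to plain accuracy over the question set. The sketch below illustrates that idea with a generic item schema and a stand-in "model"; the field names and the `predict` callable are assumptions for illustration, not the benchmark's actual format or evaluation code.

```python
# Hypothetical illustration: score a model on multiple-choice items by
# comparing its chosen option index against the answer key.
# The item schema below is an assumption, not WMDP's actual format.

def score(items, predict):
    """Return the fraction of items where predict(item) matches the key."""
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)

# Toy example with a trivial "model" that always picks the first option.
items = [
    {"question": "Q1", "choices": ["A", "B", "C", "D"], "answer": 0},
    {"question": "Q2", "choices": ["A", "B", "C", "D"], "answer": 2},
]
always_first = lambda item: 0
print(score(items, always_first))  # 0.5
```

A real evaluation would replace `always_first` with a call into the model (for example, picking the option with the highest log-likelihood); the scoring loop itself stays the same.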
Measuring hazardous knowledge in bio, chem, and cyber. Current evaluations of dangerous AI capabilities have [...]
---
Outline:
(00:03 ) Measuring and Reducing Hazardous Knowledge
(04:35 ) Language models are getting better at forecasting
(07:51 ) Proposals for Private Regulatory Markets
(14:25 ) Links
---
First published:
March 7th, 2024
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-32-measuring
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.