AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs

Update: 2024-03-07
Description

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Measuring and Reducing Hazardous Knowledge

The recent White House Executive Order on Artificial Intelligence highlights risks of LLMs in facilitating the development of bioweapons, chemical weapons, and cyberweapons.

To help measure these dangerous capabilities, CAIS has partnered with Scale AI to create WMDP: the Weapons of Mass Destruction Proxy, an open-source benchmark with more than 4,000 multiple-choice questions that serve as proxies for hazardous knowledge across biology, chemistry, and cybersecurity.

This benchmark not only helps the world understand the relative dual-use capabilities of different LLMs, but it also creates a path forward for model builders to remove harmful information from their models through machine unlearning techniques.

Measuring hazardous knowledge in bio, chem, and cyber. Current evaluations of dangerous AI capabilities have [...]

---

Outline:

(00:03) Measuring and Reducing Hazardous Knowledge

(04:35) Language models are getting better at forecasting

(07:51) Proposals for Private Regulatory Markets

(14:25) Links

---


First published: March 7th, 2024

Source: https://newsletter.safe.ai/p/ai-safety-newsletter-32-measuring


---

Want more? Check out our ML Safety Newsletter for technical safety research.



Narrated by TYPE III AUDIO.
