AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs
Description
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.
Measuring and Reducing Hazardous Knowledge
The recent White House Executive Order on Artificial Intelligence highlights risks of LLMs in facilitating the development of bioweapons, chemical weapons, and cyberweapons.
To help measure these dangerous capabilities, CAIS has partnered with Scale AI to create WMDP: the Weapons of Mass Destruction Proxy, an open-source benchmark with more than 4,000 multiple-choice questions that serve as proxies for hazardous knowledge across biology, chemistry, and cyber.
This benchmark not only helps the world understand the relative dual-use capabilities of different LLMs; it also creates a path for model builders to remove hazardous information from their models through machine unlearning techniques.
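Because WMDP items are multiple-choice, a model's proxy score reduces to plain accuracy over the question set. The sketch below illustrates that idea with a generic item schema and a stand-in "model"; the field names and the `predict` callable are assumptions for illustration, not the benchmark's actual format or evaluation code.

```python
# Hypothetical illustration: score a model on multiple-choice items by
# comparing its chosen option index against the answer key.
# The item schema below is an assumption, not WMDP's actual format.

def score(items, predict):
    """Return the fraction of items where predict(item) matches the key."""
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)

# Toy example with a trivial "model" that always picks the first option.
items = [
    {"question": "Q1", "choices": ["A", "B", "C", "D"], "answer": 0},
    {"question": "Q2", "choices": ["A", "B", "C", "D"], "answer": 2},
]
always_first = lambda item: 0
print(score(items, always_first))  # 0.5
```

A real evaluation would replace `always_first` with a call into the model (for example, picking the option with the highest log-likelihood); the scoring loop itself stays the same.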
Measuring hazardous knowledge in bio, chem, and cyber. Current evaluations of dangerous AI capabilities have [...]
---
Outline:
(00:03 ) Measuring and Reducing Hazardous Knowledge
(04:35 ) Language models are getting better at forecasting
(07:51 ) Proposals for Private Regulatory Markets
(14:25 ) Links
---
First published:
March 7th, 2024
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-32-measuring
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.