Model Evaluation for Extreme Risks

Update: 2023-05-13

Description

Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through “dangerous capability evaluations”) and the propensity of models to apply their capabilities for harm (through “alignment evaluations”). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.

Source:

https://arxiv.org/pdf/2305.15324.pdf

Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.
