AI Safety: Constitutional AI vs Human Feedback

Update: 2024-06-17

Description

With great power comes great responsibility. How do leading AI companies implement safety and ethics as language models scale? OpenAI uses its Model Spec combined with RLHF (Reinforcement Learning from Human Feedback), while Anthropic uses Constitutional AI. This solo episode on AI alignment looks at the technical approaches each takes to maximizing usefulness while minimizing harm.

References

OpenAI Model Spec

https://cdn.openai.com/spec/model-spec-2024-05-08.html#overview

Anthropic Constitutional AI

https://www.anthropic.com/news/claudes-constitution

To stay in touch, sign up for our newsletter at https://www.superprompt.fm

In Channel

AI Agents at Work: Scaffold Required

2025-12-03 (39:59)

Whose Agent Is It Anyways?

2025-11-07 (15:50)

AI Safety: Constitutional AI vs Human Feedback

2024-06-17 (16:38)

Open Source LLMs: How Open Is "Open"?

2024-06-10 (13:28)

Open Source AI: The Safety Debate

2024-06-03 (16:29)

LLM Benchmarks: How to Know Which AI Is Better

2024-05-27 (10:35)

Multimodal AI: When ChatGPT Learned to See

2024-05-20 (10:00)

Google Gemini: Three Models, One Strategy

2024-05-13 (07:11)

Building Custom GPTs: Seven Lessons from GPT Builder

2023-12-15 (25:56)

Enterprise LLMs: Cloud Deployment Strategy

2023-11-06 (57:35)

ChatGPT in the Classroom: Yale's Response

2023-10-23 (01:34:11)

AI Screenwriting: GPT-4 Meets the Writers Strike

2023-08-14 (33:09)

ChatGPT Guardrails: The One Question It Won't Debate

2023-07-24 (19:31)

ChatGPT vs The Onion: Can AI Get the Joke?

2023-07-08 (15:55)

ChatGPT Jailbreaks: The Grandma Exploit

2023-07-03 (23:44)

AI Hallucinations: Bug or Feature?

2023-06-19 (23:07)

LLM Training: Superman's Kryptonite-Proof Suit

2023-05-29 (18:56)

Large Language Models: Getting from GPT-3 to chatGPT

2023-05-15 (22:17)

What Is ChatGPT? Explained

2023-05-08 (25:21)

DALL-E: Why AI Can't Make Your Perfect Pizza

2023-03-24 (51:43)

Tony Wan