OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions

Update: 2024-05-13

Description

Now we will have some grounding for telling whether weird ChatGPT behaviors are intended or side effects, shrinking the Overton window of RLHF bugs.
This is AI-generated audio made with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/openai-rlhf-model-spec

00:00 OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions
02:56 Reviewing the Model Spec
08:26 Where RLHF can fail OpenAI
12:23 From Model Specs to personalization

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_027.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_029.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_033.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_034.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_041.webp
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_046.webp

Nathan Lambert
