DiscoverInterconnects AudioOpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions
OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions

OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions

Update: 2024-05-13
Share

Description

Now we will have some grounding for when weird ChatGPT behaviors are intended or side-effects -- shrinking the Overton window of RLHF bugs.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/openai-rlhf-model-spec

00:00 OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions
02:56 Reviewing the Model Spec
08:26 Where RLHF can fail OpenAI
12:23 From Model Spec's to personalization

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_027.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_029.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_033.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_034.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_041.webp
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_046.webp

Comments 
loading
00:00
00:00
1.0x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions

OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions

Nathan Lambert