15: InstructGPT

Update: 2023-03-28

Description

In this episode we discuss the paper "Training language models to follow instructions with human feedback" by Ouyang et al. (2022). We cover the RLHF (reinforcement learning from human feedback) paradigm and how important RL is to tuning GPT.

In Channel

LoRA

2023-09-02 (01:02:56)

15: InstructGPT

2023-03-28 (57:27)

14: Whisper

2023-03-17 (49:14)

13: AlphaTensor

2023-03-11 (49:05)

12: SIRENs

2022-10-25 (54:17)

11: CVPR Workshop on Autonomous Driving Keynote by Ashok Elluswamy, a Tesla engineer

2022-09-30 (48:51)

10: Outracing champion Gran Turismo drivers with deep reinforcement learning

2022-08-23 (54:50)

8: GATO (A Generalist Agent)

2022-07-29 (44:51)

7: Deep Unsupervised Learning Using Nonequilibrium Thermodynamics (Diffusion Models)

2022-06-14 (30:55)

6: Deep Reinforcement Learning at the Edge of the Statistical Precipice

2022-06-06 (01:01:08)

5: QMIX

2022-04-26 (42:06)

4: Can Neural Nets Learn the Same Model Twice?

2022-04-06 (55:23)

3: VICReg

2022-03-21 (44:46)

2: data2vec

2022-03-07 (53:23)

1: Reward is Enough

2022-02-21 (54:36)

Mixture of Experts

2024-10-08 (54:46)

9: Heads-Up Limit Hold'em Poker Is Solved

2022-07-29 (47:55)

Vahe Hagopian, Taka Hasegawa, Farrukh Rahman