Evaluations: Trust, performance, and price (bonus, announcing RewardBench)

Update: 2024-03-21

Description

Evaluation is not only getting harder with modern LLMs, it's getting harder because it means something different.
This is AI generated audio with Python and 11Labs. Music generated by Meta's MusicGen.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/evaluations-trust-performance-and-price

00:00 Evaluations: Trust, performance, and price (bonus, announcing RewardBench)
03:14 The rising price of evaluation
05:40 Announcing RewardBench: The first reward model evaluation tool
08:37 Updates to RLHF evaluation tools

YouTube code intro: https://youtu.be/CAaHAfCqrBA

Figure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_026.png
Figure 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_030.png
Figure 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_034.png
Figure 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_040.png

Nathan Lambert