AF - SAE-VIS: Announcement Post by CallumMcDougall

Update: 2024-03-31

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SAE-VIS: Announcement Post, published by CallumMcDougall on March 31, 2024 on The AI Alignment Forum.

This is a post to officially announce the sae-vis library, which was designed to create dashboards for sparse autoencoder (SAE) features, like those from Anthropic's research.

Summary

There are 2 types of visualisations supported by this library: feature-centric and prompt-centric.

The feature-centric vis is the standard one from Anthropic's post; it looks like the image below. You can navigate between features via a dropdown in the top left.
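
One core ingredient of a feature-centric dashboard is the list of top-activating examples for a single feature. The sketch below shows that idea in plain Python. It is not sae-vis's API: the nested-list activation layout `acts[seq][token][feature]` and the function name `top_activating_examples` are hypothetical, chosen only for illustration.

```python
import heapq
from typing import List, Tuple

# Hypothetical layout: acts[s][t][f] is the activation of SAE feature f
# at token t of sequence s. This is NOT sae-vis's internal data format.
Acts = List[List[List[float]]]


def top_activating_examples(
    acts: Acts, feature: int, k: int = 3
) -> List[Tuple[float, int, int]]:
    """Return the k largest (activation, seq_idx, token_idx) triples for one feature."""
    candidates = (
        (seq[t][feature], s, t)
        for s, seq in enumerate(acts)
        for t in range(len(seq))
    )
    # heapq.nlargest avoids sorting every (sequence, token) position.
    return heapq.nlargest(k, candidates)


acts = [
    [[0.0, 1.2], [0.5, 0.0]],  # sequence 0: two tokens, two features
    [[2.0, 0.1], [0.0, 3.4]],  # sequence 1
]
print(top_activating_examples(acts, feature=1, k=2))  # [(3.4, 1, 1), (1.2, 0, 0)]
```

A real dashboard would then render the surrounding tokens of each returned position as context.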

The prompt-centric vis is centred on a single user-supplied prompt rather than a single feature. It shows the features that score highest on that prompt, according to a variety of metrics. It looks like the image below. A dropdown in the top left lets you switch between metrics and between tokens in your prompt.
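
To illustrate why the choice of metric matters, here is a minimal sketch of ranking features on one prompt token. The two metrics are assumptions for illustration: `"act_size"` (the raw activation) and `"act_quantile"` (where the activation falls within the feature's own reference distribution). These labels are hypothetical and may not match the option names sae-vis actually exposes.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def activation_quantile(value: float, reference: List[float]) -> float:
    """Fraction of a feature's reference activations strictly below `value`.

    `reference` must be sorted in ascending order.
    """
    return bisect_left(reference, value) / len(reference)


def rank_features(
    token_acts: List[float],
    reference_acts: Dict[int, List[float]],
    metric: str,
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Rank the features active on one token by a (hypothetical) metric."""
    scores: List[Tuple[int, float]] = []
    for f, a in enumerate(token_acts):
        if a <= 0:  # skip features that don't fire on this token
            continue
        if metric == "act_size":
            score = a
        elif metric == "act_quantile":
            score = activation_quantile(a, reference_acts[f])
        else:
            raise ValueError(f"unknown metric: {metric}")
        scores.append((f, score))
    scores.sort(key=lambda fs: fs[1], reverse=True)
    return scores[:k]


token_acts = [0.0, 5.0, 1.0]  # activations of features 0, 1, 2 on one token
reference_acts = {
    0: [0.1, 0.2, 0.4, 0.8],
    1: [1.0, 4.0, 6.0, 10.0],  # feature 1 often fires strongly
    2: [0.2, 0.3, 0.5, 0.9],   # feature 2 usually fires weakly
}
print(rank_features(token_acts, reference_acts, "act_size"))      # [(1, 5.0), (2, 1.0)]
print(rank_features(token_acts, reference_acts, "act_quantile"))  # [(2, 1.0), (1, 0.5)]
```

Feature 1 wins on raw size, but feature 2 is firing unusually strongly relative to its own history. This is the kind of disagreement that makes switching between metrics useful.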

Other links

Here are some more useful links:

GitHub repo

User Guide - Google Doc explaining how to use the library

Dev Guide - Google Doc explaining more about how the library was built, useful if you'd like to extend it or build on it

Demo Colab - includes examples, with code explained

You might also be interested in reading about Neuronpedia, who make use of this library in their visualizations.

If you're interested in getting involved, please reach out to me or Joseph Bloom! We will also be publishing a post tomorrow, discussing some of the features we've discovered during our research.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.