Language models can explain neurons in language models

We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.

Written by: Elis Wanyama
Posted on: April 19, 2024
