We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset
We've trained a model to achieve a new state of the art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply
We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong