Scaling laws for reward model overoptimization