Pre-registration for ML benchmarks: hash the eval claim before you touch the test set