That was my initial understanding, which left me confused.
But they're taking the top n according to the model, then taking the top of those according to the proxy objective, not the actual one. With reasonable probability, this avoids the Winner's Curse problem you get from simply taking the model's top-ranked candidate.
They then compare that pick against the candidate that scores highest on the actual preference.
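
To make that reading concrete, here is a toy simulation of the procedure as I understand it. This is only a sketch under my own assumptions, not their actual setup: the Gaussian scores, the independent noise on the model and the proxy, and the names `simulate`, `num_candidates`, and `n` are all made up for illustration.

```python
import random


def simulate(num_candidates=100, n=10, trials=200, seed=0):
    """Toy comparison of two-stage selection against the model's top-1 pick.

    Assumption (mine): model and proxy scores are the true value plus
    independent Gaussian noise.
    """
    rng = random.Random(seed)
    totals = {
        "model_top1_overestimate": 0.0,  # model score minus true value of the top-1 pick
        "model_top1_regret": 0.0,        # best true value minus true value of the top-1 pick
        "two_stage_regret": 0.0,         # best true value minus true value of the two-stage pick
    }
    for _ in range(trials):
        true = [rng.gauss(0, 1) for _ in range(num_candidates)]
        model = [t + rng.gauss(0, 1) for t in true]
        proxy = [t + rng.gauss(0, 1) for t in true]

        # Stage 1: shortlist the top n candidates by model score.
        ranked = sorted(range(num_candidates), key=lambda i: model[i], reverse=True)
        shortlist = ranked[:n]

        # Stage 2: within the shortlist, pick the best by proxy score.
        two_stage_pick = max(shortlist, key=lambda i: proxy[i])

        # Baseline: just take the model's single top-ranked candidate.
        top1_pick = ranked[0]

        # Reference point: the highest actual (true) score.
        best_true = max(true)

        totals["model_top1_overestimate"] += model[top1_pick] - true[top1_pick]
        totals["model_top1_regret"] += best_true - true[top1_pick]
        totals["two_stage_regret"] += best_true - true[two_stage_pick]

    return {name: total / trials for name, total in totals.items()}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

The `model_top1_overestimate` number is the Winner's Curse part: how much the model's score for its own favorite exceeds that candidate's true value on average. The two regret numbers measure the gap to the highest actual score, which is the comparison described above. How the two regrets compare will depend on the noise levels and on `n`, so treat the output as an illustration of the mechanics rather than evidence about their results.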