Hacker News

Even if you pin the seed and spin up your own local LLM, a change to continuous batching at the vLLM level, or even just a different CUDA driver version, will break bitwise float reproducibility. Bitwise reproducibility in ML generation is largely a myth; in prod we only work with the final output anyway.
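A minimal sketch of the underlying cause (plain Python, not vLLM itself): floating-point addition is not associative, so any change in reduction order, which is exactly what a different batch composition or CUDA kernel introduces, can change the bits of the result even with the seed pinned.

```python
# Floating-point addition is not associative: summing the same values in a
# different order can produce a different result. This is why batch-size or
# kernel changes break bitwise reproducibility regardless of the RNG seed.
vals = [1e16, 1.0, -1e16, 1.0]

# Left-to-right: 1e16 + 1.0 rounds back to 1e16 (the 1.0 is below one ulp),
# then the -1e16 cancels, leaving 1.0.
s1 = sum(vals)

# Reordered: the large terms cancel first, so both 1.0s survive.
s2 = (vals[0] + vals[2]) + vals[1] + vals[3]

print(s1)  # 1.0
print(s2)  # 2.0
print(s1 == s2)  # False
```

Same inputs, same "math", different bits, and no seed anywhere in sight. On a GPU the reduction order inside a matmul or attention kernel depends on tile sizes and thread scheduling, so the effect shows up in every layer, not just in a toy sum.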



