Why Batch Non-Invariance of Sequence Model Outputs Is Correct
The same prompt, with temperature=0 and a fixed seed, can still yield different text depending on whether the server runs the request alone or batches it with others. Many operators read that as a bug. This post explains why the behavior can be correct, and what the vLLM maintainers and issue authors found when they investigated.…