
Executive Summary

The research paper Defeating Nondeterminism in LLM Inference reveals that even when large language models such as ChatGPT are configured to be deterministic (temperature set to zero), they can still give different answers to the same prompt because of hidden variability in how cloud servers batch and process requests. This lack of consistency, called nondeterminism, undermines trust, auditability, and repeatability in AI-driven decisions. The author shows how to eliminate the issue by redesigning model processing to be “batch-invariant,” ensuring that identical inputs always produce identical outputs. This matters because it makes AI systems predictable, verifiable, and compliant, a crucial foundation for regulated industries, enterprise governance, and executive confidence in deploying AI responsibly at scale.

_____

Key point: This paper demonstrates that nondeterministic outputs in large language models arise from variable batching during inference, not randomness or hardware, and introduces batch-invariant processing to make AI results fully reproducible and trustworthy.

Defeating Nondeterminism in LLM Inference

  • Overview of the Paper

    The research paper Defeating Nondeterminism in LLM Inference (Horace He, Thinking Machines Lab, 2025) investigates why large language model (LLM) inference is nondeterministic: why the same model and prompt can produce slightly different results even when the temperature is set to zero. Contrary to the widespread “concurrency + floating point” hypothesis, which attributes nondeterminism to GPU parallelism and rounding errors, the author identifies the true culprit as batch-size-dependent kernel behavior. Variations in server load dynamically alter batch sizes during inference, subtly changing the order of numerical operations and causing inconsistent outputs across identical queries.
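A minimal Python sketch (an illustration, not code from the paper) shows why the order of numerical operations matters at all: floating-point addition is not associative, so regrouping the same numbers can change the result.

```python
# Floating-point addition is not associative: at 1e16, the spacing between
# adjacent representable float64 values is 2.0, so adding 1.0 can be lost
# entirely depending on when it happens in the reduction.
vals = [1e16, 1.0, -1e16]

left_to_right = (vals[0] + vals[1]) + vals[2]  # 1e16 + 1.0 rounds back to 1e16
regrouped = (vals[0] + vals[2]) + vals[1]      # the large terms cancel first

print(left_to_right)  # 0.0
print(regrouped)      # 1.0
```

When a GPU kernel changes its reduction order as the batch size changes, this same effect plays out across millions of additions per layer, which is enough to eventually flip a sampled token and send two generations of the same prompt down different paths.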


    Key Contributions


    1. Reframing the Source of Nondeterminism. The paper demonstrates that the main source of inference variation is not GPU concurrency itself but a lack of batch invariance in core operations (matrix multiplication, RMSNorm, and attention).


    2. Batch-Invariant Kernel Design. The author introduces batch-invariant kernels for these operations, ensuring an identical reduction order regardless of batch size or system load and enabling bitwise reproducibility in LLM outputs.


    3. Demonstrated Deterministic Inference. Using the vLLM engine and Qwen3-235B-A22B-Instruct model, the paper shows that with deterministic kernels, 1,000 completions of the same prompt yield identical outputs, the first fully reproducible LLM inference demonstrated at scale.


    4. Implications for Reinforcement Learning (RL) Training. Deterministic inference enables true on-policy RL training, eliminating the drift between training and sampling policies that previously caused instability and reward collapse.
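The batch-invariance idea behind contribution 2 can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's kernels: `chunked_sum` stands in for a reduction kernel, and the only difference between the two strategies below is whether the chunking (and therefore the reduction order) depends on the batch size.

```python
import numpy as np

def chunked_sum(row, chunk):
    # Reduce fixed-size chunks first, then fold the partials left to right.
    partials = [np.float32(row[i:i + chunk].sum()) for i in range(0, len(row), chunk)]
    total = np.float32(0.0)
    for p in partials:
        total += p
    return total

def batch_dependent_sum(batch):
    # A load-adaptive kernel might tile by batch size: the reduction
    # order for any given row then varies with server load.
    chunk = max(1, len(batch))
    return [chunked_sum(row, chunk) for row in batch]

def batch_invariant_sum(batch):
    # Fixed chunk size: every row is reduced in the same order no
    # matter how many other requests share the batch.
    return [chunked_sum(row, 256) for row in batch]

rng = np.random.default_rng(0)
row = rng.standard_normal(4096).astype(np.float32)

alone = batch_invariant_sum([row])[0]
batched = batch_invariant_sum([row] * 32)[0]
print(alone == batched)  # True: bitwise-identical regardless of batch size

d1 = batch_dependent_sum([row])[0]        # chunk size 1
d32 = batch_dependent_sum([row] * 32)[0]  # chunk size 32
print(d1 == d32)  # often False: same data, different reduction order
```

The real kernels must achieve this while staying fast on GPUs, which is the paper's engineering contribution; the sketch only captures the invariance property itself.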


    Significance of the Findings


    This work fundamentally changes how the AI field understands and manages nondeterminism. It provides a reproducibility standard that allows model developers to debug, benchmark, and compare LLMs with scientific precision. The findings also have practical implications for AI safety, regulatory compliance, and trust, as deterministic inference is essential for auditable AI systems and for ensuring consistent outputs in safety-critical contexts like finance, healthcare, and defence.


    Why It Matters

    For business leaders, this research highlights that AI reproducibility is not just a technical issue but a governance and reliability challenge. Inconsistent outputs across identical conditions erode trust in AI-driven decisions. By enabling deterministic inference, this work establishes the foundation for verifiable AI operations, stable model governance, and consistent customer experiences. It also supports the future of regulated AI development, where repeatability and transparency will be mandatory for certification and compliance in enterprise and sovereign AI systems.


    Reference

    He, H. (2025). Defeating Nondeterminism in LLM Inference. Thinking Machines Lab.



