Ant Group Co., Ltd. has officially open-sourced dInfer, the industry's first high-performance diffusion language model inference framework.
According to Ant Group, in benchmark tests dInfer improved the inference speed of diffusion language models by 10.7 times over NVIDIA's diffusion LLM inference framework Fast-dLLM. On the code generation benchmark HumanEval, dInfer reached 1,011 tokens per second in single-batch inference, making it the first framework in the open-source community whose diffusion language model single-batch inference speed significantly surpasses that of autoregressive models. This work demonstrates that diffusion language models hold substantial efficiency potential that systematic engineering innovation can unlock, offering a highly competitive option among architectural paths toward AGI.
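The speed advantage rests on a structural difference: an autoregressive model emits one token per forward pass, while a diffusion language model can commit several tokens per denoising step. The toy sketch below illustrates that difference in forward-pass counts; it is not dInfer's actual code, and `dummy_model`, the fixed unmask count `K`, and the confidence heuristic are illustrative assumptions.

```python
# Toy sketch (not dInfer's implementation) contrasting autoregressive decoding,
# one forward pass per token, with diffusion-style decoding, which unmasks
# several tokens per denoising step.
import random

MASK = -1
SEQ_LEN = 64
K = 8  # tokens committed per diffusion step (assumption for illustration)

def dummy_model(tokens):
    """Stand-in for a transformer forward pass: proposes a token and a
    confidence score for every masked position."""
    return {i: (random.randint(0, 50_000), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def autoregressive_decode():
    tokens, passes = [], 0
    for _ in range(SEQ_LEN):               # one forward pass per token
        passes += 1
        tokens.append(random.randint(0, 50_000))
    return tokens, passes

def diffusion_decode():
    tokens, passes = [MASK] * SEQ_LEN, 0
    while MASK in tokens:
        passes += 1
        proposals = dummy_model(tokens)    # scores all masked positions at once
        # Commit the K most confident proposals this step (parallel unmasking).
        top = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:K]
        for i in top:
            tokens[i] = proposals[i][0]
    return tokens, passes

_, ar_passes = autoregressive_decode()
_, diff_passes = diffusion_decode()
print(f"AR forward passes: {ar_passes}, diffusion forward passes: {diff_passes}")
# With K=8, the diffusion loop finishes in SEQ_LEN/K = 8 passes instead of 64.
```

In a real system the per-step cost and the achievable K vary, which is where dInfer's engineering work comes in; the sketch only shows why fewer forward passes can translate into higher single-batch throughput.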
On a node equipped with eight NVIDIA H800 GPUs, dInfer's performance stands out:
Compared with the previous dLLM inference solution Fast-dLLM, dInfer delivers a 10.7-fold improvement in average inference speed (avg TPS: 681 vs. 63.6) while maintaining equivalent model quality. Compared with Qwen2.5-3B, an autoregressive model of comparable parameter count and quality running on vLLM, the industry's leading inference serving framework, dInfer's average inference speed is 2.5 times higher (681 vs. 277).
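For readers reproducing such comparisons, a tokens-per-second figure is typically total generated tokens divided by total wall-clock time at batch size 1. The following is a minimal sketch of that measurement; the `generate` callable is a placeholder assumption, not a dInfer or vLLM API.

```python
# Minimal sketch of measuring average tokens-per-second (TPS) for
# single-batch (batch size 1) generation.
import time

def measure_tps(generate, prompts):
    """Run each prompt at batch size 1 and report tokens per wall-clock second."""
    total_tokens, total_seconds = 0, 0.0
    for prompt in prompts:
        start = time.perf_counter()
        output_tokens = generate(prompt)   # assumed to return generated tokens
        total_seconds += time.perf_counter() - start
        total_tokens += len(output_tokens)
    return total_tokens / total_seconds

# Usage with a stand-in generator that pretends to emit 128 tokens:
if __name__ == "__main__":
    fake_generate = lambda p: [0] * 128
    print(f"avg TPS: {measure_tps(fake_generate, ['a', 'b', 'c']):.1f}")
```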
Ant Group stated that dInfer bridges cutting-edge research and industrial deployment, marking a crucial step for diffusion language models from "theoretically feasible" to "practically efficient." The open-source release is also an invitation to developers and researchers worldwide to jointly explore the enormous potential of diffusion language models and build a more efficient, more open AI ecosystem.