NVIDIA, HKU, and MIT Jointly Release Fast-dLLM v2: End-to-End Throughput Up 2.5×

Sina Finance (新浪财经)

Oct 26, 2025

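(Source: 机器之心) Autoregressive (AR) large language models decode one token at a time, and this sequential paradigm limits inference efficiency. Diffusion LLMs (dLLMs) excel at parallel generation, but have so far struggled to consistently outperform AR models, with KV Cache reuse and variable-length support remaining particular challenges. Fast-dLLM v2 offers a pragmatic route: adapt a pretrained AR model into a Block-dLLM capable of parallel decoding, with only about 1B tokens of fine-tuning needed for a "lossless" transfer, rather than training on hundreds of billions of tokens (Dream, for example, needs about 580B tokens). On A100/H100 GPUs, it raises end-to-end throughput substantially, by up to 2.5×, while preserving accuracy.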




Author affiliations: HKU, NVIDIA, MIT
Paper: https://arxiv.org/pdf/2509.26328
Project page: https://nvlabs.github.io/Fast-dLLM/v2/
Code: https://github.com/NVlabs/Fast-dLLM

Key Highlights

Adaptation with little data (~1B tokens): an existing AR model (such as Qwen2.5-Instruct 1.5B/7B) can be adapted into a Block Diffusion LLM with about 1B tokens of fine-tuning, rather than training on hundreds of billions of tokens (Dream, for example, needs about 580B tokens).

An "AR-friendly" architecture: attention is bidirectional within a block and causal across blocks. Combined with complementary masking and token-shift, this lets the model keep the semantic organization and variable-length capability of AR models while gaining the efficiency of intra-block parallelism. The transfer is more natural and more data-efficient.

Hierarchical cache plus parallel decoding: a block-level KV Cache and a sub-block DualCache, combined with confidence-thresholded parallel decoding, give an end-to-end speedup of up to 2.5×.

Validation at scale: at 7B parameters, throughput is 2.54× that of Qwen2.5-7B-Instruct while generation quality stays on par with the AR model.

Principles and Method: From AR to Block Diffusion

1) Block diffusion and AR-friendly attention

Fast-dLLM v2 splits the sequence into blocks of a fixed size. Within a block, bidirectional attention allows tokens to be denoised in parallel; across blocks, the left-to-right causal relationship is preserved. The model can therefore decode in parallel while still inheriting the semantic organization, variable-length generation, and KV Cache of AR models. Complementary masking and token-shift ensure that every token is learned from both the "visible" and the "masked" view, which stably recovers the AR semantic representations.

2) Hierarchical Cache

Block-level cache: the KV of already-decoded blocks is reused directly, so KV Cache is supported natively.

Sub-block cache (DualCache): inside the partially decoded current block, the KV activations of both the prefix and the suffix are cached. This reduces redundant computation as tokens are revealed and restored during iterative denoising, and fits the parallel refinement process.

3) Confidence-aware parallel decoding

This continues the idea from v1: when the prediction confidence at a position exceeds a threshold (for example, 0.9), that token is finalized, so multiple tokens can be committed in parallel, while uncertain positions are left for later refinement. On GSM8K, with a threshold of 0.9, throughput rises from 39.1 to 101.7 tokens/s, a speedup of about 2.6×, with negligible impact on accuracy.

Performance Results

End-to-end acceleration: across the experiments, the speedup over standard AR decoding reaches up to 2.5× while generation quality is maintained.

Throughput and accuracy at 7B: on an A100, the throughput of Fast-dLLM v2 (7B) is 2.54× that of Qwen2.5-7B-Instruct, and accuracy on GSM8K is 5.2% higher than Fast-dLLM-LLaDA.

Batch and hardware scalability: on A100/H100, the parallel advantage of diffusion decoding becomes more pronounced as the batch size grows, reaching a throughput speedup of about 1.5× on A100 and up to about 1.8× on H100.

Aggregate benchmark scores:

1.5B: an average score of 45.0, ahead of Qwen2.5-1.5B and Qwen2.5-1.5B-Nemo-FT (a baseline obtained by standard next-token-prediction (NTP) fine-tuning of Qwen on the same LLaMA-Nemotron post-training dataset). This is a new state of the art among diffusion-based and NTP-trained AR models of comparable size (≈1B scale).

7B: an average score of 60.3, ahead of Qwen2.5-7B-Nemo-FT (59.6) and Dream (57.6), and equal or better on most individual benchmarks.

The evaluation covers HumanEval/MBPP, GSM8K/MATH, MMLU/GPQA, IFEval, and other benchmarks.

Training Cost

Data and compute cost: adapting an AR model into a Block Diffusion LLM takes fine-tuning on the order of 1B tokens (compared with about 580B tokens for Dream), which lowers the barrier considerably. The paper gives the specific training steps and configuration for Qwen2.5-Instruct 1.5B/7B on 64×A100; training finishes in a few hours, so the results are readily reproducible.

Summary

Fast-dLLM v2 offers a pragmatic route: adapt an AR model into a Block Diffusion LLM with very little data (~1B tokens), gain roughly 2.5× end-to-end throughput over an AR model of the same size, and keep accuracy comparable. The key knobs (block size, threshold, and cache) can all be tuned to the target in an engineering-friendly way, making this a well-balanced trade-off between cost and benefit.
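As a rough illustration of two mechanisms the article describes (attention that is bidirectional within a block and causal across blocks, and confidence-thresholded parallel decoding), here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the `MASK` sentinel, the toy predictions and confidences, and the fallback that commits the single most confident position when nothing clears the threshold are all assumptions made for illustration.

```python
from typing import List, Optional

# Sentinel for a position that has not been decoded yet (assumption of this sketch).
MASK = None


def block_attention_mask(seq_len: int, block_size: int) -> List[List[bool]]:
    """Return allowed[i][j]: True when query position i may attend to key j.

    Positions in the same block see each other (bidirectional), and every
    position also sees all earlier blocks (causal across blocks). No position
    sees a later block.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def parallel_unmask_step(
    tokens: List[Optional[str]],
    predictions: List[str],
    confidences: List[float],
    threshold: float = 0.9,
) -> List[Optional[str]]:
    """Run one refinement step over the current block.

    Every masked position whose confidence exceeds `threshold` is committed
    in the same step. If no position qualifies, the single most confident
    masked position is committed so that decoding always makes progress
    (this fallback is an assumption, not taken from the article).
    """
    masked = [i for i, tok in enumerate(tokens) if tok is MASK]
    if not masked:
        return list(tokens)
    chosen = [i for i in masked if confidences[i] > threshold]
    if not chosen:
        chosen = [max(masked, key=lambda i: confidences[i])]
    out = list(tokens)
    for i in chosen:
        out[i] = predictions[i]
    return out


if __name__ == "__main__":
    # Two blocks of size 2: position 0 sees position 1 but not position 2.
    allowed = block_attention_mask(seq_len=4, block_size=2)
    print(allowed[0])

    # Toy values only: three of the four positions clear the 0.9 threshold.
    block = [MASK, MASK, MASK, MASK]
    preds = ["The", "answer", "is", "42"]
    conf = [0.97, 0.95, 0.40, 0.92]
    print(parallel_unmask_step(block, preds, conf))
```

In this toy run, three positions are committed in a single step and the low-confidence one stays masked for a later refinement pass, which is where the parallel speedup comes from. Lowering `threshold` commits more tokens per step at a higher risk of errors.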