Ant Group Open-Sources Omni-Modal Large Model Ming-Flash-Omni 2.0

Zhitong Finance
Feb 11

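Zhitong Finance APP has learned that on February 11, Ant Group officially open-sourced its latest-generation omni-modal large model, Ming-Flash-Omni 2.0. The model performs strongly on several public benchmarks, most notably in vision-language understanding, controllable speech generation, and image generation and editing, with some metrics surpassing Gemini 2.5 Pro.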

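Ming-Flash-Omni 2.0 is also the industry's first model to support unified generation across all audio scenarios: it can synthesize speech, ambient sound effects, and background music together within a single audio track. Using natural-language instructions alone, users can finely control parameters such as timbre, speaking rate, intonation, volume, emotion, and even dialect.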

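On inference efficiency, the model runs at a very low inference frame rate of 3.1 Hz and can generate high-fidelity, minute-long audio in real time, substantially reducing compute cost and response latency while maintaining generation quality.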
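To put the 3.1 Hz figure in perspective, the arithmetic can be sketched as follows. This is only a back-of-the-envelope illustration: it assumes one generation step per frame, and the 25 Hz comparison rate is a hypothetical value chosen for contrast, not a number reported for any specific model.

```python
import math


def frames_needed(duration_s: float, frame_rate_hz: float) -> int:
    """Frames (generation steps) needed to cover `duration_s` seconds of audio."""
    # Round first so floating-point noise cannot push the ceiling up by one.
    return math.ceil(round(duration_s * frame_rate_hz, 6))


# One minute of audio at the frame rate reported in the article.
frames_at_reported_rate = frames_needed(60, 3.1)

# Hypothetical higher frame rate, used only for comparison.
frames_at_assumed_rate = frames_needed(60, 25)

print(frames_at_reported_rate, frames_at_assumed_rate)
```

Under these assumptions, a minute of audio takes 186 frames at 3.1 Hz versus 1,500 at the assumed 25 Hz, which is why a lower frame rate tends to reduce the number of steps the model has to run.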

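Ant Group has invested in omni-modal models for several years, and the Ming-Omni series has gone through three iterations. Open-sourcing Ming-Flash-Omni 2.0 means its core capabilities are now released as a "reusable foundation" that gives end-to-end multimodal application development a unified capability entry point. Users can also try and call the model online through Ling Studio, the official platform of Ant Bailing.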

