Ant Group Releases Open-Source Multimodal AI Model Ming-Flash-Omni 2.0

Stock News
02/11

Ant Group has officially open-sourced its latest multimodal large model, Ming-Flash-Omni 2.0. The model demonstrates exceptional performance across multiple public benchmarks, with particularly strong capabilities in visual language understanding, controllable speech generation, and image generation and editing. Some metrics have surpassed those of Gemini 2.5 Pro.

Ming-Flash-Omni 2.0 is also the industry's first model capable of unified audio generation across all scenarios, enabling the synchronized synthesis of speech, ambient sound effects, and background music within a single audio track. Users can finely adjust parameters such as timbre, speech rate, tone, volume, emotion, and even dialect through natural language instructions. In terms of inference efficiency, the model achieves an extremely low inference frame rate of 3.1 Hz, allowing real-time generation of high-fidelity, minute-long audio while significantly reducing computational costs and response latency.

Ant Group has been investing in multimodal research for several years, with the Ming-Omni series now in its third iteration. The open-sourcing of Ming-Flash-Omni 2.0 releases its core capabilities as a reusable foundation, providing a unified entry point for end-to-end multimodal application development. Users can also experience and access the model online through Ant's official platform, Ling Studio.

Disclaimer: Investing involves risk. This article does not constitute investment advice, and the content above should not be regarded as an offer, recommendation, or invitation to buy or sell any financial product; nor should any related discussions, comments, or posts by the author or other users be regarded as such. This article is for general reference only and does not take into account your personal investment objectives, financial situation, or needs. TTM assumes no responsibility or guarantee for the accuracy or completeness of the information; investors should conduct their own research and seek professional advice before investing.
