Guotai Haitong Securities has reportedly released a research note stating that Shanghai-based AI unicorn MiniMax recently launched a full-modal "suite" covering text, video, speech, and music. Its text model, M2, ranked first among open-source models worldwide on the Artificial Analysis (AA) benchmark, which the note describes as a comprehensive breakthrough for Chinese AI firms in full-modal technology that opens new commercialization opportunities. The key points of the note are as follows:
Investment Recommendation: MiniMax recently unveiled its full-modal "suite," establishing a complete technology stack spanning text, video, speech, and music. Its text model M2 has entered the top tier of global evaluations and, with exceptional cost efficiency, eases the "impossible triangle" of performance, speed, and cost. The note regards this as a critical step for China's AI technology from following to leading globally.
Recent Developments: MiniMax released its full-modal "suite" of four models: the text model M2, the video-generation model Hailuo 2.3, the speech model Speech 2.6, and the music model Music 2.0. MiniMax-M2, an open-source text model optimized for agents and coding, ranked in the global top five and first among open-source models on the Artificial Analysis (AA) benchmark while using a lightweight architecture (10B active parameters out of 230B total). It is the first Chinese open-source model to join the global top tier.
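To put the "lightweight architecture" figure in perspective, the short Python sketch below computes the share of M2's parameters that are active per token, using only the 10B and 230B figures quoted above. The function name `active_fraction` is illustrative and not taken from the note.

```python
# Figures quoted in the note for MiniMax-M2.
TOTAL_PARAMS_B = 230   # total parameters, in billions
ACTIVE_PARAMS_B = 10   # parameters active per token, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of the model's parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {fraction:.1%}")  # prints "Active share per token: 4.3%"
```

Roughly one parameter in 23 is active for any given token, which is consistent with the low inference cost the note goes on to emphasize.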
M2 sets a new benchmark in model efficiency and cost control, and its usage surged after launch. Its overall inference cost is as low as $0.53 per million tokens, just 8% of that of Claude Sonnet 4.5, while its inference speed is nearly twice as fast. This balance across the "impossible triangle" of performance, speed, and cost provides a solid technical foundation for large-scale commercial applications.
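The cost comparison can be made concrete with a back-of-envelope calculation. The Python sketch below takes the two figures quoted in the note ($0.53 per million tokens and the 8% ratio), derives the reference price that ratio implies, and prices a workload under both. The implied reference price is an inference from the ratio, not a published price, and the 500-million-token workload and all function names are hypothetical.

```python
M2_COST_PER_M = 0.53   # USD per million tokens, as quoted in the note
COST_RATIO = 0.08      # M2 cost as a share of the Claude reference model's, per the note


def implied_reference_price(cost_per_m: float, ratio: float) -> float:
    """Reference-model price per million tokens implied by the quoted ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return cost_per_m / ratio


def workload_cost(tokens: int, price_per_m: float) -> float:
    """Cost in USD of processing `tokens` tokens at `price_per_m` per million."""
    return tokens / 1_000_000 * price_per_m


reference_price = implied_reference_price(M2_COST_PER_M, COST_RATIO)

# Hypothetical workload chosen only for illustration.
tokens = 500_000_000
m2_cost = workload_cost(tokens, M2_COST_PER_M)
reference_cost = workload_cost(tokens, reference_price)

print(f"Implied reference price: ${reference_price:.3f} per million tokens")
print(f"M2 cost for workload: ${m2_cost:,.2f}")
print(f"Reference cost for workload: ${reference_cost:,.2f}")
print(f"Savings: ${reference_cost - m2_cost:,.2f}")
```

Under these assumptions the implied reference price is about $6.6 per million tokens, so the same workload costs roughly twelve times more on the reference model.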
With its extreme cost-performance ratio ($0.53 per million tokens), MiniMax-M2 rose to fourth globally and first among Chinese models in API calls on OpenRouter within five days of launch, and ranked third globally in programming-related calls. This market response supports the note's view that the model balances high performance with low cost, and it offers a successful case of a Chinese model commercializing globally.
Full-Modal Product Matrix: The "suite" reflects a complete technical layout prioritizing generation quality and stability. Key highlights include:
- **Hailuo 2.3**: A video-generation model supporting native 1080p HD videos up to 10 seconds long, with 2.5x improved training/inference efficiency via noise-aware computational redistribution.
- **Speech 2.6**: A speech model optimized for voice-agent scenarios, reducing first-packet response time to 250 ms, leading the field.
- **Music 2.0**: Capable of generating structurally complete songs up to 5 minutes long.
Notably, while much of the industry has adopted simplified attention mechanisms, MiniMax continues to invest in full attention despite its higher cost, in order to ensure quality and stability in long-context and complex reasoning scenarios. According to the note, this choice underscores the company's long-term commitment to foundational algorithm research and its pursuit of technical excellence.
Risk Factors: Potential risks include slower-than-expected model iteration, insufficient computing power supply, and data privacy compliance challenges.