MiniMax Unveils M2.5 Model: Priced at $1 per Hour, Costing Just 1/20th of GPT-5, with Performance Rivaling Claude Opus

Deep News
Feb 13

MiniMax has launched its latest model, the M2.5 series, which significantly reduces inference costs while maintaining industry-leading performance. The release aims to make complex Agent applications economically feasible, and MiniMax claims the model reaches or sets new state-of-the-art (SOTA) levels in programming, tool usage, and office scenarios. Data released by MiniMax on February 13 shows that M2.5 offers a substantial price advantage: at an output speed of 50 tokens per second, its price is only 1/10th to 1/20th that of leading models such as Claude Opus, Gemini 3 Pro, and GPT-5.

In a high-speed operating environment of 100 tokens per second, running M2.5 continuously for one hour costs just $1. At 50 tokens per second, the cost drops further to $0.30 per hour. This means a budget of $10,000 can keep four Agents running continuously for a full year, drastically lowering the barrier to building and operating large-scale Agent clusters.
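The annual-budget claim can be sanity-checked with simple arithmetic, using only the hourly rates quoted above:

```python
# Sanity check of the annual Agent budget implied by the quoted rates.
HOURS_PER_YEAR = 365 * 24        # 8,760 hours

cost_100_tps = 1.00              # $/hour at 100 tokens/s (quoted)
cost_50_tps = 0.30               # $/hour at 50 tokens/s (quoted)

# Four agents running around the clock at the slower, cheaper tier:
annual_cost = 4 * HOURS_PER_YEAR * cost_50_tps
print(f"${annual_cost:,.0f}")    # roughly the $10,000 budget cited
```

At $0.30/hour the four agents come to about $10,500 per year, which matches the article's round $10,000 figure.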

In terms of performance, M2.5 demonstrates strong capabilities in core programming tests and achieved first place in the multi-language Multi-SWE-Bench task, with overall performance comparable to the Claude Opus series. The model also features improved decomposition of complex tasks. In the SWE-Bench Verified test, task completion speed improved by 37% over the previous-generation M2.1, with end-to-end runtime reduced to 22.8 minutes, on par with Claude Opus 4.6. MiniMax's internal operations have already validated the model's capabilities: 30% of the company's overall tasks are now handled autonomously by M2.5, covering core functions such as R&D, product development, and sales. In programming scenarios in particular, code generated by M2.5 accounts for 80% of newly submitted code, demonstrating high penetration and usability in real-world production environments.

Shattering Cost Barriers: The Economic Feasibility of Unlimited Agent Operation

The design philosophy behind M2.5 is to eliminate the cost constraints of running complex Agents. MiniMax achieved this by optimizing inference speed and token efficiency. The model provides an inference speed of 100 TPS (tokens per second), approximately double that of current mainstream models. Beyond simply reducing computational costs, M2.5 decreases the total number of tokens required to complete a task through more efficient task decomposition and decision logic. In the SWE-Bench Verified evaluation, M2.5 consumed an average of 3.52 million tokens per task, down from the 3.72 million consumed by M2.1. The dual improvements in speed and efficiency make it economically feasible for enterprises to build and operate Agents at virtually unlimited scale, shifting the competitive focus from cost to the speed of model capability iteration.
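The per-task token figures quoted above imply a modest but compounding efficiency gain, which can be computed directly:

```python
# Token efficiency on SWE-Bench Verified, per the figures quoted above.
tokens_m25 = 3.52e6   # avg tokens per task, M2.5
tokens_m21 = 3.72e6   # avg tokens per task, M2.1

savings = 1 - tokens_m25 / tokens_m21
print(f"{savings:.1%} fewer tokens per task")
```

The roughly 5% reduction in tokens per task multiplies with the ~2x speedup in tokens per second, which is how the per-hour cost falls faster than either number alone suggests.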
Advancements in Programming Capability: Thinking and Building Like an Architect

In the programming domain, M2.5 focuses not only on code generation but also emphasizes system design capabilities. The model has developed a native "Spec" (specification) behavior, allowing it to proactively decompose functions, structures, and UI design from an architect's perspective before coding begins. The model was trained on more than 10 programming languages (including Go, C++, Rust, and Python) and hundreds of thousands of real-world environments. Tests show that M2.5 can handle the entire development lifecycle, from system design (0-1), development (1-10), and feature iteration (10-90) to final code review (90-100). To verify its generalization across different development environments, MiniMax tested the model on programming scaffolds such as Droid and OpenCode. M2.5 achieved pass rates of 79.7 on Droid and 76.1 on OpenCode, outperforming both the previous-generation model and Claude Opus 4.6.

Complex Task Handling: More Efficient Search and Professional Delivery

In search and tool usage, M2.5 demonstrates higher decision-making maturity, moving beyond merely "getting it right" to seeking more streamlined paths to solve problems. In various tasks such as BrowseComp, Wide Search, and RISE, M2.5 reduced the number of interaction rounds by approximately 20% compared to its predecessor, achieving results with superior token efficiency.

For office scenarios, MiniMax collaborated with seasoned professionals from fields like finance and law to incorporate industry tacit knowledge into the model's training. In the internally developed Cowork Agent evaluation framework (GDPval-MM), M2.5 achieved an average win rate of 59.0% in head-to-head comparisons with mainstream models. It can produce industry-standard Word research reports, PPT presentations, and complex Excel financial models, rather than simple text generation.

Technical Foundation: Native Agent RL Framework Drives Linear Improvement

The core driver of M2.5's performance gains is large-scale reinforcement learning (RL). MiniMax employed a native Agent RL framework named Forge, which decouples the underlying training/inference engine from the Agent by introducing a middleware layer, supporting integration with any scaffold. On the algorithmic front, MiniMax continued using the CISPO algorithm to ensure the stability of the Mixture-of-Experts (MoE) model during large-scale training. To address the credit-assignment challenges posed by the Agent's long context, a Process Reward mechanism was introduced. Furthermore, the engineering team optimized asynchronous scheduling and tree-based merging of training samples, achieving approximately a 40x acceleration in training. This validates the trend of near-linear improvement in model capability as compute and task volume increase.
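The credit-assignment problem the Process Reward mechanism targets can be illustrated with a toy example (this is not MiniMax's implementation; the per-step scores below are hypothetical). With a single reward at the end of a long trajectory, early steps receive only a heavily discounted signal; per-step process rewards give every step immediate credit:

```python
# Illustrative toy, not MiniMax's Forge/Process Reward implementation:
# contrast a sparse end-of-episode reward with dense per-step rewards.
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go at each step, the usual credit signal in RL."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

# Outcome-only: one reward at the end of a 5-step agent trajectory.
outcome_only = discounted_returns([0, 0, 0, 0, 1.0])

# Process rewards: hypothetical per-step scores (e.g. from a step-level
# verifier) plus the final outcome, so every step carries direct credit.
process = discounted_returns([0.2, 0.1, 0.3, 0.1, 1.0])

print([round(g, 3) for g in outcome_only])  # early steps see only discounted outcome
print([round(g, 3) for g in process])       # every step gets an immediate signal
```

With real agent trajectories spanning millions of tokens rather than five steps, the sparse-reward signal decays far more severely, which is the motivation the article gives for introducing process-level rewards.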

Currently, M2.5 is fully deployed in MiniMax Agent, API, and Coding Plan services. The model weights will also be open-sourced on HuggingFace, supporting local deployment.

