LongCat-Flash-Thinking Officially Released: Stronger, More Professional, Maintaining Ultra-High Speed!

Deep News
Sep 22, 2025

Today, Meituan's LongCat team officially released LongCat-Flash-Thinking, a new high-efficiency reasoning model. It keeps the speed of LongCat-Flash-Chat while being more capable and more specialized. Comprehensive evaluations show that LongCat-Flash-Thinking achieves state-of-the-art (SOTA) performance among open-source models worldwide on reasoning tasks across multiple domains, including logic, mathematics, code, and agents.

LongCat-Flash-Thinking not only strengthens autonomous tool calling for agents but also extends to formal theorem proving, making it the first large language model developed in China to combine "deep thinking + tool calling" with "informal + formal" reasoning. We found that its advantages are especially pronounced on highly complex tasks in mathematics, code, and agent scenarios.

The model is now fully open-sourced on Hugging Face and GitHub:

Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking

GitHub: https://github.com/meituan-longcat/LongCat-Flash-Thinking

**Domain-Parallel RL Training Method**

To address the instability of mixed-domain reinforcement learning, we designed a domain-parallel approach that decouples the optimization of STEM, code, and agent tasks. Each domain is trained in parallel and the results are then fused, which yields balanced gains across capabilities and brings overall performance to a Pareto-optimal point.
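To make the term "Pareto-optimal" concrete: a candidate is Pareto-optimal when no other candidate is at least as good in every domain and strictly better in at least one. The sketch below checks this for a handful of models. All model names and scores are hypothetical placeholders chosen for illustration, not results from the release, and the code says nothing about how the team actually fuses the domain experts.

```python
from typing import Dict, List

Scores = Dict[str, float]  # domain name -> score (higher is better)


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if `a` is at least as good as `b` in every domain
    and strictly better in at least one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical scores, for illustration only.
candidates = {
    "stem_expert": {"stem": 0.90, "code": 0.60, "agent": 0.55},
    "code_expert": {"stem": 0.65, "code": 0.88, "agent": 0.60},
    "naive_mix": {"stem": 0.60, "code": 0.58, "agent": 0.50},
    "fused": {"stem": 0.89, "code": 0.87, "agent": 0.80},
}

front = pareto_front(candidates)
print(front)
```

In this toy data the naively mixed model is dominated by the STEM expert, while the fused model stays on the front because no single expert beats it in all three domains.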

**Dynamic Orchestration for Asynchronous Rollout (DORA)**

Our Dynamic Orchestration for Asynchronous Rollout (DORA) system is the foundation of the entire training process. Through its Elastic Colocation and Multi-Version Asynchronous Pipeline designs, it achieves a threefold speedup over synchronous RL training frameworks while keeping the policy consistent for each sample. The system also implements efficient KV cache reuse and runs stably on clusters at the scale of ten thousand accelerator cards.

**Agentic Reasoning Framework**

To further enhance the model's agentic reasoning capabilities, we proposed a "dual-path reasoning framework." The framework autonomously selects the most suitable query samples and uses an automated pipeline to combine agentic reasoning with tool use, so the model learns to identify and invoke external tools (such as code executors and APIs) to solve complex tasks efficiently. On AIME25 test data, LongCat-Flash-Thinking with tool use under this framework consumed 64.5% fewer tokens than without tool calling (from 19,653 down to 6,965) while maintaining 90% accuracy, a significant reduction in the resources spent on reasoning.
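The reported saving can be checked directly from the two token counts. The short sketch below does the arithmetic; the helper name `token_reduction` is ours and not part of any LongCat tooling.

```python
def token_reduction(baseline_tokens: int, reduced_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - reduced_tokens) / baseline_tokens


# Token counts quoted in the article for AIME25:
# without tool calling vs. with tool calling.
reduction = token_reduction(19_653, 6_965)
print(f"{reduction:.2%}")
```

The result is about 64.56%, which agrees with the quoted 64.5% when truncated to one decimal place.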

**Formal Reasoning Framework**

To overcome the weakness of current open-source general-purpose large language models on formal proof tasks, we designed a new data synthesis method specifically for formal reasoning. It uses an expert iteration framework integrated with a Lean4 server to generate rigorously verified proofs, systematically strengthening the model's formal reasoning and improving its reliability in academic and engineering applications.
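For readers unfamiliar with formal proofs, the toy example below shows the kind of artifact a Lean 4 checker either accepts or rejects. It is a deliberately trivial statement written for illustration; it is not taken from MiniF2F or from the LongCat training data.

```lean
-- A toy Lean 4 theorem: addition on natural numbers is commutative.
-- The proof is accepted only if the checker can verify every step.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

Because the checker gives a definite accept or reject verdict, a pipeline built around it can keep only the proofs that pass verification.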

LongCat-Flash-Thinking has set new records in multiple authoritative evaluations, demonstrating consistently leading performance across various reasoning tasks:

**General Reasoning Capability**: LongCat-Flash-Thinking shows strong general reasoning ability, especially on tasks that require structured logic. It scored 50.3 on the ARC-AGI benchmark, surpassing top closed-source models such as OpenAI o3 and Gemini 2.5 Pro.

**Mathematical Capability**: LongCat-Flash-Thinking is among the current top-tier models in mathematical reasoning, and its advantage grows on harder benchmarks. On HMMT and AIME-related benchmarks it surpasses OpenAI o3 and matches leading models such as Qwen3-235B-A22B-Thinking. These results confirm its strength on complex, multi-step problems.

**Coding Capability**: In programming, LongCat-Flash-Thinking delivers state-of-the-art (SOTA) performance among open-source models. With a score of 79.4 on LiveCodeBench, it significantly outperforms the other open-source models evaluated and matches the top-tier closed-source model GPT-5, demonstrating its ability to solve difficult competitive programming problems. It also remains highly competitive on OJBench with a score of 40.7, close to the leading model Gemini 2.5 Pro.

**Agentic Capability**: LongCat-Flash-Thinking excels at complex tool-augmented reasoning and agentic tool use. It achieved a new open-source SOTA score of 74.0 on τ2-Bench and was highly competitive on benchmarks including SWE-Bench, BFCL V3, and VitaBench.

**ATP Formal Reasoning Capability**: LongCat-Flash-Thinking achieved a pass@1 score of 67.6 on the MiniF2F-test benchmark, significantly ahead of all other evaluated models, and it keeps its lead at pass@8 and pass@32, underscoring its clear advantage in generating structured proofs and in formal mathematical reasoning.
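For context on the pass@1, pass@8, and pass@32 figures: pass@k is the probability that at least one of k sampled attempts is correct. The article does not say how the scores were computed, so the sketch below is only an assumption. It implements the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), for a problem with n sampled attempts of which c are correct; the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total attempts sampled, c: attempts that were correct,
    k: number of attempts the metric allows.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no correct attempt.
    p_all_wrong = comb(n - c, k) / comb(n, k)
    return 1.0 - p_all_wrong


# Hypothetical example: 32 sampled proofs for one theorem, 8 verified correct.
p1 = pass_at_k(n=32, c=8, k=1)
p8 = pass_at_k(n=32, c=8, k=8)
print(p1, p8)
```

A benchmark-level score is then the average of this per-problem estimate over all problems, and it can only grow as k increases.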

Visit https://longcat.ai/ to immediately experience LongCat-Flash-Thinking's deep thinking functionality.


