Meituan Unveils Native Multimodal Large Model LongCat-Next

Deep News
Yesterday

Meituan has announced the release and full open-sourcing of its native multimodal large model, LongCat-Next, together with its core component, the Discrete Native-Resolution Visual Tokenizer (dNaViT). The model departs from the "language-centric," stitched-together architecture common in today's large models by uniformly mapping images, audio, and text into a shared space of discrete tokens. Trained with a pure Next Token Prediction (NTP) paradigm, LongCat-Next makes vision and speech AI's "native language."

According to the announcement, LongCat-Next achieves three key technological breakthroughs: first, the Discrete Native Autoregressive (DiNA) architecture completely eliminates modality barriers; second, dNaViT constructs a "dictionary" for the visual world; and third, a semantically aligned, information-complete encoder resolves the long-standing industry challenge that "discretization inevitably leads to information loss."
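The "dictionary" metaphor for a visual tokenizer matches the generic vector-quantization idea: continuous patch features are replaced by IDs of their nearest codebook entries. dNaViT's actual design is not detailed in the announcement; the codebook and features below are toy assumptions:

```python
# Hypothetical sketch of a visual "dictionary" (codebook): each continuous
# patch feature is discretized to the ID of its nearest codebook entry.
# This is the generic vector-quantization idea, not dNaViT's real design.

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to `vec` (squared L2)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))

# Toy 4-entry codebook over 2-D "patch features".
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

patch_features = [(0.1, 0.05), (0.9, 0.95), (0.2, 0.8)]
tokens = [nearest_code(v, codebook) for v in patch_features]
print(tokens)  # → [0, 3, 2]: discrete IDs a language model can consume
```

The claimed third breakthrough, a semantically aligned and information-complete encoder, addresses exactly the weakness of this naive scheme: snapping features to the nearest code discards whatever detail the codebook cannot represent.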

