Orient Securities released a research report stating that opportunities in vertical-specific multimodal AI applications deserve attention. The firm expects technological breakthroughs and cost optimization to accelerate industry trends, driving user growth, higher paying-user penetration, and a new stage of commercialization. Companies deploying multimodal AI applications overseas are particularly noteworthy, as they may grow faster. The main views of Orient Securities are as follows:
Since the beginning of the year, domestic models in multimodal video generation have iterated faster, advancing the industry's overall technology and significantly narrowing the gap with overseas counterparts. The most significant marginal changes are intelligent storyboarding, which lowers the barrier to entry for users, and unified multimodal architectures, which support more efficient, flexible, and controllable expression of creative intent. The firm judges that significant progress will be made in both business-to-business (B2B) and business-to-consumer (B2C) expansion in 2026. While model developers compete on technology, the focus should be on AI penetration in high-growth content sectors.
The accelerated pace of iteration in video generation is driving technological leaps in the industry, and the technological gap between domestic and international players continues to narrow. This year, domestic video generation developers have further shortened their model development cycles. As companies release their latest models, the technological ceiling on the domestic supply side has risen. Fundamental attributes such as physical plausibility, motion fluency, and instruction following have all improved significantly. Previously missing capabilities such as storyboarding and synchronized audio-video generation have been added, with better and more controllable results. Domestic models differ from overseas models in supporting reference generation from multimodal inputs such as images, audio, and video, as well as secondary video editing. Overall, the video generation sector has entered a competitive state similar to that of large language models in 2025. With the basic capabilities of the various players having reached a relatively high standard, the firm judges that subsequent differentiation will likely lie in specific application scenarios.
Video generation is entering an era of precise control and a "dashboard" approach, with lower barriers driving user base expansion on both the B2B and B2C sides. Recent marginal changes in video generation can be summarized as follows: (1) Transition from random generation to precise control: Most of the latest models are built on architectures that support multimodal input, allowing users to upload images, videos, or audio as references for generation. Compared with last year's random generation, controllability is stronger, which significantly raises the success rate of outputs. (2) More user-friendly duration and lower creation barriers: The duration of a single generation has increased to around 15 seconds, further lowering the creation barrier for both B2B and B2C users. Domestic models have largely closed the gap in multi-shot narrative functionality. This means that even novice consumer users with good ideas can create with the help of these tools. For professional B2B creators, the model's autonomous design of each storyboard shot also reduces the need for strong storyboarding skills. (3) Editability: Support for fine-grained add, delete, and modify operations on generated content enables quick secondary adjustments.
The firm believes that technological iteration in 2026 will focus more on integrating into production workflows, helping creators express intent efficiently and achieve controllable creation.
Investment recommendations: Relevant stocks include Alphabet Inc. (GOOGL.US), Kuaishou Technology (01024), MINIMAX (00100), and Meitu (01357).
Risk warnings: AI technology iteration falling short of expectations, slower-than-expected adoption of AI applications, and AI commercialization progress falling short of expectations.