Xiaohongshu Deploys New Image Editing Model: Technological Breakthroughs and Ecosystem Ambitions Behind Major Update

On the evening of March 8, Xiaohongshu's Super Intelligence team quietly dropped a significant technological announcement. Arriving less than a month after the release of version 1.0, FireRed-Image-Edit version 1.1 was launched as scheduled. The company described this upgrade as "epic," a characterization that seems both surprising and fitting for a platform traditionally known for its community and product discovery features.

The surprise stems from the public perception of Xiaohongshu primarily as a lifestyle platform. However, it is fitting because, as the global competition in large language models enters a phase focused on deep application, a super-community with 300 million monthly active users must secure a defining role in shaping the next generation of content creation tools. The release of FireRed-1.1 is not merely an iteration of technical parameters but a declaration of its vision for what image editing should look like in the AI era.

To grasp the significance of FireRed-1.1, one must first understand two persistent challenges in the field of image editing: identity consistency and complex semantic integration. Past AI image editing often produced absurd results. For instance, a user might input a prompt like "make this person wear a red dress and stand by the seaside," only to generate an image where the person's facial features are distorted or the red dress appears awkwardly superimposed onto the beach background. This stems from the model's fragmented understanding of human subjects and its failure to comprehend spatial relationships. FireRed-1.1's breakthroughs directly target these critical weaknesses.

In portrait editing, the new version significantly improves the consistency of a subject's identity. Whether the edit changes the clothing, alters the hairstyle, or adds complex makeup effects to a person in a photo, the model preserves the subject's key characteristics (the curve of the cheekbones, the catchlight in the eyes, even the subtle lines around the mouth) throughout the editing process. Official data indicates that when processing complex instructions involving portraits, FireRed-1.1 keeps the subject's features stable even under pixel-level alterations. For content creators, this addresses a critical pain point: where previous AI editing felt like swapping heads, FireRed offers precise refinement.

More impressive is its multi-element fusion capability. The new version can combine more than ten visual elements within a single image and complete the composition through automatic cropping and stitching. Consider a complex prompt like: "A woman wearing a French retro-style shirt, sitting at a café by the Seine, with a cup of latte and an open copy of 'The Little Prince' on the table, the silhouette of the Eiffel Tower and falling plane-tree leaves in the background." This instruction involves a person, clothing, a scene, objects, architecture, and natural phenomena. Traditional diffusion models often fail at some point, perhaps drawing the tower crookedly or blurring the leaves onto the face. The Agent module introduced in FireRed-1.1 is designed for this case. When the input involves more than three reference images or contains complex elements, the system automatically performs region detection, image cropping, and stitching, then rewrites the editing instructions based on the new image structure. It moves beyond simply piecing images together like a jigsaw puzzle to a semantically aware reconstruction.
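The Agent flow described above (detect regions, crop, stitch, then rewrite the instruction) can be sketched roughly as follows. This is a minimal illustration and not FireRed's actual implementation: the names `Reference`, `Tile`, `prepare_edit`, and `MAX_DIRECT_REFERENCES` are hypothetical, region detection is replaced by boxes supplied by the caller, only the "more than three reference images" trigger is modeled, and stitching is reduced to placing crops side by side.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: mirrors the article's "more than three reference images" trigger.
MAX_DIRECT_REFERENCES = 3


@dataclass
class Reference:
    """A reference image plus the region a (stand-in) detector selected."""
    name: str
    region: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Tile:
    """Where one cropped region lands on the stitched canvas."""
    name: str
    x_offset: int
    width: int
    height: int


def stitch_horizontally(refs: list[Reference]) -> tuple[list[Tile], tuple[int, int]]:
    """Crop each reference to its region and lay the crops out left to right."""
    tiles: list[Tile] = []
    x = 0
    for ref in refs:
        left, top, right, bottom = ref.region
        width, height = right - left, bottom - top
        tiles.append(Tile(ref.name, x, width, height))
        x += width
    canvas_height = max(tile.height for tile in tiles)
    return tiles, (x, canvas_height)


def rewrite_instruction(instruction: str, tiles: list[Tile]) -> str:
    """Append the stitched layout so the prompt refers to the new image structure."""
    layout = "; ".join(
        f"{t.name} at x={t.x_offset}..{t.x_offset + t.width}" for t in tiles
    )
    return f"{instruction} [stitched layout: {layout}]"


def prepare_edit(instruction: str, refs: list[Reference]) -> dict:
    """Pass small inputs through unchanged; crop, stitch, and rewrite otherwise."""
    if len(refs) <= MAX_DIRECT_REFERENCES:
        return {"mode": "direct", "instruction": instruction, "canvas": None, "tiles": []}
    tiles, canvas = stitch_horizontally(refs)
    return {
        "mode": "stitched",
        "instruction": rewrite_instruction(instruction, tiles),
        "canvas": canvas,
        "tiles": tiles,
    }
```

With four references, `prepare_edit` takes the stitched path and returns an instruction annotated with each crop's position; with three or fewer, the original instruction passes through untouched. A real system would presumably use a learned detector and a language model for the rewrite step, which this sketch does not attempt.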

Furthermore, FireRed-1.1 includes specialized optimizations for the two core content formats on Xiaohongshu: portrait photography and text layout. For portrait retouching and beauty, the model adds professional beauty editing, skin brightening, and creative makeup effects. This is not merely applying filters but involves "light and shadow reshaping" based on an understanding of facial structure. Simultaneously, the understanding of text styles has been strengthened, ensuring higher consistency in typography and font styles within generated images. For users creating cover images or posters, this significantly reduces the awkwardness of poorly integrated text and graphics.

If algorithmic capability determines a model's potential, then engineering prowess determines its feasibility for large-scale use. In evaluations, FireRed-Image-Edit achieved high scores on several image editing benchmarks, including ImgEdit, GEdit, and REDEdit. The team also reported high marks in human evaluations for prompt understanding and visual consistency. However, the number that truly caught the industry's attention is 4.5 seconds. FireRed-1.1 reduces end-to-end inference time to approximately 4.5 seconds and cuts VRAM requirements to around 30GB. This suggests it is no longer a research tool requiring expensive cloud GPUs but an industrial-grade instrument that can run on a single top-end consumer graphics card and, with further optimization, potentially even be deployed on edge devices.
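To put the two reported figures in perspective, a short back-of-the-envelope calculation helps. The sketch below is illustrative only: it assumes edits run one at a time with no batching, ignores memory used by anything other than the model, and checks the approximate 30GB requirement against the published memory sizes of a few well-known cards. The function names and the card list are choices made for this example, not something from the announcement.

```python
# Figures reported in the article (both approximate).
SECONDS_PER_EDIT = 4.5
VRAM_REQUIRED_GB = 30

# Published memory sizes of a few example cards, in GB (illustrative selection).
CARD_VRAM_GB = {
    "GeForce RTX 4090": 24,
    "GeForce RTX 5090": 32,
    "A100 80GB": 80,
}


def edits_per_hour(seconds_per_edit: float) -> float:
    """Sequential throughput of one GPU, assuming no batching or idle time."""
    return 3600 / seconds_per_edit


def cards_that_fit(required_gb: float, cards: dict[str, int]) -> list[str]:
    """Cards whose total memory meets the requirement (runtime overhead ignored)."""
    return [name for name, vram in cards.items() if vram >= required_gb]


print(edits_per_hour(SECONDS_PER_EDIT))                # 800.0
print(cards_that_fit(VRAM_REQUIRED_GB, CARD_VRAM_GB))  # ['GeForce RTX 5090', 'A100 80GB']
```

Under these assumptions a single GPU handles about 800 edits per hour, and a 30GB footprint fits on a 32GB card but not on a 24GB one.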

Impressive technology alone cannot obscure the reality that this field is crowded with competitors. In image generation and editing, products like ByteDance's Doubao and Alibaba Cloud's Qianwen, along with numerous startups, have already established their presence. Many of the features highlighted are also core capabilities promoted by these rival models. So where does FireRed's competitive edge lie? The answer likely lies in the data flywheel and a closed loop of usage scenarios.

For a long time, users on Xiaohongshu primarily relied on external tools like Doubao for AI-generated or edited image content. This created an awkward situation: Xiaohongshu served as the source of inspiration and the platform for content distribution, but the core creation process happened elsewhere. Users would see inspiring content on Xiaohongshu, switch to another app to generate similar content, and then return to Xiaohongshu to publish it. FireRed's primary mission is to defend this territory. When the platform's built-in editing capabilities match or surpass external tools, users have no need to switch apps. The entire journey—from searching for tutorials, to generating content, to publishing—can be completed within Xiaohongshu's ecosystem. This not only enhances user experience fluidity but also allows the platform to accumulate vast amounts of creation behavior data within its own system, which can then be used to refine recommendation algorithms and model training.

A deeper competitive advantage lies in aesthetic alignment. Doubao and Qianwen are general-purpose models, prioritizing broad applicability and the range of instructions they can follow. FireRed, however, has grown from the soil of Xiaohongshu and inherently carries the community's aesthetic DNA. Xiaohongshu's content ecosystem has its own distinct visual language, a kind of "refined authenticity" that calls for clear, translucent lighting, soft color tones, compositions with a sense of space, and details that feel lived-in. FireRed's optimizations in multi-element fusion, portrait beauty, and font styling are clearly aimed at satisfying this specific Xiaohongshu aesthetic. While general models are still trying to learn what is considered visually appealing, FireRed is already learning what Xiaohongshu's community deems attractive. This community-tailored aesthetic alignment forms a moat that is difficult for any external general-purpose model to replicate.

Furthermore, the decision to open-source the model is a forward-thinking strategic move. As global competition in large models focuses on practical application, leading platforms are attempting to build differentiated AI competitiveness around content creation by lowering the barriers to multimodal technology. By going open-source, FireRed has the potential to attract numerous developers and small-to-medium enterprises to build vertical applications based on its framework, thereby establishing a Xiaohongshu standard in the image editing field. If a rich ecosystem of toolchains and plugins develops around FireRed, both within and outside the community, the cost for newcomers to disrupt it becomes exceedingly high.

Of course, FireRed, now in the spotlight, does not face a smooth path forward. One challenge is winning over user habits. Products like Doubao and Qianwen, backed by major tech firms, have already accumulated large user bases and strong brand recognition. Persuading users to switch from "using Doubao" to "using Xiaohongshu's built-in FireRed" requires not only superior technology but also carefully designed user interaction experiences and operational strategies.

Additionally, challenges remain regarding the model's ability to generalize across different scenarios. Currently, FireRed excels at image editing, but image generation is also a crucial part of content creation. The team has previewed upcoming versions of a text-to-image model. This would round out Xiaohongshu's multimodal capabilities, but it also means entering more intense competition with established ecosystems like Stable Diffusion and Midjourney.

Technical ethics and community governance are also long-term concerns for Xiaohongshu. Enhanced image editing capabilities bring increased pressure to mitigate risks associated with misinformation, AI-generated face swaps, and copyright infringement. Balancing creative freedom with content safety is a challenge Xiaohongshu must address concurrently.

It is worth noting that before the release of FireRed-Image-Edit 1.1, Xiaohongshu's Super Intelligence team had already demonstrated a breakthrough in OCR: its compact 2B-parameter FireRed-OCR model surpassed far larger models on document parsing benchmarks. This indicates that Xiaohongshu's multimodal strategy is not about isolated breakthroughs but involves systematic development of a full technology stack.

For Xiaohongshu, the release of FireRed 1.1 is more than a product update; it is an expansion of its identity. The platform is evolving from a content community into a provider of content creation infrastructure. In an era where AI is redefining creation, the platforms that master core generative capabilities are the ones likely to wield the power to define "beauty" in the next wave of competition.

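Disclaimer: Investing involves risk. This article is not investment advice, and the content above should not be regarded as an offer, recommendation, or solicitation to buy or sell any financial product; nor should any related discussion, comment, or post by the author or other users be regarded as such. This article is for general reference only and does not take into account your personal investment objectives, financial situation, or needs. TTM assumes no responsibility for and makes no guarantee of the accuracy or completeness of the information; investors should do their own research and seek professional advice before investing.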

Tiger Brokers
