Tencent Launches a Consumer-GPU-Compatible Hunyuan 3D World Model: Is Mass 3D Adoption on the Horizon?

Deep News
Aug 20

On August 15th, the Hunyuan team at Tencent Holdings Limited (00700.HK) released a Lite version of its 3D world model. Where the previous release required 26GB of VRAM, this update uses dynamic FP8 (8-bit floating-point) quantization to cut the requirement to under 17GB, allowing the model to run smoothly on consumer-grade graphics cards.

Previously, the FP32 version of Tencent's 3D world model preserved complete detail but demanded very high VRAM: with a parameter count potentially exceeding one billion, it typically required large-VRAM GPUs to reach acceptable inference speed, putting it out of reach of consumer-grade graphics cards.

Simply put, FP32, FP16, and FP8 are different numerical precision levels. The earlier high-precision FP32 version reproduced scenes with very high fidelity, but it consumed substantial VRAM, much of it spent on details that don't need such meticulous rendering (background sky textures, for example).
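As a rough illustration of why precision matters for memory: the VRAM needed just to hold a model's weights scales directly with bytes per parameter. The article gives only the 26GB and 17GB endpoints, so the parameter count below is a made-up round number, not Tencent's actual figure:

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "FP8": 1}

def weight_vram_gb(num_params: int, precision: str) -> float:
    """VRAM (GiB) for the raw weights alone; activations and caches add more."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3

# Hypothetical 2-billion-parameter model.
params = 2_000_000_000
for p in ("FP32", "FP16", "FP8"):
    print(f"{p}: {weight_vram_gb(params, p):.1f} GiB")
# FP32: 7.5 GiB, FP16: 3.7 GiB, FP8: 1.9 GiB
```

Halving or quartering bytes per weight is why a quantized model can drop below a consumer card's VRAM ceiling even before any other optimizations.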

The core of dynamic FP8 quantization is monitoring the data distribution at runtime and adapting precision per module: the most critical areas stay at FP16, while non-critical parts such as background textures are dynamically dropped to FP8. The result is a large reduction in VRAM usage; although precision is lowered in places, it lets individual users run the 3D world model with ease.
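The selection logic can be sketched in a few lines. Everything below is a toy illustration, not Tencent's implementation: the FP8 "quantizer" merely mimics E4M3's roughly 3 mantissa bits with scaling and rounding, and the `critical` flag stands in for whatever runtime statistics the real system collects:

```python
import numpy as np

def quantize_fp8_sim(x: np.ndarray) -> np.ndarray:
    """Crude stand-in for FP8 E4M3: scale into the E4M3 range (max ~448),
    round to about 3 mantissa bits, then rescale. Illustrative only."""
    amax = float(np.abs(x).max()) or 1.0
    scale = 448.0 / amax
    y = x * scale
    exp = np.floor(np.log2(np.abs(y) + 1e-12))
    step = 2.0 ** (exp - 3)  # keep ~3 bits of mantissa
    return np.round(y / step) * step / scale

def quantize_module(weights: np.ndarray, critical: bool) -> np.ndarray:
    """Dynamic policy: critical modules stay at FP16, others drop to FP8."""
    if critical:
        return weights.astype(np.float16).astype(np.float32)
    return quantize_fp8_sim(weights)
```

The trade-off is visible in the error bounds: the FP8 path tolerates a relative error of a few percent, acceptable for a distant sky texture but not for foreground geometry, which is exactly why the policy keeps critical modules at FP16.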

Tencent's 3D world model is billed as the industry's first open-source editable world-generation model: from a user-supplied image or text prompt, it generates a complete, editable, interactive world that can be applied directly to game development, visual-effects production, educational simulation, and other scenarios.

Compared to Tencent's earlier AI generation of individual 3D models, the new 3D world model generates far more comprehensive content, covering environmental styles, indoor and outdoor scenes, lighting, and more. Traditional 3D scene development is extremely time-consuming (a single major building scene can take weeks or longer), so this one-click generation approach delivers efficiency gains well beyond what users have come to expect.

So how does the Hunyuan 3D world model rapidly generate 360° immersive visual spaces for such complex scenes? In the architecture of Hunyuan World Model 1.0, panoramic image generation serves as the unified proxy connecting text, images, and worlds: the system first generates a panoramic image of the initial world, achieving full 360° scene coverage.

It then deconstructs the scene into distinct hierarchical layers, such as foreground and background, ocean and ground, ground and sky, and performs 3D reconstruction layer by layer, ultimately assembling the complete 3D world model.
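The two-stage flow described above can be sketched as plain data flow. The stage bodies here are stubs and every name is invented for illustration; the real stages are, of course, large generative models:

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str          # e.g. "foreground", "sky"
    depth_order: int   # smaller = closer to the viewer

@dataclass
class WorldScene:
    panorama: str                       # placeholder for a 360° panoramic image
    layers: list[Layer] = field(default_factory=list)

def generate_world(prompt: str) -> WorldScene:
    # Stage 1 (stub): text or image -> panoramic image covering the full 360° view.
    scene = WorldScene(panorama=f"panorama({prompt})")
    # Stage 2 (stub): split the panorama into depth-ordered layers; each layer
    # would then be reconstructed as 3D geometry and composited into the world.
    for order, name in enumerate(["foreground", "background", "sky"]):
        scene.layers.append(Layer(name, order))
    return scene
```

The point of the structure is that later stages only ever see the panorama and its layers, which is why the quality of the initial panoramic image bounds the quality of the final 3D world.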

Compared to traditional 3D scene development, where every detail is hand-crafted at great cost in time and labor, this one-click approach not only saves considerable time but also outputs standardized, navigable 3D mesh assets compatible with Unity, Unreal Engine, and other tools.
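Why does a standardized mesh format matter? Wavefront OBJ, one common interchange format that both Unity and Unreal Engine can import, is plain text simple enough to emit by hand. The minimal writer below is an illustration of the format, not part of Tencent's toolchain:

```python
def to_obj(vertices, faces) -> str:
    """Serialize a triangle mesh to Wavefront OBJ text.

    vertices: iterable of (x, y, z) position tuples
    faces: iterable of (i, j, k) zero-based vertex indices
    """
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are 1-based, hence the +1.
    lines += [f"f {i + 1} {j + 1} {k + 1}" for i, j, k in faces]
    return "\n".join(lines) + "\n"

# A single triangle, the smallest possible mesh.
obj_text = to_obj([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

Because every major engine and DCC tool reads formats like this, a generator that emits them slots into existing pipelines with no custom importer work.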

Moreover, the generated content is precise enough to be directly usable: foreground areas of attention are rendered in detail, and foreground and background are cleanly separated, without blurred boundaries or muddled lighting.

However, trying Tencent's 3D world model on its official website shows that it cannot fully realize everything a text description asks for; it captures only the approximate scene, the lighting and color palette, and foreground detail. For example, a prompt mentioning a mechanical world and robots yields a scene containing neither. The system extracts only the vocabulary needed to construct a general world scene, such as "cyberpunk wasteland style" and "red sunset in the sky," then separates foreground from background (treating the abandoned amusement park as foreground content and the red sunset as background sky), and rebuilds the 3D scene from those layers. In other words, it reproduces only the broad strokes of the request.

Clearly, Tencent's 3D world model cannot yet satisfy users' personalized needs, though it can rough out foreground, background, and simple scene details, saving considerable time in game development and similar work.

Even so, 3D world models generated from user prompts hold real appeal for ordinary users. Direct 3D mesh output unifies formats and lowers the learning curve, and once AI handles scene deconstruction and 3D construction, the user's own intent becomes the main variable shaping the generated scene.

Tencent's decision to bring the Hunyuan 3D world model to consumer-grade graphics cards has a clear purpose: attracting developers and creators into the "Tencent Hunyuan 3D" ecosystem. The model supports full-pipeline content generation, from individual 3D models to complete 3D world scenes, letting users build their own virtual worlds.

There are now many AI large models that support 3D generation, including Tripo AI, Meshy AI, GENIE, and others. But with so many players crowding the 3D track, product features have become highly homogenized, which indirectly shows that "bringing real scenes into virtual worlds" has become the core battleground for vendors.

Among these tools, Tripo AI, the AI 3D foundation model released in 2024 by Silicon Valley startup VAST, stands out for its product structure. Unlike Tencent's Hunyuan 3D, which targets a broad audience, Tripo AI is positioned for professional creators. On entering the platform, users can generate 3D models directly from text or images, with a relatively rich set of adjustable parameters. Beyond the texture generation now common among mainstream AI 3D models, it can automatically split a model into components for individual editing, and it even supports basic rigging and animation previews for those components, though occasional mesh deformation occurs during playback.

Overall, Tripo AI represents a functionally mature AI 3D tool adaptable to multiple scenarios.

Meshy AI, also launched in 2024 and built by a Chinese team, likewise supports generating 3D models directly from text and images, but its core advantage is a more complete community: users can browse other creators' 3D models, with clear categorization and annotations for interaction volume, likes, 3D-printing support, and other key information. This lets novice users download ready-made models directly while boosting the community's reach and activity.

GENIE, launched by Luma AI, supports text-to-3D conversion and multi-format export (OBJ, FBX, and so on) for different use cases. Its highlight is its API, through which users can convert video content directly into 3D models, a differentiated competitive advantage.

Evidently, each of these products has broken out of homogenized competition with its own distinguishing feature, and Tencent's Hunyuan 3D is no exception. While its 3D model generation hasn't decisively pulled ahead of other tools, a generous free quota is its core advantage: on the Hunyuan AI 3D website, each user can generate 20 models free per day, with extra attempts earned by sharing with friends once the quota is exhausted.

This "quantity-for-users" promotion strategy has proven quite successful: even before the Lite release of the 3D world model, community downloads had reached 2.3 million, making it one of the world's most popular open-source 3D models.

Tencent's consumer-GPU-compatible Hunyuan 3D world model Lite will undoubtedly draw more creators into its ecosystem, and a growing user base in turn drives feedback-led iteration and new application scenarios. Take the currently popular VR headsets: 3D world files exported by Hunyuan 3D can be imported directly, so a user with a headset can step into a self-made virtual scene at any time, creating synergy between the ecosystem and hardware. Meanwhile, AI 3D foundation models let ordinary users easily create highly customized 3D models, pairing naturally with 3D printers.

More importantly, the near-zero learning curve of AI 3D tools is driving rapid adoption across industries. In architectural planning, interior design, e-commerce display, and similar scenarios, visualized 3D content is easier to grasp than text or traditional blueprints, and staff can output scene content without lengthy training, sharply reducing time spent on repetitive modeling. This "virtual model + physical industry" synergy both deepens user stickiness and builds a sense of belonging through highly customized content. Taken together, these trends suggest 3D models are headed toward mass adoption in 2025.

Future AI 3D models will integrate more professional scenario models and creative styles, attracting vertical users through subdivided fields and use cases, steadily expanding ecosystem boundaries and reaching into everyday life. That is the core significance of this wave of 3D democratization: in an era where reality and the virtual are fusing, giving everyone the ability to build 3D virtual worlds.

However, online discourse persistently warns that the spread of 3D models threatens 3D modelers with unemployment. That worry is overstated. Undeniably, tools that rapidly generate 3D models will shake the industry, and the speed and efficiency of AI models are hard for humans to match. But as noted above, current AI 3D models cannot achieve true personalization: what they generate is, in essence, content replicated from training data.

Content without personality rarely becomes excellent work. Whether in game modeling or architectural design, the creations people remember are those with distinctive craft: details a modeler has refined again and again, and ingenious touches carefully tailored to the client's needs.

So with their current capabilities, AI 3D models cannot fully replace 3D modelers. On the contrary, as tools that efficiently execute repetitive instructions, they are better suited as assistants that raise modelers' productivity.

On reflection, this "AI-assisted creation" model has long since spread across industries, yet limited by content homogenization, AI often remains confined to the repetitive groundwork phase. This explains why, despite increasingly convenient and widespread AI writing tools, original content creation persists: truly profound, warm, well-crafted writing never loses its luster because AI exists.

