TENCENT's Hunyuan Releases 0.3B Parameter Edge-Side Model with Minimal 600MB Memory Footprint

TENCENT's Hunyuan AI model team has officially launched HY-1.8B-2Bit, an extremely compact model designed for consumer-grade hardware. With an effective parameter count of just 0.3 billion and a memory footprint of only about 600MB, it is smaller than many common mobile applications. The model was created by applying 2-bit quantization-aware training (QAT) to the previously released small-scale language model HY-1.8B-Instruct, which cuts the equivalent parameter count to one-sixth that of the original-precision model. While retaining the full reasoning capabilities of the base model, HY-1.8B-2Bit generates text 2 to 3 times faster on real edge devices, significantly improving the user experience.

HY-1.8B-2Bit can be deployed on edge devices with minimal effort, and it represents the industry's first practical edge-side deployment of a 2-bit quantized model. The model also inherits the comprehensive "full-chain thinking" capability of Hunyuan-1.8B-Instruct: it produces concise reasoning chains for simple queries and detailed, extended chains for complex tasks, so users can pick the mode that fits their application's complexity and resource constraints, as sketched below.
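
As a purely illustrative sketch of how an application might make that choice, the Python snippet below routes a request to either a concise or an extended reasoning configuration based on a caller-supplied complexity hint. The system prompts and the mode-selection logic are hypothetical assumptions; the announcement does not specify how the mode is actually chosen at the prompt or API level.

```python
# Hypothetical sketch: selecting a concise or extended reasoning mode per request.
# The prompts and the complexity flag are illustrative assumptions, not Hunyuan's API.

CONCISE_SYSTEM = "Answer directly with a brief chain of reasoning."
EXTENDED_SYSTEM = "Think through the problem step by step before giving the final answer."

def build_messages(user_query: str, complex_task: bool) -> list[dict]:
    """Build a chat-style message list with the reasoning depth chosen by the caller."""
    system = EXTENDED_SYSTEM if complex_task else CONCISE_SYSTEM
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]

# Simple queries get the concise mode; heavier tasks get the extended one.
print(build_messages("What is 2 + 2?", complex_task=False))
print(build_messages("Plan a week-long trip under a fixed budget.", complex_task=True))
```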

To maximize the model's general capabilities, the Hunyuan team employed three key methods: data optimization, elastic-scaling quantization, and novel training strategies. For deployment, TENCENT provides the HY-1.8B-2Bit weights in gguf-int2 format along with bf16 pseudo-quantized weights. At just 300MB, the packaged model is one-sixth the size of the original-precision model, allowing flexible use on edge devices. The model has also been adapted to computing platforms such as Arm and can be deployed efficiently on mobile devices that support Arm SME2 technology.
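
As one possible way to run the released gguf weights on a laptop-class machine, the sketch below loads them with the open-source llama-cpp-python bindings. The file name is a placeholder, and whether the stock llama.cpp runtime already understands this particular 2-bit format is an assumption; Tencent may ship its own runtime or conversion tooling for edge targets.

```python
# Minimal sketch: loading a 2-bit GGUF build of HY-1.8B with llama-cpp-python.
# "hy-1.8b-2bit.gguf" is a placeholder path, and support for this specific
# 2-bit format in the upstream runtime is an assumption, not confirmed here.
from llama_cpp import Llama

llm = Llama(
    model_path="hy-1.8b-2bit.gguf",  # hypothetical local copy of the gguf-int2 weights
    n_ctx=1024,                      # context window comparable to the tests described below
    n_threads=2,                     # the published benchmark fixed the thread count at two
)

out = llm("Explain, in two sentences, what 2-bit quantization-aware training does.",
          max_tokens=96)
print(out["choices"][0]["text"])
```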

In tests on a MacBook with an M4 chip, with the thread count fixed at two, first-token latency and generation speed were measured across different context windows. Compared with the fp16 and Q4 gguf formats, HY-1.8B-2Bit delivered a 3 to 8 times speed-up in first-token latency for inputs of up to 1,024 tokens, and at common window sizes its generation speed was a stable 2 times or more that of the original model. Similar tests on a Dimensity 9500 platform showed a 1.5 to 2 times improvement in first-token latency and roughly a 1.5 times boost in generation speed over the HY-1.8B-Q4 format.
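
For readers who want to collect comparable numbers on their own hardware, the sketch below times first-token latency and generation speed with a streaming completion call. It is a generic application-level harness under the same two-thread setting, not the Hunyuan team's benchmark, and the model path is again a placeholder.

```python
# Generic harness sketch: time first-token latency and decode speed for a local GGUF model.
# The model path is a placeholder; treating each streamed chunk as one token is an approximation.
import time
from llama_cpp import Llama

llm = Llama(model_path="hy-1.8b-2bit.gguf", n_ctx=1024, n_threads=2)

prompt = "Summarize the benefits of 2-bit quantization for on-device inference."
start = time.perf_counter()
first_token_at = None
n_tokens = 0

for chunk in llm(prompt, max_tokens=128, stream=True):
    if first_token_at is None:
        first_token_at = time.perf_counter()  # moment the first generated token arrives
    n_tokens += 1

end = time.perf_counter()
print(f"first-token latency: {first_token_at - start:.3f} s")
if n_tokens > 1 and end > first_token_at:
    # tokens emitted after the first one, divided by the time spent decoding them
    print(f"generation speed: {(n_tokens - 1) / (end - first_token_at):.1f} tokens/s")
```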

Ultra-low-bit quantization lets HY-1.8B-2Bit bring large language models to edge devices flexibly, maintaining performance comparable to INT4-PTQ methods while delivering efficient, stable on-device inference. For now, the model's capabilities remain constrained by the supervised fine-tuning (SFT) process and by the performance and robustness of the base model itself. The Hunyuan team therefore plans to focus future work on approaches such as reinforcement learning and model distillation, aiming to further narrow the gap between low-bit quantized and full-precision models and to open up broader applications for large language models on edge devices.
