Folding clothes is the first lesson in teaching robots to do household chores at Qianxun Intelligence. In an office building in Beijing's Haidian district, data collectors sit in front of robotic arms, picking up, aligning, folding, and putting down items - each action repeated hundreds of times, all to help robots learn to perform household tasks "like humans."
Similar training is taking place simultaneously across Beijing. To the west, at the Shijingshan Humanoid Robot Data Training Center, hundreds of robots are learning actions like opening doors, grasping objects, and arranging flowers in "nine-year compulsory education" training zones and "robot university" scenario areas. To the south, the Beijing Humanoid Robot Innovation Center in the Beijing Economic-Technological Development Area (known as "Beijing Yizhuang") has created 1:1 replicas of kitchens, living rooms, supermarkets, and gas stations, building an immersive data collection factory with hundreds of data collection units, including humanoid, wheeled, and robotic-arm systems, distributed throughout the building.
Investigations reveal that multiple Beijing enterprises and institutions have established data collection centers, including Zhiyuan Research Institute, Galaxy General, Beijing Humanoid Robot Innovation Center, Xinghaitu, and Qianxun Intelligence, with team sizes ranging from 30-40 to over 100 people.
Currently, embodied intelligence is in a "hundred schools of thought" stage of technological exploration with diverse approaches, but one consensus is becoming increasingly clear: high-quality data is key to whether robots can leave laboratories and truly enter society.
Unlike large language models that rely on massive text corpora, embodied intelligence models must learn from multimodal data, including actions, language, and vision, in real or simulated environments. It is like teaching a child to play ball: you can't just explain; you need to demonstrate actions, correct mistakes, and provide reinforcement for intelligence to gradually emerge.
Today, high-quality embodied intelligence data has been assigned clear economic value: it can be traded, can attract government subsidies, and can even become important leverage for corporate financing, for expanding applications, and for driving sales of complete robots.
The government has introduced incentive mechanisms like "data vouchers," while enterprises continuously explore different aspects of data production, annotation, simulation, and synthesis, attempting to build moats through unique data formulations.
More importantly, this is no longer a single company's breakthrough battle, but a systematic experiment by an entire city. For example, Beijing is attempting to leverage the entire embodied intelligence industry chain through multi-dimensional coordination of policies, scenarios, and mechanisms, using data as a fulcrum to help robots enter the real world. Shanghai, Tianjin, and other cities are also establishing large-scale data collection centers.
**Robot "Schools"**
Qianxun Intelligence has transformed an entire floor into an orderly data factory. There are no cubicles or meeting rooms; in their place are rows of robotic arms and operational guidelines posted on the walls. The left wall displays safety operation rules, while a small blackboard on the right is updated daily with the data collectors' work hours, completion progress, and accuracy rates. A large screen displays key indicators in real time, including collection rates, error curves, and system stability.
Basic actions are completed by data collectors, while complex operations are handled by engineers wearing VR equipment for remote control, simulating tasks like handling, obstacle avoidance, and placement.
A Qianxun Intelligence executive explained that when the company first trained the clothes-folding action, it took half a year just to complete the process from fabric recognition to path planning. Previously, training a new action required 600-700 high-quality data points; now it needs fewer than 100, improving training efficiency by nearly 70%. "The robot model's growth is like a child developing from age three to five - learning faster and more steadily."
Currently, Qianxun Intelligence can collect over a thousand action data points daily, building up month by month into capability libraries that can be called, combined, and reused. This system of "self-collected data, self-controlled hardware, self-tested models" has become a core competitive advantage in its fundraising.
Since its establishment in early 2024, Qianxun Intelligence has secured nearly 600 million yuan in financing, with investors including JD.com, Xiaomi affiliates, CATL affiliates, and Middle Eastern capital. These investors not only provide funding but also open up their real scenarios - factories, warehouses, logistics parks - where Qianxun Intelligence can deploy and test its robots.
Unlike the precise collection done in office buildings, the Beijing Humanoid Robot Innovation Center (Beijing Humanoid for short) is more like an immersive experimental stage. Two floors hold 1:1 replicas of everyday and commercial settings: kitchens, bedrooms, living rooms, tea rooms, and even gas stations, supermarket shelves, and factory production lines. Robots learn operational tasks like opening refrigerators, pouring tea, restocking, and loading and unloading goods. These actions must be accurate while remaining as natural as possible, close to human habits.
Li Guangyu, head of embodied data at Beijing Humanoid, explained that organizing a refrigerator involves breaking the task into multiple sub-actions: opening the door, recognition, grasping, placement, closing the door... Different refrigerator brands have slightly different structures, and bottled cola might be placed in the refrigerator compartment, drawer, or door storage compartment - each position affects the robot's operational path, requiring coverage of various variants to ensure model generalization capability.
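The coverage problem Li describes can be pictured as enumerating every combination of variation axes and checking which ones still lack data. The following is only a minimal sketch of that idea, not Beijing Humanoid's actual tooling: the brand labels, placement names, and helper functions are hypothetical placeholders.

```python
from itertools import product

# Sub-actions named in the article for the "organize a refrigerator" task.
SUB_ACTIONS = ["open_door", "recognize", "grasp", "place", "close_door"]

# Hypothetical variation axes; the brand labels are placeholders.
FRIDGE_BRANDS = ["brand_a", "brand_b", "brand_c"]
PLACEMENTS = ["main_compartment", "drawer", "door_shelf"]


def build_collection_plan(brands, placements, sub_actions):
    """Return one episode spec per (brand, placement) variant."""
    return [
        {"brand": b, "placement": p, "steps": list(sub_actions)}
        for b, p in product(brands, placements)
    ]


def missing_variants(plan, collected):
    """List (brand, placement) pairs in the plan with no collected episode."""
    return [
        (e["brand"], e["placement"])
        for e in plan
        if (e["brand"], e["placement"]) not in collected
    ]


plan = build_collection_plan(FRIDGE_BRANDS, PLACEMENTS, SUB_ACTIONS)
collected = {("brand_a", "drawer"), ("brand_b", "door_shelf")}  # made-up progress
gaps = missing_variants(plan, collected)
print(len(plan), "variants planned;", len(gaps), "still uncollected")
```

Even with just three brands and three placements, the plan already contains nine variants, which hints at why covering a single appliance can consume so many collection hours.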
Beijing Humanoid divides collection tasks into two categories: highly reusable general action scenarios, prioritizing coverage of kitchens, living rooms, offices, and other spaces; and enterprise-customized scenarios, such as collecting freezer operation procedures for home appliance companies or recording standard restocking actions for retail brands. Collecting standard operating procedures for just one brand of freezer might require thousands of hours.
In terms of capacity, Beijing Humanoid has achieved monthly collection of over 10,000 hours of action data, ranking among the first tier of national collection centers.
Li Guangyu stated: "We focus not on data volume, but on whether quality serves intelligence emergence. The same 10,000 hours of data, organized differently, can result in vastly different model effects." The team is advancing differentiated supplementary collection: it analyzes model weaknesses during training and collects targeted data to address them, supporting generalization training more efficiently.
More valuable in the long term are the "data recipes" formed around different industry scenarios. The term refers to customized data collection built around an enterprise's business processes, operational standards, and working environments, and such data embeds the technical know-how of that industry. This is why leading embodied intelligence companies are competing to get robots working in factories: the more types of enterprise they collaborate with, the richer their data recipes become. Richer recipes make trained models more likely to be practical, and they become important assets when negotiating cooperation with clients and valuations with investors.
**Beijing Yizhuang's "School District" Experiment**
In August 2025, at the Beijing World Robot Conference exhibition area, there was an "Embodied Intelligence Data Collection Map" showing nearly 100 real collection points distributed across pharmacies, libraries, hotels, logistics parks, and other public and commercial spaces, forming a dynamically operating human-machine collaborative network.
This is not a conceptual diagram, but part of Beijing Yizhuang's ongoing "Embodied Intelligence Social Experiment Plan." In this plan, the entire urban area functions like a real data factory for embodied intelligence.
In July, at a 7Fresh supermarket in Beijing Yizhuang, Beijing Humanoid's "Embodied Tiangong" robot was conducting restocking training between shelves. Two engineers stood beside it - one holding remote control equipment to operate it, another recording data and action performance. They collect over 20 micro-tasks daily, divided into dozens of sub-actions.
Dense foot traffic creates some interference with collection operations. One engineer noted: "There are many people taking photos and watching, finding robot training very novel."
Li Guangyu said that, compared with purpose-built scenarios, collecting data in real spaces like supermarkets and hotels differs in three significant ways. First, environmental fidelity is at its highest: nothing needs to be replicated, and robots carry out operations directly according to the standard operating procedures of the job position. Second, foot traffic is dense and observers are many, placing higher demands on robot stability. Third, on-site safety management requirements are stricter. Although no cordoned-off operating zones are set up, every action must remain controllable and safe, so collection currently still relies mainly on on-site remote operation.
Similar locations are gradually expanding. According to planning by the Beijing Economic-Technological Development Area Management Committee, real-world locations will expand to thousands, with data pool construction reaching PB (petabyte) levels.
At the same time, Beijing Yizhuang issued "Several Measures to Promote Innovative Development of Embodied Intelligence Robots," formally recognizing data as an important production factor. The document sets out three incentives: a reward of 100,000 yuan for each recognized benchmark data collection and training site; funding support of up to 2 million yuan for high-quality datasets built by enterprises; and 100 million yuan in "data vouchers" distributed each year. Enterprises that purchase data products (such as datasets or platform interfaces) can use data vouchers to receive a proportional subsidy, capped at 1 million yuan per purchasing entity per year.
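To make the voucher arithmetic concrete, here is a rough sketch of how a proportional subsidy with an annual cap could be computed. The 1 million yuan cap comes from the measures as reported above; the subsidy percentage is an assumption, since the article does not state it, and the function name is hypothetical.

```python
ANNUAL_CAP_YUAN = 1_000_000  # per purchasing entity per year, as reported


def voucher_subsidy(purchase_yuan, rate_percent, already_used_yuan=0):
    """Subsidy for one purchase, limited by what is left of the annual cap.

    rate_percent is an assumed parameter: the article says the subsidy is
    proportional but does not give the percentage.
    """
    if not 0 <= rate_percent <= 100:
        raise ValueError("rate_percent must be between 0 and 100")
    remaining = max(ANNUAL_CAP_YUAN - already_used_yuan, 0)
    proportional = purchase_yuan * rate_percent // 100  # integer yuan
    return min(proportional, remaining)


# Illustrative figures only; the 30% rate is made up for the example.
print(voucher_subsidy(2_000_000, 30))           # below the cap
print(voucher_subsidy(5_000_000, 30))           # limited by the cap
print(voucher_subsidy(2_000_000, 30, 600_000))  # limited by the remaining cap
```

Under these assumed numbers, a large purchase stops earning additional subsidy once the entity's annual cap is exhausted, which spreads the voucher pool across more buyers.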
The core transformation of this mechanism lies in shifting from past subsidies for robot bodies to using data as incentive targets, encouraging enterprises to participate in public data ecosystems of co-construction, co-collection, and co-use.
Enterprises are also responding. In August, Xinghaitu Technology, which is based in Beijing Yizhuang, released China's first open-scenario real-machine dataset, the Galaxea Open-World Dataset, and simultaneously announced that it was open-sourcing its self-developed model G0. The dataset comes from 50 typical scenarios including real homes and offices, with a total duration exceeding 500 hours, covering 234 tasks, over 1,600 objects, and 58 operation skills. Downloads exceeded 80,000 within a week of release.
Over the past ten months, Zhao Xing, assistant professor at Tsinghua University's Institute for Interdisciplinary Information Sciences and chief scientist at Xinghaitu, has been almost entirely at data collection sites, personally participating in frontline data engineering, often adjusting parameters late at night. He believes the biggest bottleneck in embodied intelligence development is the lack of high-quality data.
Unlike algorithm work, data collection is not a matter of sudden inspiration but a continuous, tedious, labor-intensive production activity. Everything from training data collectors and handling equipment and network emergencies to uploading, cleaning, and annotating data requires hands-on involvement.
Zhao Xing emphasizes collecting in real scenarios like homes, hotels, factories, and supermarkets to cover as broad a task space as possible. The significance of open-source datasets is twofold: first, promoting industry formation of unified standards for algorithm comparison; second, building developer ecosystems to help research institutions and enterprises shorten implementation cycles.
Founded just over two years ago, Xinghaitu has completed nearly 1.5 billion yuan in financing, with Meituan and Capital Today as lead investors and Beijing Robot Fund and Yizhuang Guotou as follow-on investors.
Additionally, a "robot school" oriented toward the future has been built in Beijing Yizhuang. This is an embodied intelligence data training base created by Beijing Humanoid, also China's first embodied intelligence training platform based on real scenarios. The base plans to complete layout of over 20 real scenarios by year-end, launching large-scale data collection.
Beyond production functions, it also undertakes data collector training and certification, exploring vocational education systems and gradually establishing industry talent standards. This model also has potential for replication and promotion in multiple locations.
**Human Teachers Behind the Scenes**
A common saying about artificial intelligence holds that "there is as much intelligence as there is human labor," and embodied intelligence training likewise depends heavily on human labor. At the industry frontline, thousands of data collectors take on the teaching tasks. People in this line of work are now collectively called embodied intelligence trainers.
It sounds like a prestigious profession of the digital age, but it is actually the most primitive physical labor. Trainers must record dozens to hundreds of action data points daily, in task scenarios that include folding clothes and cleaning surfaces, and sometimes even simulate a person busy in a kitchen all day - walking back and forth, repeatedly bending, moving and organizing objects.
Before being hired, data collectors must pass action adaptability tests, wearing VR equipment while bending, lifting, rotating, and performing other movements. The process easily causes dizziness, and many cannot last ten minutes, so more than 50% of applicants are eliminated.
More implicit thresholds are hidden in recruitment details. A human resources manager at a data collection company indicated they prefer applicants 160-170cm tall with strong action coordination and standard body types - because unstable posture affects general model training. Some recruitments even explicitly state restrictions: males not exceeding 65kg without beer bellies; females not exceeding 55kg.
Even after being hired, data collectors do not have an easy job. In most collection centers, the training chain is divided into three roles. First are the frontline action collectors, who demonstrate and record actions, with daily collection volumes of 50-200 items and experienced collectors reaching thousands. Next are the data auditors, each processing thousands of items daily, with groups handling millions annually. These two types of personnel are mostly hired through outsourcing.
Above them are algorithm engineers who train models based on data and repeatedly verify and adjust parameters on-site, mostly with educational backgrounds in computer science or autonomous driving fields.
Many algorithm engineers also need to understand hardware debugging, with desks holding monitors on one side and different types of robotic arms plus maintenance tools on the other, ready to completely disassemble robots at any time.
Although all are called trainers, these three job categories differ significantly in the nature of the work, skill thresholds, and compensation structures. Frontline collection positions typically earn 5,000-6,000 yuan monthly; audit positions can reach 80,000 yuan annually; core trainers participating in model training can earn 150,000-200,000 yuan annually. Algorithm engineers start at 20,000 yuan monthly, with those proficient in data synthesis and other technologies reaching 100,000 yuan, plus equity incentives.
To extend career paths and reduce personnel turnover, some data centers are trying to select collectors "with data intuition" to take part in real-machine parameter adjustment and process design, and even promote them to project managers. Recruitment demand for such roles is growing at a two- to three-fold pace.
Meanwhile, technology continues expanding geographical boundaries of positions. At Shijingshan Humanoid Robot Data Training Center, remote collection systems are online, allowing operators not in Beijing to control robots for data collection tasks by wearing professional equipment from distant locations. Young people in third and fourth-tier cities can also join as remote workers.
This remote mechanism can also be deployed overseas, reducing the operational costs of data collection. The center currently has over 100 dual-arm robots in use, operated mainly through exoskeleton and VR remote operation equipment, which is more flexible and more economical than motion capture systems costing hundreds of thousands of yuan.
**Disagreements on "Textbook" Writing Methods**
Industry consensus has gradually clarified: data is the core element of embodied intelligence, but technical routes are rapidly diverging around questions of what constitutes high-quality data, how to collect it, and how to use it efficiently.
One path emphasizes collecting real-machine data in the real world, accumulating general experience; another path focuses more on the efficiency and cost advantages of synthetic data, hoping to iterate quickly in early model training stages. Different enterprises' development stages, funding capabilities, and target scenarios result in different requirements for data quality, efficiency, and generalization capability.
Wang He, assistant professor at Peking University and founder of Galaxy General, is a representative of the synthetic data route. He explained that real-machine data collection by itself is too slow and expensive. Taking Tesla as an example, training robots to complete battery sorting requires a 40-person team operating them remotely for months, and that yields only one skill. In reality, robots need to master thousands of operations.
Galaxy General chose a "virtual-real combination" paradigm - mainly synthetic data with real data as supplement, achieving balance between model training efficiency and generalization capability. Galaxy General uses billion-level synthetic data for end-to-end training, relying only on minimal real data for generalization fine-tuning.
Wang He gave an example: using only 200 real data points, Galaxy General's robot learned within an afternoon to grasp bottles of drinking water one after another and could generalize to different brands of bottled beverages. Compared with pure real-machine collection, the time saved is on the scale of months.
Wang He doesn't deny the value of the real-machine data collection trend, but believes the key question isn't how much data was collected. What matters is whether the data can deliver value, whether it can actually make robots work, and whether the costs are appropriate.
He judges that over the next three years, humanoid robot mass production speed and autonomous application implementation scale will grow at double or even triple rates. Finding the most suitable scenarios and most cost-effective high-quality data generation methods is important.
Founded two years ago, Galaxy General has received two funding rounds, completing 1.1 billion yuan in financing in June and setting a record for the largest single financing in China's embodied intelligence track.
Li Guangyu mentioned that Beijing Humanoid also uses synthetic data in actual training. The current industry ratio is generally about 9:1 - simulated data accounting for 90%, real-machine data 10%, achieving better cost-output balance.
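As a back-of-the-envelope illustration of what a 9:1 mix implies, the sketch below splits a training budget between simulated and real-machine samples and works out how much simulated data would accompany a given amount of real data. The function names and sample counts are hypothetical; only the 9:1 ratio comes from the text.

```python
def split_by_ratio(total_samples, sim_parts=9, real_parts=1):
    """Split a sample budget into simulated and real-machine counts."""
    parts = sim_parts + real_parts
    real = total_samples * real_parts // parts
    return {"simulated": total_samples - real, "real": real}


def simulated_needed(real_samples, sim_parts=9, real_parts=1):
    """Simulated samples implied by a given amount of real-machine data."""
    return real_samples * sim_parts // real_parts


# Illustrative numbers only, not figures from the article.
print(split_by_ratio(10_000))
print(simulated_needed(500))
```

At this ratio each real-machine sample is paired with nine simulated ones, so the scarce and costly real data is stretched across a much larger training set.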
Beyond real-machine data, Beijing Humanoid is simultaneously building diversified data systems, including high-fidelity synthetic data and human video data, exploring advanced training paradigms like world models, human-in-the-loop training, and robot autonomous learning to improve overall data scale and training efficiency.
He Xiaodong, Senior Vice President of JD Group and Vice President of JD Explore Academy, explained that the value of combining synthetic and real data has precedents. In autonomous driving, many companies initially tried to rely on simulation platforms to batch-generate data and drive model evolution. Tesla's practice shows a different path: starting with L2 mass-production vehicles, it relied on large-scale real driving data accumulated through long-term operation to continuously iterate its models, and once the data flywheel was spinning, technological progress became more apparent. Simulation can accelerate verification, while real-scenario data affects longer-term performance.
He believes embodied intelligence enterprises should quickly get robots into the real world, participating in work and production.
These voices show that real-machine collection and synthetic simulation are not opposing technical routes but complementary ones. For enterprises with different tasks, different computing resources, and different business objectives, finding their own path is what matters most.