In its latest research report, Morgan Stanley highlights that as companies redirect resources and attention toward physical/embodied AI and robotics, a "photon race" for real-world visual data is quietly emerging. Against this backdrop, the firm assigns Tesla an "Overweight" rating with a $410 price target.
Tesla, Meta, and Figure AI are each positioning themselves, through different approaches, to collect and exploit visual data. The firm emphasized: "You can have all the computational resources in the world, but without visual data, you cannot train Vision-Language-Action (VLA) models."
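For context on the quoted claim, a Vision-Language-Action model can be thought of as a function that maps camera frames and a language instruction to a robot action. The minimal PyTorch sketch below is purely illustrative, with made-up module names and dimensions rather than any company's actual architecture; it shows the shape of that mapping and why, without visual data, there is nothing meaningful to feed into the vision side.

```python
# Minimal sketch of a Vision-Language-Action (VLA) model: camera frames and a
# text instruction go in, a robot action vector comes out. All module choices,
# dimensions, and names are illustrative assumptions, not any vendor's design.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, vocab_size=1000, action_dim=7):
        super().__init__()
        # Vision encoder: turns an RGB frame into a feature vector.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 128),
        )
        # Language encoder: averages token embeddings of the instruction.
        self.text = nn.EmbeddingBag(vocab_size, 128)
        # Action head: fuses both modalities and predicts a continuous action
        # (e.g. end-effector deltas plus a gripper command).
        self.head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                                  nn.Linear(128, action_dim))

    def forward(self, image, tokens):
        fused = torch.cat([self.vision(image), self.text(tokens)], dim=-1)
        return self.head(fused)

# Without real visual data there is nothing meaningful to feed `image`:
model = TinyVLA()
action = model(torch.randn(1, 3, 224, 224), torch.randint(0, 1000, (1, 12)))
print(action.shape)  # torch.Size([1, 7])
```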
Morgan Stanley points out that visual data has become the scarcest and most strategically valuable resource in AI training. The firm illustrates the value of visual data through a vivid analogy: a 600-pound bluefin tuna swimming far from shore has zero value without fishing boats and equipment; however, with proper harvesting capabilities, its value could reach $3.1 million. Similarly, the world's visual data has zero value if it cannot be captured and processed, but if collected and processed at scale, its value becomes immeasurable.
**Tesla: Shifting to "Pure Vision" Training**
In May 2025, Tesla's former Optimus lead released a series of videos showing Optimus autonomously completing tasks it learned from human videos. These videos were shot from a first-person perspective (with the camera mounted on the demonstrator), but the ultimate goal is to transition to third-person footage obtained from "random cameras" and internet videos.
"Tesla Motors reportedly will shift to a 'pure vision' approach for pre-training Optimus, no longer using teleoperators wearing motion capture suits and VR equipment, instead recording videos of workers performing tasks as training data."
This transition marks a significant adjustment in Tesla's training paradigm and highlights the central role of visual data in teaching robots to imitate and generalize behaviors.
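The "pure vision" approach described above is commonly framed as behavior cloning from video: the model observes frames of a person performing a task and learns to predict the action that follows. The sketch below is a generic illustration under assumed data shapes, using random stand-in tensors for frames and action labels; it is not Tesla's pipeline, and in practice the hard part, recovering action labels from raw video (for example via pose estimation), is omitted here.

```python
# Generic behavior-cloning loop over video demonstrations: each sample pairs a
# frame with the action the demonstrator took next. Illustrative sketch only;
# the data here are random stand-in tensors and the architecture is tiny.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

frames = torch.randn(256, 3, 96, 96)   # stand-in for video frames
actions = torch.randn(256, 7)          # stand-in for per-frame action labels
loader = DataLoader(TensorDataset(frames, actions), batch_size=32, shuffle=True)

policy = nn.Sequential(                # frame -> action regressor
    nn.Conv2d(3, 16, 8, 4), nn.ReLU(),
    nn.Conv2d(16, 32, 4, 2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(128), nn.ReLU(), nn.Linear(128, 7),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(3):
    for frame_batch, action_batch in loader:
        # Supervised regression: imitate the demonstrated action for each frame.
        loss = nn.functional.mse_loss(policy(frame_batch), action_batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```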
The firm anticipates that visual data will be used not only to train models but also to construct "robot training gyms" (simulation environments), making it possible to iterate through billions of scenarios in the digital world.
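As a rough picture of what such a loop looks like, the sketch below rolls a placeholder policy through many randomized simulated episodes using the open-source Gymnasium API as a generic stand-in; the environment, episode count, and randomization are assumptions for illustration, not a description of any company's simulator.

```python
# Illustrative "robot training gym" loop: roll a policy through many randomized
# simulated episodes. Gymnasium's Pendulum-v1 is used as a generic stand-in
# environment; the scale and randomization strategy are assumptions.
import gymnasium as gym
import numpy as np

env = gym.make("Pendulum-v1")

def policy(observation):
    # Placeholder policy: random action. A learned VLA policy would go here.
    return env.action_space.sample()

returns = []
for episode in range(1000):                      # "billions," in the report's framing
    obs, _ = env.reset(seed=episode)             # vary initial conditions per episode
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    returns.append(total)

print(f"mean return over {len(returns)} episodes: {np.mean(returns):.1f}")
```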
In the firm's framing, Tesla vehicle owners are not just moving through physical space as they drive; they are also "playing a video game," feeding data into the simulated world used to train the latest FSD models. Meta glasses users, meanwhile, are teaching models how to play the piano, knit, pour coffee, or take out the trash.
Morgan Stanley emphasizes that visual data is the core resource for training next-generation AI models and that its value is being redefined. Tesla, Meta, and Figure AI are advancing data-collection strategies along different pathways, from vehicles and glasses to real estate, all competing for the lead in this "photon race."