
Why training data yield rate matters in physical AI

Key takeaways

  • Physical AI requires domain-specific data captured in your exact environment for your exact use case. Robots cannot generalize from generic data, which makes manual demonstrations and continuous real-world calibration unavoidable.
  • Eight hours of raw collection typically yields only two to four hours of training-grade data after sensor, sync and QA losses. Curation is the real bottleneck.
  • Humanoids are versatile but fragile. For repetitive industrial work, specialized arms or wheeled platforms deliver faster, more reliable ROI.

As labor shortages and hazardous environments challenge industrial growth, adopting physical AI has moved from an open question to an operational one: How do you accomplish it without adding unnecessary overhead and complexity? Similar to how an autonomous vehicle must perceive a stopped car, decide to brake and actually stop, all within milliseconds and without one system waiting on another, a robot's sensors, decision model and actuators have to operate in lockstep or the whole thing falls apart.

Digital AI learned by ingesting the internet. Physical AI can borrow a surprising amount from that same corpus — perception now rides on web-scale vision and vision-language models, so a robot doesn’t start blind. What the internet can’t supply is grounded action: the demonstrations, contact forces and embodiment-specific trajectories that teach a system not what the world looks like, but how to act in it. There is no web-scale corpus of movement and that’s the real constraint.

At our recent World Models Summit, TELUS Digital’s Vice President of AI Growth and Solutions, Sce Pike, sat down with an elite lineup of robotics and AI pioneers to map out the current state of physical AI.

The World Models Summit panel included:

  • Myles Liu, director of business operations at Lightwheel, who focuses on simulation infrastructure and curated egocentric data for VLA training.
  • Tory Smith, director of product management at Niantic Spatial, who builds foundation models for 3D reconstruction, visual localization and semantic understanding.
  • Ben Levin, director of robotics and physical AI data at NVIDIA, who builds the data stack behind Cosmos and GR00T.
  • Rajesh Radhakrishnan, vice president of autonomy at Serve Robotics, who deploys autonomous sidewalk-delivery robots at production scale.

The panel represented an ideal operational workflow, running the gamut from the minds building foundational generalist models to the teams deploying autonomous hardware at full production scale. Together, they dove deep into the reality of data curation and the unpredictable ways humans interact with robots in the wild.

The data attrition problem

A major point of contention in modern AI is whether large language models (LLMs) can naturally evolve into generalist robotics models.

Current vision-language-action (VLA) models rely heavily on behavior cloning, Levin argued: they learn from a dataset of demonstrations and reproduce that behavior in the real world. However, he emphasized that the ultimate benchmark is real-world utility: "The key question is not if this is the right way from first principles to learn embodied intelligence, but will it do useful work in the real world?"
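To make the idea of behavior cloning concrete, here is a deliberately tiny sketch. It is not how any panelist's system works: a real VLA model is a large neural network conditioned on images and language, whereas this toy fits a one-dimensional linear policy to invented demonstration pairs. The function name `fit_linear_policy` and the demonstration values are assumptions made for illustration only.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of action = w * state + b to demonstration pairs."""
    n = len(states)
    if n < 2 or n != len(actions):
        raise ValueError("need at least two matched (state, action) pairs")
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    var_s = sum((s - mean_s) ** 2 for s in states)
    if var_s == 0:
        raise ValueError("states must not all be identical")
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    w = cov / var_s
    b = mean_a - w * mean_s
    return w, b


# Hypothetical demonstrations: distance to target (m) -> commanded speed (m/s).
demo_states = [0.0, 0.5, 1.0, 1.5, 2.0]
demo_actions = [0.0, 0.25, 0.5, 0.75, 1.0]

w, b = fit_linear_policy(demo_states, demo_actions)


def policy(state):
    """Cloned policy: imitate the demonstrator's state-to-action mapping."""
    return w * state + b


# Query a state that never appeared in the demonstrations.
print(policy(1.2))
```

The cloned policy is only as good as the demonstrations it was fitted to, which is why the quality and coverage of collected data dominate the rest of this discussion.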

Building VLA models introduces a massive operational hurdle: data alignment. "To train a VLA model, you need a lot of effort," noted Radhakrishnan. "You have to make sure that vision, action and language are completely aligned. Curating that data is incredibly expensive. It is the most difficult problem to solve right now in creating generalist models."

AI robotics training is often hindered by fragmented hardware and low-fidelity data capture. Much of today’s training data is collected on research robots not suited for production environments, and many systems rely only on visual feedback, making delicate or contact-rich tasks difficult to execute.

Liu walked through the attrition many teams face when collecting training data for physical AI, using the example of training VLAs on egocentric hand-pose data. A 10% accuracy gap in the hand-pose model wipes out 10% of usable data downstream. Collection-side failures (sensor dropouts, mis-synced modalities, missing pose metadata) take another 20%. QA rejections at the annotation step remove up to 20% more. Eight hours of raw collection becomes two to four hours of training-grade data.
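The arithmetic behind that attrition can be sketched in a few lines. This is a rough illustration, not Liu's own calculation: it assumes the three loss rates quoted above (10%, 20% and up to 20%) apply one after another to whatever survived the previous stage, and the function name `usable_hours` is hypothetical.

```python
def usable_hours(raw_hours, loss_rates):
    """Apply each stage's loss rate to what survived the previous stage."""
    hours = raw_hours
    for rate in loss_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("loss rates must be fractions between 0 and 1")
        hours *= 1.0 - rate
    return hours


# Loss rates quoted above: hand-pose model gap, collection failures, QA rejections.
stages = [0.10, 0.20, 0.20]
raw = 8.0
remaining = usable_hours(raw, stages)
yield_rate = remaining / raw
print(f"{remaining:.2f} usable hours ({yield_rate:.0%} yield)")
```

Under these assumptions the quoted rates leave roughly 4.6 of the eight hours, and simply summing the rates (50%) leaves four. Both figures sit at the top of the two-to-four-hour range, so landing lower in that range would imply losses beyond the three quoted here.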

Levin was direct about the stakes. "Teams will acquire massive datasets, 50,000 hours, 500,000 hours of data, and immediately spin up hundreds of GPUs for pre-training," he said. "They commit enormous computational resources without actually knowing if their data is any good. Then, after weeks of training, they discover the data quality is insufficient for their needs. Pre-training evaluation is very important to figure out what the data quality should be looking like and what kind of data you should be looking at before you put all those resources to work."

Securing high-quality data at scale unlocks considerable architectural freedom. For instance, in frontier models like Generalist AI's GEN-1, approximately 99% of the parameters are trained from scratch. Once considered a wild choice, this reflects a deliberate conviction that with high-fidelity, high-volume data, you can push capabilities faster by keeping complete control over the fundamental model architecture.

3D is the harder bottleneck

As an industry, we can simulate physics really well, making it ‘easy’ to train a robot to do a backflip in a digital world where only gravity and the floor matter. However, we can’t yet simulate the infinite, messy variety of the real world. A backflip is just about the robot’s own body, but clearing a cluttered work area without breaking a component requires generalization — the ability to handle the unpredictable.

Smith highlighted that the industry is deeply limited by the accessibility of high-quality 3D data at scale. While 2D imagery is ubiquitous on the web, robots operate across multiple camera angles and complex spatial extrinsics.

Interestingly, Niantic’s historical data collection revealed that messy, high-entropy crowdsourced video clips captured by everyday consumers on off-the-shelf smartphones actually performed better at reconstructing reality than perfectly audited, structured data collection pipelines. The noise, variable lighting and unpredictable motion of real-world captures forced the spatial models to build a more resilient understanding of physical environments.

To collect the messy, high-entropy noise of the real world at scale, you need a contributor base globally distributed enough to cover the long tail of environments and lighting conditions, and a capture stack that keeps multi-sensor streams synchronized at the point of collection. This contributor diversity is what protects you from a model that works only inside the conditions you sampled.
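One way to picture what "synchronized at the point of collection" means in practice is a check that every frame from one sensor has a sample from another sensor close to it in time. The sketch below is an assumption-laden illustration, not a description of any particular capture stack: the function names, the 5 ms tolerance and the timestamps are all invented.

```python
from bisect import bisect_left


def nearest_offset(timestamps, t):
    """Return the smallest absolute gap between t and a sorted timestamp list."""
    i = bisect_left(timestamps, t)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = timestamps[max(i - 1, 0):i + 1]
    return min(abs(c - t) for c in candidates)


def unsynced_frames(reference, other, tolerance_s):
    """List reference timestamps with no sample in `other` within tolerance."""
    other = sorted(other)
    if not other:
        return list(reference)
    return [t for t in reference if nearest_offset(other, t) > tolerance_s]


# Hypothetical timestamps in seconds: camera frames and IMU samples.
camera = [0.000, 0.033, 0.066, 0.100]
imu = [0.001, 0.034, 0.099]

# Frames with no IMU sample within 5 ms are flagged for review or rejection.
print(unsynced_frames(camera, imu, tolerance_s=0.005))
```

Running a check like this during capture, rather than at annotation time, is what keeps mis-synced recordings from surfacing later as rejected hours.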

Deployment teaches what benchmarks cannot

Standard metrics like mean average precision (mAP) are great for academic research papers, but real-world fleet deployment introduces high-stakes liabilities that lab models cannot anticipate. While Levin highlighted that scaling up deployment is the ultimate test for generalist systems, Radhakrishnan pointed out that traditional benchmarks completely mask the operational risks: "My problem is not mean average precision. My problem is at the tails. I have corner cases where one safety issue means a public perception problem, a safety problem and an operational burden I need to be accountable for."
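A small sketch can show how an aggregate number hides a problem at the tails. It uses a plain failure rate as a stand-in for a benchmark metric (it does not compute mAP), and the slice labels and run counts are invented for illustration.

```python
from collections import defaultdict


def failure_rate_by_slice(records):
    """Map each slice label to failures / total for that slice."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for slice_label, failed in records:
        totals[slice_label] += 1
        failures[slice_label] += int(failed)
    return {label: failures[label] / totals[label] for label in totals}


# Hypothetical log: 990 daytime runs with 5 failures, 10 night runs with 6 failures.
records = (
    [("day", False)] * 985 + [("day", True)] * 5
    + [("night", False)] * 4 + [("night", True)] * 6
)

overall = sum(failed for _, failed in records) / len(records)
per_slice = failure_rate_by_slice(records)
worst = max(per_slice, key=per_slice.get)

print(f"overall failure rate: {overall:.1%}")
print(f"worst slice: {worst} at {per_slice[worst]:.0%}")
```

In this made-up log the overall rate looks healthy while the rare slice fails most of the time, which is the pattern Radhakrishnan describes: the aggregate is dominated by common conditions and says little about the corner cases.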

To illustrate why blind data scaling fails and why data awareness matters, Radhakrishnan shared two anecdotes from his deployments at John Deere and Serve Robotics:

  • A tractor consistently disengaged in one specific field at night. The long-exposure night settings on the tractor's cameras turned ordinary flies buzzing in front of the lens into long, bright streaks across the image. The validation dataset, however, contained only a single sample of a fly, at the very edge of a frame where it had no effect, so it did not count as coverage of the failure.
  • At Serve Robotics, which currently operates a fleet of over 2,000 sidewalk delivery robots, the engineering team realized that human behavior around robots is entirely unpredictable. Local pedestrians quickly figured out that the delivery bots were hardcoded to stop safely when detecting a pedestrian. In response, teenagers began donning rollerblades, hooking ropes around the robots and using the autonomous delivery vehicles as personal sidewalk surfboards.

Deployment is often where unforeseen failure modes surface. Addressing this, Radhakrishnan notes, “You have to have this data-driven and data-aware mentality.” Ultimately, while production is where edge cases are discovered, your priority must be developing a dataset that is comprehensive enough to identify and address them prior to operational deployment.

Do you really need a humanoid?

One of the loudest debates in the industry centers on form factors. Business leaders must balance a humanoid’s long-term versatility against its high maintenance burden today.

Because our world, from stairs to door handles, was built for people, humanoid robots can step into existing workspaces without requiring a total facility renovation. In the near future, they will likely also benefit from decades of human data (like video and motion capture) to help them learn.

However, they are power-hungry, mechanically volatile and incredibly complex.

"You can slice bread with a plasma cutter, but I wouldn't recommend it," joked Smith, addressing the overhype surrounding humanoid deployment.

For high-volume, repetitive industrial tasks, a specialized robotic arm or a wheeled platform is often the smarter investment. It is more reliable, has fewer parts to break and offers a significantly faster ROI.

Liu pointed out that underhyped, controlled indoor spaces, such as sterile processing departments in hospitals or structured retail fulfillment centers, are far riper for immediate, successful model fine-tuning because the operational variables can be strictly managed.

Building the data foundation for true physical AI

As the robotics field matures, the nature of data annotation is shifting. Radhakrishnan noted that the industry is moving past low-level pixel masking and manual 3D bounding box placement toward high-level behavioral labeling: essentially a reinforcement learning from human feedback (RLHF) framework for physical mechanics, where human experts supervise and critique the operational intent and safety of an agent's actions.

To scale these complex systems without drowning your engineering team in operational complexity, you need a dedicated data partner who understands the friction points of physical AI. Look for operational expertise in the people doing the collection, since that's where usable-hour conversion is won or lost. Spend the larger share of your budget on in-domain data for fine-tuning rather than on generic collection volume. And lean toward partners with real field operations, because the failures that benchmarks miss are the ones a fleet running in the wild will surface.

TELUS Digital was built to be that partner. The same teams that run production-grade pipelines for the world's leading AV programs now bring that rigor to robotics and world models, with over one million hours of video collection in progress, more than 70 global delivery centers and a global community of contributors for diverse capture across over 35 countries. The capability set spans:

  • Diverse pre-training coverage across egocentric and wrist-mounted video, manipulation, humanoid interactions and cross-embodiment data;
  • Multi-sensor capture across RGB, lidar, depth, IMU, force-torque and tactile, kept in sync;
  • World model and sim2real data including expert-led teleoperation, digital twin setups and long-horizon activity sequences;
  • High-context, physics-aware annotations through Ground Truth Studio and kinematics-aware semantic labels through Fine-Tune Studio;
  • Complex post-training support including VLA training with action justifications, chain-of-thought reasoning and explainability narratives.

Data is delivered in weeks, in batches timed to your R&D cycles, under one SLA, with one team.

Let your core engineering team focus on building revolutionary world model architectures and refining hardware mechanics. Let us build the flawless, data-aware foundation your fleet needs to operate safely in an unstructured world.

Explore our Physical AI & Robotics Data Solutions to accelerate your deployment flywheel today.
