Frequently asked questions

Why is real-world mileage irreplaceable if simulation is improving so fast?

Simulation accelerates iteration and is the only practical way to test rare or dangerous scenarios at volume, but it does not replace real miles, for two reasons. First, you cannot simulate a scenario nobody has thought of. If no one on the team has imagined an elephant in front of the car, no amount of simulation hours will surface that case. Second, simulators model the driving environment, not the full system. GPS or GNSS dropouts, sensor calibration drift, navigation stack faults and cross-component failures live outside the driving model and only surface when the full vehicle runs in the field.

Where should an OEM put its evaluation effort when choosing an AV partner?

The functional safety paperwork is necessary but not differentiating. The differentiating signals are talent density (only a small number of companies have the people who can deliver Level 4 autonomy), track record on previous problems (how the partner systematically surfaces and closes edge cases, not just how many it has closed) and transparency (whether the partner publishes intervention rates and incident data the OEM can independently evaluate). Beyond those signals, automotive safety integrity level (ASIL) alignment and validation methodology are the technical conversations worth having.

What kind of training data actually moves the needle on the long tail?

In-domain edge-case data, captured across the geographies, weather conditions, lighting environments and demographics your fleet will see in production. Curated coverage of the 0.1% matters more than generic volume. The same logic applies to in-cabin data for driver and occupant monitoring, where demographic breadth in the contributor pool is what protects the model from failure modes that only appear once a system is deployed at scale.
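One way to make "curated coverage" concrete is to count recorded clips per combination of conditions and list the combinations that are thin. The sketch below is a minimal illustration only: the axis names, their values, the `min_clips` threshold and the toy clip metadata are all hypothetical, and a real program would define its own taxonomy.

```python
from collections import Counter
from itertools import product

# Hypothetical coverage axes; a real program would define its own taxonomy.
AXES = {
    "region": ["urban", "highway"],
    "weather": ["clear", "rain", "snow"],
    "lighting": ["day", "night"],
}


def coverage_gaps(clips, min_clips=2):
    """Return every axis combination with fewer than min_clips recorded clips."""
    counts = Counter(tuple(clip[axis] for axis in AXES) for clip in clips)
    return [cell for cell in product(*AXES.values()) if counts[cell] < min_clips]


# Toy clip metadata, invented for illustration.
clips = [
    {"region": "urban", "weather": "clear", "lighting": "day"},
    {"region": "urban", "weather": "clear", "lighting": "day"},
    {"region": "urban", "weather": "clear", "lighting": "day"},
    {"region": "highway", "weather": "rain", "lighting": "night"},
]

gaps = coverage_gaps(clips)
total_cells = len(list(product(*AXES.values())))
print(f"{len(gaps)} of {total_cells} cells are under-covered")
```

In this toy set, three of the four clips land in the same cell, so adding more clear-day urban footage would raise volume without closing a single gap.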


The sim-to-real gap in autonomous vehicles only closes with real-world data

Posted June 22, 2026

Why Physical AI Needs Real-World Autonomous Vehicle Data

Key takeaways

While generative world models can simulate rare events on command via simple text prompts, real-world mileage remains irreplaceable for identifying system-level hardware failures and unexpected human behaviors.
The industry is moving from rigid, modular stacks to end-to-end embodied AI to eliminate restrictive human priors, such as a system ignoring an unclassified object like a soccer ball before a child runs into the street.
TELUS Digital's fully managed field operations programs handle end-to-end data collection infrastructure — from vehicle procurement and sensor calibration to multi-region test deployments — freeing core engineering teams to focus on model development.

Getting an autonomous vehicle to operate 90% of the time is straightforward. The next 9% takes longer. The 0.9% after that takes longer still.
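The arithmetic behind that progression can be sketched in a few lines. The block below is illustrative only: it assumes the three percentages are additive shares of driving situations handled, which is a reading of the text rather than a measured quantity, and prints what remains unsolved after each stage.

```python
from fractions import Fraction

# Stage gains from the text: 90%, then 9%, then 0.9%.
# Exact fractions avoid floating-point noise in the running total.
stage_gains = [Fraction(90, 100), Fraction(9, 100), Fraction(9, 1000)]


def cumulative_coverage(gains):
    """Return a (covered, residual) pair after each stage."""
    covered = Fraction(0)
    rows = []
    for gain in gains:
        covered += gain
        rows.append((covered, 1 - covered))
    return rows


rows = cumulative_coverage(stage_gains)
for stage, (covered, residual) in enumerate(rows, start=1):
    print(f"stage {stage}: covered {float(covered):.1%}, residual {float(residual):.1%}")
```

Under that assumption, each stage removes nine tenths of whatever was still failing, so the residual shrinks from 10% to 1% to 0.1% while the effort per stage grows.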

The unknown unknowns, the scenarios no one on the team has thought to simulate, live in the decimals. They are the difference between a demo at the Conference on Computer Vision and Pattern Recognition (CVPR) and a production fleet that earns revenue without putting itself on the front page of a newspaper.

At our recent World Models Summit, technology leaders from Waymo, Wayve and PACCAR sat down to unpack how autonomous fleets across passenger transport, heavy trucking and end-to-end AI are systematically hunting down the long tail of edge cases to establish a minimum level of safety. The conversation featured three practitioners at the sharp end of that problem:

Paul Konasewich, director of partnerships at PACCAR, bringing the heavy truck OEM perspective (Peterbilt, Kenworth, DAF).
"Max" Chiyu Jiang, tech lead manager and research scientist at Waymo, leading onboard and offboard AI model development.
Rudy Seville, OS, Kernel and Runtime Platform team lead at Wayve, pioneering end-to-end embodied AI software for global automotive manufacturers.

Their collective takeaway: simulation accelerates iteration. It does not replace real miles.

Telling a rare scenario apart from a truly novel one

The advent of frontier world models has supercharged simulation capabilities. Jiang highlighted Waymo's use of generative simulators built on top of world models such as Google DeepMind's Genie 3. These let engineers generate almost any scenario from a prompt, including how a robotaxi reacts to a loose elephant wandering into the road, no zoo visit or rented animal required.

But the panel was firm that simulation is a supplement, not a substitute, for real mileage. The core limitation of simulation: If a scenario is an unknown unknown, you don’t know to simulate it in the first place. As Jiang put it, "If you don't know what you're going to simulate, how do you even simulate it?" The inspiration for what to test tends to come from the field, not the lab. Furthermore, simulated environments rarely capture the cascading hardware and system-level degradations that happen in the wild.

When tackling edge cases, engineering teams often struggle to balance foresight with real-world experience. Waymo addresses this with a two-layer framework that splits engineering effort into proactive systems engineering and reactive field adaptation.

The first layer is proactive systems engineering, run by a dedicated team that enumerates possible failure cases and works through structured testing before deployment. Jiang traced the mindset to aerospace: "When you launch a rocket to Mars, you only get one shot. You don't have historical data to rely on, so you must enumerate every conceivable failure mechanism." For a first-of-its-kind mission you cannot lean on prior field experience, so you reason through failure modes in advance. Anticipate before you can observe.

The second layer is reactive field adaptation, which kicks in once the fleet is actually on the road. Here the system learns from real exposure. When a Waymo vehicle is aggressively cut off on the highway, the onboard system must process the interaction dynamically, sometimes executing pre-programmed maneuvers and other times drawing on behaviors learned from field experience to honk or brake safely. While proactive enumeration covers what you can foresee, reactive adaptation captures what only the world can show you.

Reactive scaling requires exceptional caution. Code cannot be pushed to a large fleet with a single click; instead, software updates are throttled across small operational domains so that a localized anomaly cannot propagate at scale. Even a perfectly cautious software rollout, though, only tests the driving model. It does not exercise the rest of the vehicle. Standard simulators concentrate on perception and planning, Seville noted, and tend to skip the physical system around them:

What happens when the vehicle experiences a sudden global navigation satellite system (GNSS) or global positioning system (GPS) dropout in a dense urban canyon?
How does the entire integrated system handle a localized hardware performance lag?
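Once a dropout like the first question has been seen in the field, one partial way to keep it from regressing is to replay a recorded log with the fault injected. The sketch below is an assumption for illustration, not a method the panel described: the `Sample` record, the one-dimensional route, the dead-reckoning fallback and the 5% odometry bias are all hypothetical. It exercises software logic only, so it complements field testing rather than replacing it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    t: float                    # seconds since start of log
    gnss_pos: Optional[float]   # metres along route; None means no fix
    odom_speed: float           # metres per second from wheel odometry


def inject_dropout(samples: List[Sample], start: float, end: float) -> List[Sample]:
    """Return a copy of the log with GNSS fixes removed in [start, end)."""
    return [
        Sample(s.t, None if start <= s.t < end else s.gnss_pos, s.odom_speed)
        for s in samples
    ]


def localize(samples: List[Sample]) -> List[float]:
    """Use the GNSS fix when available; otherwise dead-reckon from odometry."""
    estimates = []
    pos, prev_t = 0.0, None
    for s in samples:
        if s.gnss_pos is not None:
            pos = s.gnss_pos
        elif prev_t is not None:
            pos += s.odom_speed * (s.t - prev_t)
        estimates.append(pos)
        prev_t = s.t
    return estimates


# Toy log: true speed is 10 m/s, but odometry reads 5% high (hypothetical bias).
log = [Sample(t=float(t), gnss_pos=10.0 * t, odom_speed=10.5) for t in range(11)]
faulted = inject_dropout(log, start=3.0, end=8.0)
estimates = localize(faulted)
worst_error = max(abs(e - 10.0 * s.t) for e, s in zip(estimates, faulted))
print(f"worst position error during replay: {worst_error:.1f} m")
```

In this toy replay the error grows by half a metre per second of dropout and snaps back when the fix returns. What the replay cannot show is how real hardware degrades at the same moment, which is the gap the second question points at.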

The underappreciated scaling bottlenecks

As autonomy scales toward millions of commercial rides per week, consumer perception is rapidly shifting from viewing the technology as magical to treating it like a routine trip in an elevator. Despite this rapid normalization, two massive scaling bottlenecks remain underappreciated by the general public:

Hyper-localized behavioral adaptation

Autonomy stacks have to navigate regional driving cultures. Seville noted that driving dynamics shift radically across borders, and entering any new operational domain presents hyper-local behavioral anomalies. As the panel joked, mapping an autonomous fleet in Miami means the system must adapt to unpredictable events like "Spring Break" chaos. A vehicle must dynamically learn local driving etiquette, regional infrastructure configurations and unique pedestrian behaviors on the fly. Models optimized for the predictable lane markings of one city frequently face immediate operational friction when exposed to the cultural entropy of another.

Physical operations and logistics

While high-level lane keeping and highway cruising are largely solved problems, the harder, less-discussed work is the physical choreography of deployment. As Konasewich noted, for heavy-duty freight trucks and passenger cars alike, "pickup and drop-off is harder than you think." Backing a heavy freight truck into a crowded loading dock, or placing a robotaxi precisely at a busy airport curb, demands localized, highly specific spatial reasoning that broad driving models still struggle to execute cleanly.

Building the data infrastructure for autonomous fleets

Solving the unknown unknowns requires massive volumes of highly diverse, precisely structured training data. To scale without drowning internal engineering teams in data management overhead, autonomy programs require robust pipelines that capture real-world entropy across different countries, weather conditions and unexpected human behaviors.

For example, when a leading multinational provider developing core components for global automakers needed to validate its next-generation advanced driver-assistance systems (ADAS), it partnered with us. The project demanded an immense volume of real-world sensor data, video feeds and vehicle diagnostics. Testing in controlled environments was insufficient; the client faced critical operational risks, including sensor drift, data gaps, complex real-world environmental noise and a complete lack of standardized protocols across multiple global stakeholders.

To address this, we delivered a fully managed, end-to-end field operational test (FOT) deployment. We took over complete responsibility for vehicle procurement, preparation, storage, maintenance and security. To eliminate sensor drift and data inconsistencies before they could corrupt the training loop, our technical teams directly installed and continuously calibrated the vehicles' complex onboard sensor equipment and lidar systems.

To maximize coverage of hyper-localized edge cases and regional driving habits, we orchestrated optimized routes spanning a total of 157,000 km across diverse driving conditions in over 34 major cities across South Korea, the European Union and the United States. Each test vehicle was staffed by a specialized two-driver operational team to ensure continuous, uninterrupted data acquisition. Cloud-based infrastructure handled real-time location tracking, vehicle diagnostics and 7 TB of data throughput per vehicle per day — with curated, annotated real-world driving scenarios feeding directly into the client's ADAS model training pipeline.
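To put the 7 TB per vehicle per day figure in perspective, it helps to convert it to a sustained write rate. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the source does not say whether decimal terabytes are meant or how many hours per day each vehicle records, so both the `TB` constant and the 16-hour duty cycle are hypothetical.

```python
TB = 10**12  # decimal terabyte; the source does not say whether TB or TiB is meant


def sustained_rate_mb_per_s(tb_per_day, hours_recording=24.0):
    """Average rate in MB/s needed to capture tb_per_day within hours_recording."""
    return tb_per_day * TB / (hours_recording * 3600) / 10**6


full_day = sustained_rate_mb_per_s(7)           # recording around the clock
two_shifts = sustained_rate_mb_per_s(7, 16.0)   # hypothetical 16-hour duty cycle
print(f"24 h: {full_day:.0f} MB/s, 16 h: {two_shifts:.0f} MB/s")
```

Under these assumptions the sustained rate is roughly 80 to 120 MB/s per vehicle, so storage, offload and calibration logistics have to be planned alongside the routes themselves.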

Fully managed field operations like these let your core engineering teams focus on refining model architectures and optimizing physical hardware. Partner with TELUS Digital to build a flawless, data-aware foundation that allows your autonomous fleet to navigate the unstructured world safely.
