The sim-to-real gap in autonomous vehicles only closes with real-world data

Key takeaways
- While generative world models can simulate rare events on command via simple text prompts, real-world mileage remains irreplaceable for identifying system-level hardware failures and unexpected human behaviors.
- The industry is moving from rigid, modular stacks to end-to-end embodied AI to eliminate restrictive human priors, such as a system ignoring an unclassified object like a soccer ball before a child runs into the street.
- TELUS Digital's fully managed field operations programs handle end-to-end data collection infrastructure — from vehicle procurement and sensor calibration to multi-region test deployments — freeing core engineering teams to focus on model development.
Getting an autonomous vehicle to operate 90% of the time is straightforward. The next 9% takes longer. The 0.9% after that takes longer still.
The unknown unknowns, the scenarios no one on the team has thought to simulate, live in the decimals. They are the difference between a demo at the Conference on Computer Vision and Pattern Recognition (CVPR) and a production fleet that earns revenue without putting itself on the front page of a newspaper.
At our recent World Models Summit, technology leaders from Waymo, Wayve and PACCAR sat down to unpack how autonomous fleets across passenger transport, heavy trucking and end-to-end AI are systematically hunting down the long tail of edge cases to establish a minimum level of safety. The conversation featured three practitioners at the sharp end of that problem:
- Paul Konasewich, director of partnerships at PACCAR, bringing the heavy truck OEM perspective (Peterbilt, Kenworth, DAF).
- "Max" Chiyu Jiang, tech lead manager and research scientist at Waymo, leading onboard and offboard AI model development.
- Rudy Seville, OS, Kernel and Runtime Platform team lead at Wayve, pioneering end-to-end embodied AI software for global automotive manufacturers.
Their collective takeaway: simulation accelerates iteration. It does not replace real miles.
Telling a rare scenario apart from a truly novel one
The advent of frontier world models has supercharged simulation capabilities. Jiang highlighted Waymo's use of generative simulators built on world models such as Google DeepMind's Genie 3. These let engineers generate almost any scenario from a prompt, including how a robotaxi reacts to a loose elephant wandering into the road, no zoo visit or rented animal required.
But the panel was firm that simulation is a supplement, not a substitute, for real mileage. The core limitation is that if a scenario is an unknown unknown, you don't know to simulate it in the first place. As Jiang put it, "If you don't know what you're going to simulate, how do you even simulate it?" The inspiration for what to test tends to come from the field, not the lab. Simulated environments also rarely capture the cascading hardware and system-level degradations that happen in the wild.
When tackling edge cases, engineering teams often struggle to balance foresight with real-world experience. Waymo addresses this with a two-layer framework that splits engineering effort into proactive systems engineering and reactive field adaptation.
The first layer is proactive systems engineering, run by a dedicated team that enumerates possible failure cases and works through structured testing before deployment. Jiang traced the mindset to aerospace: "When you launch a rocket to Mars, you only get one shot. You don't have historical data to rely on, so you must enumerate every conceivable failure mechanism." For a first-of-its-kind mission you cannot lean on prior field experience, so you reason through failure modes in advance. Anticipate before you can observe.
The second layer is reactive field adaptation, which kicks in once the fleet is actually on the road. Here the system learns from real exposure. When a Waymo vehicle is aggressively cut off on the highway, the onboard system must process the interaction dynamically, sometimes executing pre-programmed maneuvers and other times drawing on behaviors learned from field experience to honk or brake safely. While proactive enumeration covers what you can foresee, reactive adaptation captures what only the world can show you.
Reactive scaling, however, requires exceptional caution. Code cannot be pushed to a massive fleet with a single click; instead, software updates are throttled across small operational domains to prevent localized anomalies from propagating at scale. Yet even a perfectly cautious software rollout only tests the driving model. It does not exercise the rest of the vehicle. Standard simulators concentrate on perception and planning, Seville noted, and tend to skip the physical system around them:
- What happens when the vehicle experiences a sudden global navigation satellite system (GNSS) or global positioning system (GPS) dropout in a dense urban canyon?
- How does the entire integrated system handle a localized hardware performance lag?
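The throttled rollout described above can be pictured as a gate that each operational domain must pass before the next one receives the update. The sketch below is a minimal illustration, not Waymo's actual process: the `DomainReport` fields, the anomaly-rate threshold and the minimum-mileage requirement are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainReport:
    """Field results from one operational domain (all fields are hypothetical)."""
    name: str
    miles_driven: float
    anomalies: int


def staged_rollout(reports, max_anomalies_per_1k_miles=0.5, min_miles=1000.0):
    """Walk domains in order and stop at the first one that fails the gate.

    Returns (approved_domain_names, name_of_halting_domain_or_None).
    """
    approved = []
    for report in reports:
        if report.miles_driven < min_miles:
            # Not enough exposure to judge this domain yet, so hold here.
            return approved, report.name
        rate = report.anomalies / report.miles_driven * 1000.0
        if rate > max_anomalies_per_1k_miles:
            # A localized anomaly halts the update before it propagates.
            return approved, report.name
        approved.append(report.name)
    return approved, None


# Made-up numbers purely for illustration.
reports = [
    DomainReport("domain_a", miles_driven=5000.0, anomalies=1),
    DomainReport("domain_b", miles_driven=4000.0, anomalies=6),
    DomainReport("domain_c", miles_driven=8000.0, anomalies=0),
]
approved, halted_at = staged_rollout(reports)
print(approved, halted_at)  # ['domain_a'] domain_b
```

In this toy run the update clears the first domain and stops at the second, so the third never receives it. As the panel pointed out, a gate like this only evaluates the driving software; it says nothing about GNSS dropouts or hardware lag.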
The underappreciated scaling bottlenecks
As autonomy scales toward millions of commercial rides per week, consumer perception is rapidly shifting from viewing the technology as magical to treating it like a routine trip in an elevator. Despite this rapid normalization, two massive scaling bottlenecks remain underappreciated by the general public:
- Hyper-localized behavioral adaptation
Autonomy stacks must navigate regional human cultures. Seville noted that driving dynamics shift radically across borders. Furthermore, entering any new operational domain presents hyper-local behavioral anomalies. As the panel joked, mapping an autonomous fleet in Miami means the system must adapt to unpredictable events like "Spring Break" chaos. A vehicle must dynamically learn local driving etiquette, regional infrastructure configurations and unique pedestrian behaviors on the fly. Models optimized for the predictable lane markings of one city frequently face immediate operational friction when exposed to the cultural entropy of another.
- Physical operations and logistics
While high-level lane keeping and highway cruising are largely solved problems, the harder, less-discussed work is the physical choreography of deployment. As Konasewich noted, for heavy-duty freight trucks and passenger cars alike, "pickup and drop-off is harder than you think." Backing a heavy freight truck into a crowded loading dock, or placing a robotaxi precisely at a busy airport curb, demands localized, highly specific spatial reasoning that broad driving models still struggle to execute cleanly.
Building the data infrastructure for autonomous fleets
Solving the unknown unknowns requires massive volumes of highly diverse, precisely structured training data. To scale without drowning internal engineering teams in data management overhead, autonomy programs require robust pipelines that capture real-world entropy across different countries, weather conditions and unexpected human behaviors.
For example, when a leading multinational provider developing core components for global automakers needed to validate its next-generation advanced driver-assistance systems (ADAS), it partnered with us. The project demanded an immense volume of real-world sensor data, video feeds and vehicle diagnostics. Testing in controlled environments was insufficient; the client faced critical operational risks, including sensor drift, data gaps, complex real-world environmental noise and a complete lack of standardized protocols across multiple global stakeholders.
To address this, we delivered a fully managed, end-to-end field operational test (FOT) deployment. We took over complete responsibility for vehicle procurement, preparation, storage, maintenance and security. To eliminate sensor drift and data inconsistencies before they could corrupt the training loop, our technical teams directly installed and continuously calibrated the vehicles' complex onboard sensor equipment and lidar systems.
To maximize coverage of hyper-localized edge cases and regional driving habits, we orchestrated optimized routes totaling 157,000 km under diverse driving conditions in more than 34 major cities across South Korea, the European Union and the United States. Each test vehicle was staffed by a specialized two-driver operational team to ensure continuous, uninterrupted data acquisition. Cloud-based infrastructure handled real-time location tracking, vehicle diagnostics and 7 TB of data throughput per vehicle per day, with curated, annotated real-world driving scenarios feeding directly into the client's ADAS model training pipeline.
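To put 7 TB per vehicle per day in perspective, a quick back-of-the-envelope conversion helps. The sketch below assumes decimal terabytes (10^12 bytes) and, by default, data spread evenly over a full 24 hours. Neither assumption comes from the project itself, and the `hours_recording` parameter is hypothetical.

```python
def sustained_rate(tb_per_day, hours_recording=24.0):
    """Convert a daily data volume into an average transfer rate.

    Assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    Returns (megabytes_per_second, megabits_per_second).
    """
    seconds = hours_recording * 3600.0
    bytes_per_second = tb_per_day * 10**12 / seconds
    return bytes_per_second / 10**6, bytes_per_second * 8 / 10**6


mb_s, mbit_s = sustained_rate(7)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")  # 81.0 MB/s, 648 Mbit/s
```

Under those assumptions the figure works out to roughly 81 MB/s, or about 648 Mbit/s, sustained per vehicle. If recording happens in fewer hours per day, the rate while driving rises proportionally.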
Fully managed field operations like these let your core engineering teams focus on refining model architectures and optimizing physical hardware. Partner with TELUS Digital to build a flawless, data-aware foundation that allows your autonomous fleet to navigate the unstructured world safely.



