Performance is Predictable: Waymo Investigates Scaling Laws in Autonomous Driving

More data, more computing power, better results – this has long been treated as a rule of thumb in AI research and may sound obvious, but until now it had not been documented nearly as well for autonomous driving systems. In a recent blog post, Waymo – Alphabet’s subsidiary specializing in robotaxis – has for the first time published detailed findings showing that scaling effects can also be systematically observed in autonomous driving, both during training and in real-world deployment.
Waymo bases its analysis on an extensive dataset from real-world driving: more than 500,000 hours of autonomous operation went into the study – a volume far exceeding that of typical academic research. The study aimed to determine how the performance of AI systems changes when the amount of data, the computing power, or the model size is increased.
Three areas in focus: Training, data volume, and real-time processing
Three key levers were examined: first, the computing power used for training the AI (“Train Compute”), second, the size of the training dataset, and third, the available computing power for real-time processing inside the vehicle (“Inference Compute”). Two core functions were at the center of this: the ability to accurately predict the movement of other road users (“Motion Forecasting”) and the vehicle’s planning behavior (“Planning”) based on those predictions.
The results indicate a consistently positive scaling effect. According to Waymo, clear relationships between resource investment and model performance could be demonstrated for all three parameters – in the form of so-called “power laws,” i.e., curves in which the error falls as a fixed power of the resource invested. In practical terms, every doubling of data volume or computing power reduces the error by roughly the same factor, so the gains are steady but shrink in absolute terms.
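To make the idea of a power law concrete, here is a minimal sketch in Python. It assumes the relationship has the form error ≈ a · compute^(−alpha) and recovers a and alpha with a straight-line fit in log-log space. The data points, the constants 2.0 and 0.25, and the function name fit_power_law are made up for illustration; they are not Waymo's figures or code.

```python
import numpy as np

def fit_power_law(compute, error):
    """Fit error ≈ a * compute**(-alpha) by linear regression in log-log space.

    Taking logs gives log(error) = log(a) - alpha * log(compute),
    which is a straight line with slope -alpha and intercept log(a).
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(error), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, made-up measurements (NOT Waymo data): compute budget vs. model error.
compute = np.array([1e0, 1e1, 1e2, 1e3, 1e4])
error = 2.0 * compute ** -0.25

a, alpha = fit_power_law(compute, error)

# Under a power law, each doubling of compute multiplies the error
# by the same constant factor, 2**(-alpha).
doubling_factor = 2.0 ** -alpha

print(f"a = {a:.3f}, alpha = {alpha:.3f}")
print(f"error after each doubling = {doubling_factor:.3f} x previous error")
```

On a log-log plot such a relationship appears as a straight line, which is what makes it easy to extrapolate.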
Predictability as a key finding
While the notion that more data and more computing power lead to better results may not be surprising in itself, the core of this study is the predictability of these improvements. Waymo particularly emphasizes that the performance gains do not occur randomly or erratically, but follow consistent, predictable patterns. This allows for more accurate forecasting of technological progress in autonomous systems. If, for instance, it is known how much additional compute is needed to reduce a certain error rate, developers, project leads, and companies can allocate resources more effectively.
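The planning use case described above can be sketched in a few lines of Python. The sketch assumes the error follows a power law, error ≈ a · compute^(−alpha), with an exponent alpha that has already been estimated. The function name compute_multiplier and the value alpha = 0.25 are hypothetical and chosen only for illustration.

```python
def compute_multiplier(current_error, target_error, alpha):
    """Factor by which compute must grow to move from current_error to target_error.

    Assumes error = a * compute**(-alpha). Dividing the two error levels
    cancels a and gives:
        compute_target / compute_current = (current_error / target_error) ** (1 / alpha)
    """
    if not 0 < target_error <= current_error:
        raise ValueError("target_error must be positive and not above current_error")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (current_error / target_error) ** (1.0 / alpha)

# Hypothetical exponent, for illustration only (not a figure from Waymo).
alpha = 0.25

halve = compute_multiplier(1.0, 0.5, alpha)      # compute factor to halve the error
ten_pct = compute_multiplier(1.0, 0.9, alpha)    # compute factor for a 10% reduction

print(f"halving the error needs {halve:.1f}x compute")
print(f"a 10% reduction needs {ten_pct:.2f}x compute")
```

The smaller the exponent, the more expensive each further improvement becomes, which is why knowing the exponent matters for budgeting.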
In the blog post, Waymo speaks of a “roadmap for model development” that could be derived from these insights. The phrasing suggests that the company is moving toward an even more data-intensive development strategy – one that demands substantial investments in hardware, infrastructure, and data collection, but could, according to Waymo, lead to markedly better performance.
More onboard computing power improves results
Another key result relates to the so-called inference phase – the real-time execution of the AI in the operating vehicle. Here too, Waymo reports that increased computing power can have a direct impact on model performance. Smaller models deployed inside the car can be significantly enhanced using techniques like sampling or clustering – provided that sufficient computing resources are available.
This is likely to be particularly relevant for automakers and fleet operators, who have so far often made hardware trade-offs in favor of cost. These decisions could now be better tailored to actual operational needs.
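How sampling and clustering can turn extra onboard compute into better forecasts can be illustrated with a rough Python sketch. It is not Waymo's method. A hypothetical stand-in for a small forecasting model draws many candidate trajectory endpoints, and a plain k-means step condenses them into a few representative outcomes with estimated probabilities. All names, the two behaviour modes, and the numbers are assumptions made for illustration. A larger inference budget would correspond to a larger n_samples.

```python
import numpy as np

def sample_endpoints(rng, n_samples):
    """Hypothetical stand-in for a small forecasting model.

    Draws 2-D trajectory endpoints from two made-up behaviour modes
    (think "go straight" vs. "turn left") with some noise.
    """
    modes = np.array([[20.0, 0.0], [10.0, 10.0]])
    choice = rng.integers(0, len(modes), size=n_samples)
    return modes[choice] + rng.normal(scale=0.5, size=(n_samples, 2))

def assign(points, centres):
    """Index of the nearest centre for every point."""
    dists = np.linalg.norm(points[:, None] - centres[None], axis=2)
    return np.argmin(dists, axis=1)

def kmeans(points, k, n_iters=20):
    """Plain k-means; returns k centres and the share of samples near each."""
    # Farthest-point initialisation: deterministic for a given set of points.
    centres = [points[0]]
    for _ in range(k - 1):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(nearest)])
    centres = np.array(centres)

    for _ in range(n_iters):
        labels = assign(points, centres)
        # Move each centre to the mean of its points; keep it if the cluster is empty.
        centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])

    labels = assign(points, centres)
    weights = np.bincount(labels, minlength=k) / len(points)
    return centres, weights

rng = np.random.default_rng(0)
points = sample_endpoints(rng, n_samples=256)  # more compute -> more samples
centres, weights = kmeans(points, k=2)

for centre, weight in zip(centres, weights):
    print(f"predicted endpoint {np.round(centre, 1)} with probability {weight:.2f}")
```

The trade-off is the one the article describes: drawing and clustering more samples costs more onboard compute, but gives a more stable estimate of the likely outcomes.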
“Motion forecasting requires building robust models that account for the myriad of edge cases that can happen on public roads. This is a highly complex task given the inherent uncertainty in predicting the behavior of other road users.” (Waymo Blog)
Scaling laws in autonomous driving: not a given
In established fields of AI – such as language models or image recognition – scaling laws have been documented empirically for years. However, whether they carry over to other areas of AI had not been established. Autonomous driving poses several unique challenges: data comes not from uniform sources but from a complex interplay of cameras, radar, and lidar. Road situations vary greatly, are rarely repeatable, and are safety-critical – meaning that mistakes have direct, real-world consequences.
Waymo itself emphasizes that it was far from certain that these known scaling effects would apply in this context. The fact that such consistent patterns emerged is considered an encouraging sign – especially since the effects could be observed across different model architectures.
Impact on the industry remains to be seen
This publication should be understood less as a technical breakthrough in the traditional sense and more as a methodological milestone: developers now have a clearer understanding of how to systematically improve their systems through targeted scaling. Whether this will lead to a widespread shift in development strategies across the industry remains to be seen.