Recent events in the robotaxi world have brought a key question for the development, deployment and regulation of this technology into greater light: how much risk should be tolerated, or even encouraged, during development? What level of problems is acceptable and what is not? How do developers decide how fast to go, and how do regulators align their decisions with the public interest?
Last Friday at 5:40pm, the California DMV issued a "request" that GM's Cruise robotaxi unit scale back its operations in San Francisco to less than half their prior size, just a week after Cruise received a permit from the California PUC to expand operations to the daytime and start charging fares. Waymo, which also received an expanded permit, was not told to scale back. The request came the day after a Cruise vehicle was hit by a San Francisco fire truck that was advancing through a red light, and another Cruise vehicle was T-boned by a red-light runner. Several other incidents earlier in the week also made the news; it was easily the worst week in Cruise's history.
Many of those commenting to the CPUC and in other fora, including of course the City of San Francisco, expressed the view that these vehicles should be scaled back, and that they were "not ready" for various reasons. Even though the CPUC did not grant the city's requests to deny the new permits, last week the city attorney filed another request to reverse that decision. All these comments opposed the deployment of both Waymo and Cruise.
A key question, rarely answered, is just what constitutes "ready": what level of incidents should be tolerated, and what level and type of incidents would demand a scale-back? The DMV "request" seems to suggest Cruise pushed too far and Waymo has not, at least in the view of the DMV, though the DMV has mostly said it wants to investigate the recent incidents, not that it has ruled they merited withdrawal of the DMV's more important permits. Indeed, it has allowed Cruise to operate 50 vehicles by day and 150 at night, a 50% reduction (but an expansion of daytime operation over a week ago, when Cruise was not yet permitted to give daytime rides to the public at all).
While the city has opposed all robotaxis, within the industry there has been much discussion of the difference between Cruise and Waymo. While there has been no formal survey, the common consensus seems to be that Waymo's vehicles perform a fair bit better and generate fewer problems. The maps of problems published by San Francisco officials show many more Cruise incidents. Cruise, however, claims this is not so, and that the larger number of incidents with its vehicles is due to the fact that they drive far more miles in San Francisco than Waymo's. (Waymo drives more of its miles in Phoenix.) Waymo points out that most of its miles have been during the daytime, when traffic is much more complex, while Cruise was only recently given permission to do passenger service by day. Neither company has provided precise numbers on the difference between their operations.
My own anecdotal experience riding in the vehicles bears out the claim of superior quality for Waymo. While no single ride, or even a score of rides, can support a judgement that a vehicle is safe (only statistics can do that), a single ride can reveal problems, and any problem is a red flag. Other riders who have done many rides say the same.
Robotaxis are not, and never will be, perfect. They will make errors of varying severity and frequency. As such, deploying them on the public roads entails risk, both to their passengers and to other road users. This risk is higher during the current trial phase than it will be once the vehicles are mature and widely deployed. All teams have set a target that, once the robotaxis are more mature, their miles will entail less risk (much less risk) than those of typical human drivers.
The teams also claim that their vehicles already (after millions of miles of operation) have better safety records than the typical human driver. They put particular focus on safety incidents where their vehicle was at fault, which they report are very, very rare. Their figures for all safety incidents, including ones where others are at fault, are less impressive, and by some calculations are higher than those for human drivers. However, this comparison is complicated by the fact that the majority of incidents involving human drivers are never reported to anybody, while the robotaxis notice and report all incidents. At present, all incidents reported by Waymo are quite minor. Cruise has had some injury accidents, and states only that there have been "none with life threatening injuries." This phrase has raised much suspicion, as it suggests some number of significant injuries that did not meet the standard of "life threatening." Cruise has declined comment on this.
Some who comment on this situation appear to take a position demanding perfection, suggesting that single incidents are "unacceptable." This stance would forever forbid robocars from the road, and thus is not helpful. Rather, we must work to determine what level of risk is acceptable and what is not. Regulators like the DMV and NHTSA (and to a much lesser extent the PUC) should attempt to firmly quantify this risk level.
Incidents are not only to be expected, they are the very purpose of pilot projects. Each incident is a problem found and normally fixed, not to be repeated, at least in quite the same way.
There are also many types of incidents. Most frightening are fatalities and serious injuries, which at present have not taken place. (Infamously, Uber ATG had a fatal collision that meant the end of their project; however, investigation by the NTSB and police placed the blame on Uber's internal staff management policies and the safety driver who ignored her job of supervising the prototype vehicle she was operating. That the prototype had serious flaws was expected for a prototype at that level of development.)
Below this are:
- Crashes with minor injuries and no lasting harm.
- Crashes with property damage only.
- Traffic disruptions with a safety risk, in particular those involving emergency vehicles.
- At a much lower level of severity, ordinary traffic disruptions.
The Overwhelming Math
The robocar effort is a special class of technology. It promises, when more mature, to greatly reduce the risk of all these types of incidents on our roads, though most focus has been on serious injuries and crashes. Indeed, it can be argued it is guaranteed to do so, in that none of the companies plan to scale their technology until they are confident that it drives much better than the typical human. If it isn't safer, it won't go. It is expected that regulators also would not allow it to scale if it doesn't.
All driving creates risk. Every time you drive, you place yourself, and other unwitting people at some amount of mortal risk. This is not minor—driving is by far the riskiest thing most of us do, and the most risk we place on others. We tolerate incredible amounts of this risk, and often for minimal gain. It is well documented how speed creates significant risk on the roads, but almost all of us have done a lot of speeding simply to get to a social event a few minutes or seconds earlier. A lot of us do that almost every day. We also allow student drivers and freshly minted drivers on the road, in spite of their much higher documented level of risk. We do this in order to help them become better, more mature drivers.
Robocars promise to be a vaccine against this problem. Should they take over half our driving (as the companies hope) and do it twice as well in each dimension, they would reduce all incidents, from traffic blockage to fatalities, by 25%. Some aspire to these systems driving 10 times better than the average human and taking over a larger fraction of driving (either by doing all of it, or by being integrated with human-driven cars to prevent driver mistakes), cutting the numbers much further. This is no minor difference: it's many millions of lives over not that many years, considered globally, though it will take some decades to make this happen.
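To make that arithmetic explicit, here is a minimal sketch; the 50% share and 2x improvement are the illustrative figures from above, not measured values:

```python
# Back-of-envelope: how much a robocar fleet reduces total incidents when it
# takes over a share of driving at a lower per-mile incident rate.

def incident_reduction(share_of_driving: float, improvement_factor: float) -> float:
    """Fraction of all incidents eliminated.

    share_of_driving:   portion of total miles shifted to robocars (0..1)
    improvement_factor: how many times safer the robocar is per mile
    """
    return share_of_driving * (1 - 1 / improvement_factor)

print(incident_reduction(0.5, 2))   # half the driving, twice as safe -> 0.25
print(incident_reduction(0.8, 10))  # a larger share, 10x safer -> 0.72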
Early robotaxi deployments are tiny fleets, at present a few hundred cars. When mature, the vehicles will be deployed by the millions. Each robotaxi will replace about 5 human-driven cars, while each private robocar may replace just one.
It is important to understand this basic fact. The early fleets, which are immature and present higher risk levels, are tiny, and add only a tiny amount of risk to our streets. Even if they drove worse than humans (which the companies claim they don't on safety, though they may on congestion), there are only a few hundred of them. When they are much better, there will be at least 10,000 and possibly a million times as many. 300 robot student drivers turn into a million superb ones. For us humans, each student driver who annoys us on the road turns into just one mature driver. Each immature robot will result in many thousands of mature ones. And when one robot makes a mistake and it gets fixed, all the robots learn and don't repeat that mistake.
It’s a reasonable assumption that if we delay the testing, including live testing on our streets, we delay the deployment, the maturity and saturation. The delay in saturation won’t be exactly the same as the delay in saturation but it will be closely linked. You are unlikely to delay testing a year and not delay the rest a similar amount. It turns out that a day of a year in saturation results in as many extra incidents as the better technology will prevent in a year once saturated.
To make that clear: if you delay things a year, and at the end the technology is preventing 1 million incidents (such as crashes) per year, then there will be a million extra crashes because of the delay. You won't directly cause the crashes; rather, they will be caused by the human drivers who were delayed in switching over to the safer technology. The drunks who didn't take a robotaxi home are culpable, but those who pushed for delay deliberately took the safer option away from them.
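The same point as a one-line sketch, under the simplifying assumption stated above that the saturation date shifts by roughly the length of the delay:

```python
# The delay argument in one line: incidents the mature technology would have
# prevented still happen during the delay, caused by the human drivers who
# kept driving in the meantime.

def extra_incidents(delay_years: float, prevented_per_year: float) -> float:
    """Incidents attributable to the delay, assuming saturation simply
    shifts by the length of the delay."""
    return delay_years * prevented_per_year

# A one-year delay on a technology that will prevent 1M crashes per year:
print(extra_incidents(1.0, 1_000_000))  # -> 1,000,000 extra crashes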
This makes the situation stark for the San Francisco fire chief. She's worried about 55 occasions this year when there was a bad interaction between robotaxis and her fire crews. If she wins her delay, it will mean thousands of preventable delays and blockages involving ordinary drivers in the years to come, and millions around the world. She is trading hundreds today for millions in the future, perhaps because she's inured to all the ways ordinary drivers slow fire trucks every day and hasn't yet been shown that something can be done about that.
One might wonder: can't this math justify almost any amount of risk today, including crazy amounts? In theory it might. We humans, it turns out, are not utilitarians who simply add up the math. We sometimes ask our governments to do that, to focus on the good of society over the individual, but we're not very good at it, and we're really unable to do it when talking about the most serious events.
That’s why, when it comes to serious events with irreparable harm, like serious injuries, teams set a much higher bar before they took the safety drivers out. They have worked to do better than the risk levels of average drivers, even though the math says one should start earlier. They claim they have done this. Fortunately, they don’t have to do a mathematical proof of that — this is about risk, not individual events, so all they need show is that while there is a risk they are wrong, that risk is also small.
As individuals, we react strongly to events, but as a society, we want to react to statistics. We won't ever get very statistical about serious injury accidents, but we can and should about things like traffic disruptions and property damage. Estimates suggest each American driver spends around 50 hours per year stuck in traffic, and that number is getting worse. While solutions to congestion are more complex and speculative, the potential is strong. Those 50 hours per person map to over a million person-years of 16-hour days wasted every year, or 16,000 human lifetimes. Even a dent in that has considerable value. Would we stop a saving of millions of hours of delay in the future to prevent a handful of hours today? It seems we might.
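A rough reconstruction of that conversion, where the driver count and lifetime span are my illustrative assumptions and only the 50 hours/year comes from the estimate cited above:

```python
# Converting US congestion delay into person-years and waking lifetimes.
# DRIVERS and LIFETIME_YEARS are assumptions chosen for illustration.

DRIVERS = 120e6            # assumed number of regular US drivers
HOURS_STUCK_PER_YEAR = 50  # hours each spends stuck in traffic (cited estimate)
WAKING_DAY_HOURS = 16
LIFETIME_YEARS = 62        # assumed span of an adult's waking life

total_hours = DRIVERS * HOURS_STUCK_PER_YEAR
person_years = total_hours / WAKING_DAY_HOURS / 365   # years of 16-hour days
lifetimes = person_years / LIFETIME_YEARS

print(f"{person_years:,.0f} person-years/year, about {lifetimes:,.0f} lifetimes")
# -> roughly 1,027,000 person-years per year, about 16,600 lifetimes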
Fire delays (and interference at fire scenes) are a more challenging problem. The fire department, at the CPUC hearings, reminded the commission that in the wrong circumstances, a fire can grow very quickly in a minute or two, and as such a few minutes delay can have extreme results. At the same time, every emergency vehicle encounters a few minutes of delay on our crowded streets — human drivers are far from perfect at yielding as they should and crowded streets may simply not have a path. Those minutes of delay don’t automatically end in tragedy, but they do create a risk of it. Once again, the question is a trade-off of minor additional risk today for massive risk reduction in the future.
Standards of Risk
These principles suggest the following standard for deciding whether a robotaxi pilot has crossed the line and should be slowed:
- When it comes to safety incidents that could cause irreparable harm, the team should make a solid case that they are probably (not certainly) no worse than the typical human driver, which means their deployment creates no more risk on the road than a similar deployment of human-driven cars would. Incidents may well happen, but must be viewed in this context. There is an argument that the risk level might instead match that of a student driver or newly licensed driver, since we readily tolerate that risk today.
- For reparable harms, such as property damage and minor injuries, society should tolerate a fair bit more risk, as long as the companies have the means to make good any harms, and this liability aligns their actions with the public good. One option: for a limited time, remedies paid to victims might be double those paid in accidents caused by human drivers.
- For inconveniences, such as traffic delays and minor mistakes, these should be given a great deal of tolerance, even if they far exceed those caused by human drivers, as long as it’s clear that they are on a path of improvement. In particular, tolerance should be great for traffic delays resulting from the vehicle being extra conservative in order to reduce its risk of safety incidents.
A team needs to be credible: there must be a decent case that it will, in the future, produce vehicles which surpass human safety levels and deliver this future dividend. At present, most teams have invested very large amounts and their success depends on pulling this off, so their financial interests are aligned with the public interest. As long as those interests stay aligned, good results are more likely.
Perception, Reality And Alternatives
Even though the mathematics make a solid case for being tolerant of early risk in exchange for massive risk reduction, this debate is not happening among insurance actuaries. Even if officials, doing their job to promote long term road safety, express this tolerance, the public may not. Things can blow up on social media and in the press and create anecdotal perceptions very different from statistical reality.
In addition, as we’ve seen, even if Cruise isn’t pushing too hard, the definition of too hard is “attract the attention of the DMV,” which they have done. Though the DMV has, in calling for Cruise to scale back for a short time while they investigate further, followed a good path which will not delay Cruise much if they find their performance acceptable. Indeed, even one who takes a purely utilitarian “greatest good for greatest number” approach to this problem would realize that angering the public can stall a project, erasing that potential for greater good.
Some have suggested these problems should have been found in simulation or with safety drivers. Both companies drove billions of miles in simulation and millions with safety drivers, and it's clear that many of these problems do not become apparent until the vehicle has to drive entirely on its own. At some point this transition must be made, both because it is needed for production, and because delaying it too long limits the speed of scaling and makes costs untenable.
My own evaluation is that Cruise is indeed underperforming Waymo. Of concern to me have been a number of incidents that fit a category I might call "that should not happen with a pilot-level project." Incidents will happen, but you want them to be the sort of incidents you do a pilot project to discover. Each mistake made by these vehicles is generally a good thing, a problem discovered that can now be fixed and won't be repeated, at least not in the same specific way. Some of Cruise's incidents have raised concern with me, and may have with the DMV.
- The crash into the back of the bus should not have been able to happen. While the bug which caused it was obscure and has been fixed, it should not be possible for almost any bug to drive a vehicle into a large obstacle like that. It raises concerns that a different bug could drive the vehicle into another obvious obstacle, as there will be more bugs.
- Cruise has not revealed the reason its vehicle drove into wet concrete. That construction zone was well marked and known. I would want to know this error does not signal a pattern.
- Driving through caution tape and hitting downed wires indicates insufficient simulation and test-track testing of situations like this. There will always be errors in situations so rare that nobody thought to test them, but this is not that rare a situation.
- While it is not clear it affected the result, Cruise’s admission that their prediction engine was confused by a fire truck using the opposing traffic lane suggests a surprising omission.
- Failure of communications should have been anticipated and tested with a better fail-safe.
- For both teams, it is not clear why so many problem incidents are being resolved by rescue drivers rather than remote operations. Cruise had 177 such events in 2023 over 2.1M miles. That's not particularly frequent, but 15-minute stalls don't correspond to any common human bad behavior, not at a rate of roughly once per 3 years of human driving (see the rough calculation below).
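Here is how that rate works out. The city-mileage figure is my assumption for illustration; a typical US driver logs around 13,000 miles per year, but far fewer of them on dense city streets:

```python
# Comparing Cruise's rescue-driver event rate to a human's city mileage.
# CITY_MILES_PER_YEAR is an assumed figure, not from Cruise's reports.

RESCUE_EVENTS = 177
FLEET_MILES = 2_100_000          # Cruise's reported 2023 mileage
CITY_MILES_PER_YEAR = 4_000      # assumed city-street miles per human driver

miles_per_event = FLEET_MILES / RESCUE_EVENTS           # ~11,900 miles/stall
years_per_event = miles_per_event / CITY_MILES_PER_YEAR

print(f"one stall per {miles_per_event:,.0f} miles, "
      f"about once per {years_per_event:.1f} years of city driving")
# -> one stall per ~11,900 miles, about once per 3 years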
Cruise has not been transparent about these things and has declined to answer my questions about them. It may now get questions from the DMV that it can't decline. The natural instinct of these companies to limit transparency works against their interests here: keeping the public informed about and confident in these vehicles is necessary to avoid generating the negative attention that leads to investigations like this one.
Waymo has done much better, but also far from perfectly, on transparency. It has been more transparent about its Phoenix operations.
Even so, Cruise now reports over three million miles of no-safety-driver operation in San Francisco, most of it at night. There are not yet firm figures on its numbers of traffic disruptions, minor injuries and significant injuries; it reports zero life-threatening injuries. Injury accidents in the USA happen about once every million miles, but that's a number for both night and day and for both city and highway. The rate is worse on city streets, and slightly worse at night (though numbers for Cruise's late-night hours are not readily available).
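As a loose yardstick, assuming the rough national rate cited above, a human fleet doing the same mileage would be expected to have a few injury crashes:

```python
# Naive baseline: expected human injury crashes over Cruise's reported mileage,
# using the rough 1-per-million-miles national rate cited above. That rate
# blends highway and city, day and night, so it is only a loose floor; city
# streets and late-night driving run worse.

CRUISE_MILES = 3_000_000
HUMAN_INJURY_RATE = 1 / 1_000_000   # injury crashes per mile, national average

print(CRUISE_MILES * HUMAN_INJURY_RATE)  # ~3 expected over 3M miles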