massive AI leap? but we can already ask language models how to drive through an intersection and they will list the steps correctly (and wildly incorrectly at other times)
what's missing is combining this kind of "human concept relations" model (language, rules, minimal reasoning, text-encoded human preferences) with perception and safety. that means the model should know that if other cars are driving just fine in front then it's unlikely that the road is on fire, that the low-certainty crack in the road is okay if two other cars already went over it unimpeded, that if the road marks and the signs are inconsistent but other vehicles have formed a slow but consistent pattern of traffic then that's the local ruleset, and so on
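to make the "two other cars already went over it" idea concrete, here is a rough sketch of it as a Bayesian update. everything in it is an assumption for illustration: the function name, the prior, and the two likelihood numbers are made up, and it is not how any real driving stack is known to work.

```python
def update_hazard_belief(prior, cars_passed,
                         p_pass_if_hazard=0.2, p_pass_if_clear=0.99):
    """Return P(hazard) after seeing `cars_passed` vehicles cross unimpeded.

    prior            -- perception's initial P(hazard), e.g. a low-certainty crack
    p_pass_if_hazard -- assumed chance a car crosses fine if the hazard is real
    p_pass_if_clear  -- assumed chance a car crosses fine if the road is okay
    Both likelihoods are hypothetical numbers, not measured values.
    """
    belief = prior
    for _ in range(cars_passed):
        # Bayes' rule: P(hazard | passed) is proportional to
        # P(passed | hazard) * P(hazard)
        hazard_weight = p_pass_if_hazard * belief
        clear_weight = p_pass_if_clear * (1.0 - belief)
        belief = hazard_weight / (hazard_weight + clear_weight)
    return belief


if __name__ == "__main__":
    # belief in the hazard should shrink with each car that passes unimpeded
    for n in range(4):
        print(n, round(update_hazard_belief(0.3, n), 4))
```

the point is only that each unimpeded car is evidence, so the belief in the hazard drops with every observation. the hard part is the rest: getting a trustworthy prior from perception and deciding what counts as "unimpeded".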
it's still a very hard problem. the required amount of compute is still bonkers, the required amount of data and training is still absolutely huge, and safely disengaging and handling asleep/drunk passengers (the likely target audience, after all) are hard problems too :)