Tesla’s head of AI admitted that the automaker’s approach to self-driving is harder than what most companies in the industry are doing, but he says it’s the only way to scale.
There are dozens of high-profile companies working on solving self-driving and almost as many different approaches, but they fall into two main camps: those that rely mainly, if not entirely, on computer vision, and those that rely on HD mapping.
Tesla falls in the former category of relying on computer vision.
Andrej Karpathy, Tesla’s head of AI and computer vision, is leading this effort.
Earlier this week, he participated in a CVPR’20 workshop on “Scalability in Autonomous Driving” during which he gave an update on the status of Tesla’s program and talked about the scalability challenges:
During the presentation, Karpathy shared a clip of Tesla's self-driving development software making a turn, followed by Waymo's self-driving prototype performing the same maneuver.
He pointed out that while the two turns look identical, the decision-making powering them is completely different:
Waymo and many others in the industry use high-definition maps. You have to first drive some car that pre-maps the environment, you have to have lidar with centimeter-level accuracy, and you are on rails. You know exactly how you are going to turn in an intersection, you know exactly which traffic lights are relevant to you, you know where they are positioned and everything. We do not make these assumptions. For us, every single intersection we come up to, we see it for the first time. Everything has to be solved — just like what a human would do in the same situation.
Karpathy admits that this is a hard problem to solve.
However, the engineer explains that Tesla aims for a scalable self-driving system deployable in millions of cars on the road, and he argues that Tesla’s vision-based system is easier to scale:
Speaking of scalability, this is a much harder problem to solve, but when we do essentially solve this problem, there’s a possibility to beam this down to again millions of cars on the road. Whereas building out these lidar maps on the scale that we operate in with the sensing that it does require would be extremely expensive. And you can’t just build it, you have to maintain it and the change detection of this is extremely difficult.
The engineer described the map-based approach as a “non-scalable approach.”
He did say that Tesla also builds maps and uses “all kinds of fusions between vision and the maps,” but its maps are not centimeter-level accurate, so the cars can’t rely on them to navigate.
Tesla has to be able to handle any situation like it is seeing it for the first time.
Karpathy explains how they accomplish that with only “a few dozen people” working on neural networks.
Everything is built around a general computer vision infrastructure on top of which they create new tasks. While only a few dozen people work on neural networks, they have a “huge” team working on labeling.
In other words, they separate the core vision detection system from the specific tasks that system needs to perform, like detecting all types of stop signs.
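To make that separation concrete, here is a toy sketch of one shared backbone feeding several independent task heads. This is purely illustrative — the layer sizes, task names, and matrix-based "network" are invented for the example, not Tesla's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "backbone": one toy feature extractor reused by every task
# (a stand-in for the large vision network the article describes).
W_backbone = rng.standard_normal((16, 8))

def backbone(image_features):
    # Project raw inputs into a shared feature space, with a ReLU.
    return np.maximum(image_features @ W_backbone, 0.0)

# Per-task "heads": small output layers added on top of the shared features.
# Task names here are hypothetical examples.
heads = {
    "stop_sign": rng.standard_normal((8, 1)),
    "traffic_light": rng.standard_normal((8, 3)),
    "lane_lines": rng.standard_normal((8, 4)),
}

def predict(image_features, task):
    shared = backbone(image_features)  # computed once...
    return shared @ heads[task]        # ...reused by whichever head is asked for

x = rng.standard_normal(16)
print({task: predict(x, task).shape for task in heads})
```

The point of the structure is that adding a new detection task means adding (and labeling data for) one small head, not building a new vision system from scratch — which is how a few dozen engineers can support many tasks.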
The engineer had some words for the competition relying on maps:
Do not assume that we can get away as an industry with HD lidar maps for global deployment of these features. I would take lidar maps, and especially the flow of all the lanes, traffic, and so on, and think about how you can predict an intersection without assuming lidar maps.
You can watch the full presentation via the link above.
I like that Karpathy doesn’t shy away from admitting that it is a harder problem to solve than what Waymo and most other companies in the field are attempting.
It’s about the data. Tesla is building large datasets and curating them so that they only feed good data to their neural network.
It was especially interesting to understand how Tesla is collecting the data. Karpathy gave the example of stop signs covered by foliage. They collected a few images and then built a classifier to have the fleet of customer vehicles look for stop signs covered by foliage, built the dataset, and used it to train its neural network.
Tesla has the giant advantage of collecting a ton of data from hundreds of thousands of cars, but he describes finding the right data as finding a needle in a haystack. Tesla’s scalability problem is finding those needles, which represent useful scenarios encountered by Tesla vehicles, and using them in their datasets to train the neural net.
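The needle-in-a-haystack loop Karpathy describes can be sketched as a trigger classifier that flags only the rare frames worth sending back for labeling. Everything below is a made-up toy — the scoring function, field names, and threshold are invented stand-ins for whatever Tesla actually runs on the fleet:

```python
# Toy "data engine" filter: score each frame for a rare scenario
# (e.g. a stop sign hidden by foliage) and keep only likely matches.

def foliage_trigger_score(frame):
    # Stand-in for an onboard classifier; here, a fake score built
    # from hand-set frame attributes.
    return 0.9 * frame["green_overlap"] + 0.1 * frame["sign_confidence"]

def harvest(frames, threshold=0.5):
    # Keep only the needles: frames likely to show the rare scenario,
    # which then go to the labeling team and into the training set.
    return [f for f in frames if foliage_trigger_score(f) >= threshold]

fleet_frames = [
    {"id": 1, "green_overlap": 0.05, "sign_confidence": 0.99},  # ordinary sign
    {"id": 2, "green_overlap": 0.80, "sign_confidence": 0.40},  # occluded sign
    {"id": 3, "green_overlap": 0.70, "sign_confidence": 0.55},  # occluded sign
]

needles = harvest(fleet_frames)
print([f["id"] for f in needles])  # → [2, 3]
```

The ordinary frame is discarded at the source, so only the interesting examples consume labeling effort — that filtering, at fleet scale, is the scalability problem the article describes.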
I think Tesla owners could help make this happen.
They can already send Tesla timestamped feedback from their vehicles. It would be cool to have some kind of gamification to help Tesla with finding the right data.
To use Karpathy’s example, Tesla could have an in-car app with a challenge of the week: “Report stop signs blocked by foliage.”
When Tesla owners driving around see a stop sign blocked by foliage, they could issue a voice command that sends the data to Tesla and earns them points. Based on those points, Tesla could give out free Supercharging miles or something like that.
The Autopilot team could update those challenges depending on which specific problem they are working on.
What do you think? Let us know in the comment section below.