
Tesla’s acquisition of DeepScale starts to pay off with new IP in machine learning

Tesla’s acquisition of machine-learning startup DeepScale is starting to pay off, with the team it brought on now delivering new intellectual property for the automaker.

Late last year, it was revealed that Tesla had acquired DeepScale, a Bay Area-based startup focused on deep neural networks (DNNs) for self-driving vehicles, for an undisclosed amount.

The startup specialized in power-efficient deep learning systems, an area of focus for Tesla as well, which went as far as designing its own computer chip to power its self-driving software.

There was speculation that Tesla acquired the small startup team in order to accelerate its machine learning development.

Now we are seeing some of that team’s work, thanks to a new patent application.

Just days after Tesla acquired the startup in October 2019, the automaker applied for a new patent with three members of DeepScale listed as inventors: Matthew Cooper, Paras Jain, and Harsimran Singh Sidhu.

The patent application, titled “Systems and Methods for Training Machine Models with Augmented Data,” was published yesterday.

Tesla writes about it in the application:

Systems and methods for training machine models with augmented data. An example method includes identifying a set of images captured by a set of cameras while affixed to one or more image collection systems. For each image in the set of images, a training output for the image is identified. For one or more images in the set of images, an augmented image for a set of augmented images is generated. Generating an augmented image includes modifying the image with an image manipulation function that maintains camera properties of the image. The augmented training image is associated with the training output of the image. A set of parameters of the predictive computer model are trained to predict the training output based on an image training set including the images and the set of augmented images.

The system that the DeepScale team, now working under Tesla, is trying to patent relates to training a neural network on data from several different sensors observing a scene, such as the eight cameras in Tesla’s Autopilot sensor array.
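
To make that concrete, here’s a minimal sketch, in Python/NumPy, of the kind of augmentation the application describes: one that changes pixel values while leaving the image’s camera geometry intact. The function name and jitter ranges are our own illustration; the patent doesn’t disclose the specific manipulation functions Tesla uses.

```python
import numpy as np

def augment_preserving_camera(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply photometric augmentations that leave the camera geometry
    (focal length, principal point, lens distortion) untouched.

    Hypothetical illustration only; the patent does not disclose the
    exact image manipulation functions Tesla uses.
    """
    out = image.astype(np.float32)
    # Brightness and contrast jitter: changes pixel values only, so the
    # camera's intrinsic/extrinsic parameters remain valid for the image.
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-10, 10)
    # Per-channel color jitter, again purely photometric.
    out = out * rng.uniform(0.95, 1.05, size=(1, 1, 3))
    # Note: random crops, rescales, or rotations are deliberately absent;
    # they would alter the effective focal length or principal point and
    # break the "maintains camera properties" constraint in the claims.
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(960, 1280, 3), dtype=np.uint8)  # stand-in camera frame
augmented = augment_preserving_camera(frame, rng)
```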

They write about the difficulties of such a situation in the patent application:

In typical machine learning applications, data may be augmented in various ways to avoid overfitting the model to the characteristics of the capture equipment used to obtain the training data. For example, in typical sets of images used for training computer models, the images may represent objects captured with many different capture environments having varying sensor characteristics with respect to the objects being captured. For example, such images may be captured by various sensor characteristics, such as various scales (e.g., significantly different distances within the image), with various focal lengths, by various lens types, with various pre- or post-processing, different software environments, sensor array hardware, and so forth. These sensors may also differ with respect to different extrinsic parameters, such as the position and orientation of the imaging sensors with respect to the environment as the image is captured. All of these different types of sensor characteristics can cause the captured images to present differently and variously throughout the different images in the image set and make it more difficult to properly train a computer model.
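
The “camera properties” in question include things like the camera’s intrinsic matrix, which encodes focal length and principal point. A quick sketch (with illustrative values, not Tesla’s) shows why even an innocuous-looking resize quietly changes those properties, which is exactly the kind of mismatch the application says makes models harder to train:

```python
import numpy as np

# A pinhole camera's intrinsic matrix: fx, fy are focal lengths in pixels,
# (cx, cy) is the principal point. These are among the "camera properties"
# the patent says augmentation must maintain. (Illustrative values only.)
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 480.0],
              [   0.0,    0.0,   1.0]])

def intrinsics_after_resize(K: np.ndarray, scale: float) -> np.ndarray:
    """Resizing an image by `scale` silently rescales fx, fy, cx, cy.
    A model trained on such images without accounting for the change
    learns features tied to the capture equipment, not the scene."""
    S = np.diag([scale, scale, 1.0])
    return S @ K

print(intrinsics_after_resize(K, 0.5))  # focal length halves: geometry changed
```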

Here they summarize their solution to the problem:

One embodiment is a method for training a set of parameters of a predictive computer model. This embodiment may include: identifying a set of images captured by a set of cameras while affixed to one or more image collection systems; for each image in the set of images, identifying a training output for the image; for one or more images in the set of images, generating an augmented image for a set of augmented images by modifying the image with an image manipulation function that maintains camera properties of the image, and associating the augmented training image with the training output of the image; training the set of parameters of the predictive computer model to predict the training output based on an image training set including the images and the set of augmented images.

An additional embodiment may include a system having one or more processors and non-transitory computer storage media storing instructions that when executed by the one or more processors, cause the processors to perform operations comprising: identifying a set of images captured by a set of cameras while affixed to one or more image collection systems; for each image in the set of images, identifying a training output for the image; for one or more images in the set of images, generating an augmented image for a set of augmented images by modifying the image with an image manipulation function that maintains camera properties of the image, and associating the augmented training image with the training output of the image; training the set of parameters of the predictive computer model to predict the training output based on an image training set including the images and the set of augmented images.

Another embodiment may include a non-transitory computer-readable medium having instructions for execution by a processor, the instructions when executed by the processor causing the processor to: identify a set of images captured by a set of cameras while affixed to one or more image collection systems; for each image in the set of images, identify a training output for the image; for one or more images in the set of images, generate an augmented image for a set of augmented images by modifying the image with an image manipulation function that maintains camera properties of the image, and associate the augmented training image with the training output of the image; train the computer model to learn to predict the training output based on an image training set including the images and the set of augmented images.
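
Stripped of the legal phrasing, the claimed procedure maps onto a fairly conventional training loop: build a training set of captured images plus camera-property-preserving augmentations that inherit their source images’ labels, then fit the model to the combined set. Here’s a rough sketch in PyTorch, using a toy model and toy data; the application claims the procedure, not any particular architecture or manipulation function:

```python
import torch
from torch import nn

def build_training_set(images, labels, augment_fn, n_augments=1):
    """Steps 1-3 of the embodiment: pair each captured image with its
    training output, generate augmentations that maintain camera
    properties, and associate each augmented image with the training
    output of its source image."""
    xs, ys = list(images), list(labels)
    for img, lbl in zip(images, labels):
        for _ in range(n_augments):
            xs.append(augment_fn(img))   # augmented image
            ys.append(lbl)               # inherits the source image's output
    return torch.stack(xs), torch.stack(ys)

# Toy stand-ins: eight "camera" frames with per-image labels, and a
# purely photometric brightness jitter as the manipulation function.
images = [torch.rand(3, 64, 64) for _ in range(8)]
labels = [torch.randint(0, 2, (1,)).float() for _ in range(8)]
jitter = lambda img: (img * torch.empty(1).uniform_(0.8, 1.2)).clamp(0, 1)

x, y = build_training_set(images, labels, jitter, n_augments=2)

# Final step: train the model's parameters on originals + augmentations.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    loss.backward()
    opt.step()
```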

As we previously reported, Tesla is going through “a significant foundational rewrite in the Tesla Autopilot.” As part of the rewrite, CEO Elon Musk says that the “neural net is absorbing more and more of the problem.”

The rewrite will also include a more in-depth labeling system.

Musk described 3D labeling as a game-changer:

It’s where the car goes into a scene with eight cameras, and kind of paint a path, and then you can label that path in 3D.

This new way of training multi-camera machine learning systems like Tesla’s Autopilot on augmented data could be part of that Autopilot update.

Here are some drawings from the patent application:

Tesla’s patent application can be read in full on Scribd.





