Tesla AI Day 2021

This week Tesla held a recruiting event for data scientists and software engineers: “Tesla AI Day”. In my opinion, Andrej Karpathy and his team presented at that event what is probably the most advanced machine learning system ever demonstrated publicly. I do not believe that Waymo has ever given as much insight into its technology as Tesla did this week (for more on Waymo’s machine learning models, read Machine Learning Models for Autonomous Vehicles).

The event had four major themes:

  • The Autopilot: the architecture of the various deep learning models for vision, planning and control
  • Training data: manual and auto labeling, and simulations for edge cases
  • Project Dojo and D1 chips: the replacement of NVIDIA GPUs by a Tesla chip for training the deep learning models
  • Tesla Bot: the autonomous humanoid robot based on Autopilot technology, announced by Elon Musk

MIT Professor Lex Fridman has made a video that highlights the key innovations. My suggestion is to watch Professor Fridman’s video a couple of times and then watch Tesla’s video of the event to dive deeper into those innovations, which are quite complex. Following is my “humble” attempt to summarize them:

Vision

  • HydraNets, the Autopilot’s multi-task deep learning models (a shared backbone feeding many task-specific heads), operate in vector space rather than in image space: the 2D images from the cameras are transformed into a 3D vector-space representation
  • Data from all the cameras and sensors on the vehicle is fused before any vision task starts
  • Fusion of space and time: the video context includes not only the vector space but also time
  • Additionally, a feature queue gives the system a form of memory: it keeps past data so that the car can make informed decisions later, for example about a car that is temporarily occluded
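To make the feature-queue idea more concrete, here is a minimal sketch of a memory with two queues: one that records features at a fixed time interval and one that records them only after the car has moved a fixed distance. The class names, the queue length, and the 0.027 s and 1 m thresholds are hypothetical values chosen for illustration, and plain Python lists stand in for the feature tensors a real network would cache; this is not Tesla’s implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    timestamp_s: float  # capture time in seconds
    odometer_m: float   # distance travelled so far, in meters
    features: list      # stand-in for a cached feature tensor (hypothetical)


class FeatureQueue:
    """Hypothetical two-queue memory: one indexed by time, one by distance."""

    def __init__(self, maxlen=8, time_step_s=0.027, space_step_m=1.0):
        self.time_queue = deque(maxlen=maxlen)   # oldest entries drop out automatically
        self.space_queue = deque(maxlen=maxlen)
        self.time_step_s = time_step_s
        self.space_step_m = space_step_m

    def push(self, entry):
        # Time-based queue: keep a frame whenever enough time has passed.
        if (not self.time_queue
                or entry.timestamp_s - self.time_queue[-1].timestamp_s >= self.time_step_s):
            self.time_queue.append(entry)
        # Space-based queue: keep a frame only after the car has moved far enough,
        # so a car waiting at a red light still remembers what it saw on approach.
        if (not self.space_queue
                or entry.odometer_m - self.space_queue[-1].odometer_m >= self.space_step_m):
            self.space_queue.append(entry)


# Demo: drive 1 m per frame for 3 frames, then wait at a red light.
queue = FeatureQueue()
for k in range(13):
    queue.push(Entry(timestamp_s=0.03 * k, odometer_m=float(min(k, 3)), features=[k]))

print([e.features[0] for e in queue.time_queue])  # [5, 6, 7, 8, 9, 10, 11, 12]
print([e.odometer_m for e in queue.space_queue])  # [0.0, 1.0, 2.0, 3.0]
```

While the car is stopped, the time-based queue fills up with nearly identical frames, whereas the space-based queue still holds the frames captured on the approach.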

Planning and control

  • The planner aims to maximize safety, comfort, and efficiency
  • It combines two approaches: explicit planning and learning-based planning
  • The search space is non-convex (the system can get stuck in a local minimum that solves a specific situation but is not valid globally) and high-dimensional (the system must process many parameters, such as trajectory and acceleration, to plan what to do next)
  • Explicit planning: the system first searches, using physical models, for the best set of candidate trajectories, which defines a convex corridor; it then computes the final trajectory inside that corridor using continuous optimization methods
  • Learning-based planning for complex situations (not yet implemented in Autopilot): similar in concept to the DeepMind techniques used for Atari games and MuZero, a deep learning system provides context that guides a Monte-Carlo Tree Search (MCTS) algorithm
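The two-stage explicit planner can be illustrated on a toy problem: choosing a lateral position at each step while an obstacle blocks the center lane. This is a minimal sketch under strong assumptions. The three-lane layout, the costs, and the corridor half-width are hypothetical. Dynamic programming stands in for the discrete search, and projected gradient descent stands in for the continuous optimizer. Once the discrete stage has fixed the corridor, the smoothing problem is a convex quadratic inside box bounds, so it no longer has the local-minimum problem described above.

```python
LANES = (-1.0, 0.0, 1.0)  # hypothetical lateral lane centers (m): left, center, right
HALF_WIDTH = 0.5          # hypothetical half-width of each corridor cell (m)


def discrete_search(n_steps, blocked, change_cost=1.0):
    """Stage 1: dynamic-programming search over lane choices.

    `blocked` maps a step index to the set of lane indices occupied by obstacles.
    Returns one lane index per step. Assumes at least one free lane per step.
    """
    inf = float("inf")
    n_lanes = len(LANES)
    cost = [[inf] * n_lanes for _ in range(n_steps)]
    back = [[0] * n_lanes for _ in range(n_steps)]
    for j in range(n_lanes):
        if j not in blocked.get(0, set()):
            cost[0][j] = abs(LANES[j])  # mild preference for the center lane
    for i in range(1, n_steps):
        for j in range(n_lanes):
            if j in blocked.get(i, set()):
                continue
            for k in range(n_lanes):
                if abs(j - k) > 1:
                    continue  # at most one lane change per step
                c = cost[i - 1][k] + change_cost * abs(j - k) + abs(LANES[j])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    j = min(range(n_lanes), key=lambda x: cost[-1][x])
    path = [j]
    for i in range(n_steps - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


def trajectory_cost(y, centers, w=0.1):
    """Convex objective: squared second differences (jerkiness) plus tracking error."""
    smooth = sum((y[i - 1] - 2 * y[i] + y[i + 1]) ** 2 for i in range(1, len(y) - 1))
    track = w * sum((yi - ci) ** 2 for yi, ci in zip(y, centers))
    return smooth + track


def smooth_in_corridor(centers, bounds, w=0.1, lr=0.02, iters=2000):
    """Stage 2: projected gradient descent on trajectory_cost inside the corridor."""
    y = list(centers)  # the lane centers are a feasible starting point
    n = len(y)
    for _ in range(iters):
        grad = [2 * w * (y[i] - centers[i]) for i in range(n)]
        for i in range(1, n - 1):
            t = y[i - 1] - 2 * y[i] + y[i + 1]
            grad[i - 1] += 2 * t
            grad[i] -= 4 * t
            grad[i + 1] += 2 * t
        # Gradient step, then clamp back into each step's box (projection onto the corridor).
        y = [min(max(y[i] - lr * grad[i], lo), hi) for i, (lo, hi) in enumerate(bounds)]
    return y


# Demo: the center lane (index 1) is blocked at steps 3 and 4.
path = discrete_search(8, blocked={3: {1}, 4: {1}})
centers = [LANES[j] for j in path]
bounds = [(c - HALF_WIDTH, c + HALF_WIDTH) for c in centers]
y = smooth_in_corridor(centers, bounds)
print(path)  # [1, 1, 1, 0, 0, 1, 1, 1]
print([round(v, 2) for v in y])
```

The step size of 0.02 is below 1/L for this objective (L is at most about 32), so each projected step never increases the cost.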

Training data

  • Physical inputs: data from 8 cameras and other sensors; no lidar and no high-definition maps
  • 4D labeling (3D vector space plus time), manual labeling (done internally by over 1,000 people), and auto labeling
  • Labels cover different weather conditions, times of day, and viewing angles, using data from many different cars that have driven through the same location over time
  • Simulation of the 3D world for edge cases and complex scenes
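The following example shows why many cars driving through the same location help with auto labeling. It is a minimal sketch that assumes a hypothetical `Observation` record, in which each car reports its own estimate of one lane line’s lateral position. Observations are bucketed into GPS grid cells and fused with a median, which tolerates an occasional bad estimate. Tesla described a far richer multi-trip 3D reconstruction. The grid size, the `min_cars` threshold, and median fusion are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Observation:
    lat: float
    lon: float
    weather: str
    time_of_day: str
    lane_offset_m: float  # hypothetical: one car's estimate of a lane line's position


def cell_key(lat, lon, cell_deg=0.0001):
    """Bucket a GPS position into a grid cell (0.0001 degree is roughly 11 m of latitude)."""
    return (round(lat / cell_deg), round(lon / cell_deg))


def fuse_by_location(observations, min_cars=3):
    """Group observations by location and fuse each group into one label."""
    groups = defaultdict(list)
    for obs in observations:
        groups[cell_key(obs.lat, obs.lon)].append(obs)
    labels = {}
    for key, group in groups.items():
        if len(group) < min_cars:
            continue  # too few passes to produce a reliable auto label
        labels[key] = {
            "lane_offset_m": median([o.lane_offset_m for o in group]),
            "conditions": sorted({(o.weather, o.time_of_day) for o in group}),
            "num_cars": len(group),
        }
    return labels


observations = [
    Observation(37.40001, -122.10000, "sunny", "day", 1.9),
    Observation(37.40002, -122.10001, "rain", "day", 2.0),
    Observation(37.39999, -122.10000, "clear", "night", 2.1),
    Observation(37.40000, -122.09999, "fog", "dawn", 5.0),   # outlier estimate
    Observation(37.50000, -122.20000, "sunny", "day", 1.0),  # location seen only once
]
for key, label in fuse_by_location(observations).items():
    print(key, label)
```

The median ignores the 5.0 m outlier, and the single pass at the second location is skipped. A fixed grid can split points that fall near a cell boundary, so a real system would need a proper alignment step.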

Project Dojo and the D1 Chip

  • Tesla’s training pipeline currently uses 720 nodes, each with 8 NVIDIA A100 Tensor Core GPUs (5,760 GPUs in total), for up to 1.8 exaflops of performance
  • The D1 chip is built on a 7-nanometer manufacturing process and provides 362 teraflops of processing power
  • Tesla will put 25 of these chips onto a single “training tile”, and 120 of these tiles together will provide more than an exaflop (about 1.1 exaflops) of processing power
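A few lines of arithmetic can cross-check these figures. This back-of-the-envelope sketch uses only the numbers quoted above. They are peak figures at reduced precision, not sustained training throughput, so the GPU cluster and the Dojo tiles are not necessarily directly comparable.

```python
# Figures as quoted at the event (peak, reduced precision).
NODES = 720
GPUS_PER_NODE = 8
GPU_CLUSTER_TFLOPS = 1_800_000  # 1.8 exaflops expressed in teraflops

D1_TFLOPS = 362       # per D1 chip
CHIPS_PER_TILE = 25
TILES = 120

total_gpus = NODES * GPUS_PER_NODE
tflops_per_gpu = GPU_CLUSTER_TFLOPS / total_gpus  # close to the A100's quoted 312 TF BF16/FP16 peak

total_chips = TILES * CHIPS_PER_TILE
tile_tflops = CHIPS_PER_TILE * D1_TFLOPS
dojo_exaflops = TILES * tile_tflops / 1_000_000  # 1 exaflop = 1,000,000 teraflops

print(total_gpus)               # 5760
print(tflops_per_gpu)           # 312.5
print(total_chips)              # 3000
print(tile_tflops)              # 9050
print(round(dojo_exaflops, 3))  # 1.086
```

A single tile therefore delivers about 9 petaflops. The 120-tile system comes in slightly above one exaflop, which is consistent with the roughly 1.1 exaflops figure.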

Professor Lex Fridman Highlights

Tesla Complete Event


  • Andrej’s workshop at CVPR 2021

Note: The picture above is the complete architecture of Tesla’s end-to-end deep learning system.

Copyright © 2005-2021 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com