
(T)he research in deep learning is moving rapidly in every direction: reinforcement and imitation learning for robots, computer vision for self-driving cars, graph neural networks for biology, and language models for everything. Until we reach a point where we can predict when and why deep learning models work, we will not be able to design truly intelligent artificial systems. Over the last few years, theoretical deep learning has emerged as a new field that aims to explain the behaviors of certain types of deep learning models. It plays the same role for deep learning that theoretical computer science plays for computer science or theoretical physics for physics, and it is expanding as more mathematicians join the field.
One particular area of theoretical deep learning that has received a lot of attention is the behavior of deep learning systems in the infinite-width limit, in particular because of the work of Arthur Jacot and his collaborators on the Neural Tangent Kernel (NTK):
- “Training a randomly initialized deep learning model of infinite width with infinitesimal gradient-descent steps is equivalent to a kernel regression, where the kernel is what Jacot called the neural tangent kernel (NTK)”
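In formulas (a sketch in standard NTK notation, not the exact statement of the paper): for a network f(x; θ) with parameters θ, the NTK is the inner product of parameter gradients, and in the infinite-width limit it remains essentially constant during training, so gradient-flow training reduces to kernel regression with that fixed kernel.

```latex
% Neural tangent kernel of a network f(x; theta) -- a sketch, notation mine rather than the paper's
\Theta_t(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta_t),\; \nabla_\theta f(x';\theta_t) \big\rangle

% Under gradient flow on the squared loss over a training set (X, y), the outputs evolve as
\frac{\mathrm{d}}{\mathrm{d}t}\, f(X;\theta_t) \;=\; -\,\Theta_t(X, X)\,\big(f(X;\theta_t) - y\big)

% In the infinite-width limit, Theta_t stays at its initialization value Theta_0,
% so training is equivalent to kernel regression with the fixed kernel Theta_0.
```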
What follows goes beyond that initial work and aims to organize the present research in theoretical deep learning into a few themes.
Neural Network Regimes:
Let’s define a neural network with the following parameters:
- n = Sample size of the data (number of data points)
- N = Number of neurons
- d = Dimensions (number of inputs to the network)
- k = Number of SGD steps
As those parameters change, the network can enter different regimes, i.e. different behaviors. Three types of regimes have been studied over the last few years. To describe them, I am using material from a lecture by Professor Andrea Montanari of Stanford University (a rough summary in formulas follows the list below):
- Regime 1 – small networks = few neurons; the number of data points is equal to or larger than the dimension, and the number of SGD steps is much larger than the number of data points
- Key concept: statistical physics of disordered systems (spin glass techniques)
- Regime 2 – over-parametrized regime = an over-parametrized network (many more neurons than data points), very few SGD steps
- Key concept: neural tangent kernel
- Regime 3 – mean field regime = a lot of neurons but not a lot of SGD steps; each data point is visited essentially once during SGD
- Key concept: mean field approach
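The three regimes can be condensed in terms of the parameters defined above. The scalings below are my own rough summary of the lecture's descriptions, so treat them as indicative rather than precise:

```latex
% Indicative scalings only -- a condensation of the regime descriptions above
\text{Regime 1 (small networks):} \quad N = O(1), \qquad n \gtrsim d, \qquad k \gg n
\text{Regime 2 (over-parametrized / NTK):} \quad N \gg n, \qquad k \ \text{small}
\text{Regime 3 (mean field):} \quad N \gg 1, \qquad k \approx n \ \text{(each data point visited about once)}
```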
Regime 1:
Computational Gaps in Learning:
- B. Aubin, et al., “The committee machine: Computational to statistical gaps in learning a two-layers neural network”
- In a few slides: “The committee machine“
- Video: “The Committee Machine“
Regime 2:
Neural Tangent Kernel (NTK):
- A. Jacot, F. Gabriel, C. Hongler, “Neural Tangent Kernel: Convergence and Generalization in Neural Networks.” NeurIPS 2018
- In a few slides: “Neural Tangent Kernel“
- Videos:
- Understanding the maths of NTK:
- A few notes on wide neural networks, NTK, and Gaussian processes:
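To make the kernel concrete, here is a minimal JAX sketch (my own illustration, not code from the papers or slides above) that computes the empirical NTK of a randomly initialized two-layer network as an inner product of parameter gradients; with a wide hidden layer its value is close to the infinite-width limit:

```python
# A minimal sketch: empirical NTK of a small two-layer network at initialization.
# Illustration only -- not code from any of the references cited above.
import jax
import jax.numpy as jnp

def init_params(key, d, N):
    """Randomly initialize a two-layer network with N hidden neurons."""
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (N, d)) / jnp.sqrt(d),
        "w2": jax.random.normal(k2, (N,)) / jnp.sqrt(N),  # NTK-style 1/sqrt(N) scaling
    }

def f(params, x):
    """Scalar output of the network for a single input x of dimension d."""
    return params["w2"] @ jax.nn.relu(params["W1"] @ x)

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>."""
    g1 = jax.grad(f)(params, x1)
    g2 = jax.grad(f)(params, x2)
    return sum(jnp.vdot(a, b) for a, b in
               zip(jax.tree_util.tree_leaves(g1), jax.tree_util.tree_leaves(g2)))

key = jax.random.PRNGKey(0)
d, N = 10, 4096  # wide hidden layer, so the kernel is close to its infinite-width value
params = init_params(key, d, N)
x1 = jax.random.normal(jax.random.PRNGKey(1), (d,))
x2 = jax.random.normal(jax.random.PRNGKey(2), (d,))
print(empirical_ntk(params, x1, x2))
```

With N large, repeating the computation for several random initializations should give nearly the same number, which is the sense in which the kernel becomes deterministic at infinite width.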
Gaussian Processes:
- J. Lee, Y. Bahri, et al., “Deep Neural Networks as Gaussian Processes.” ICLR 2018.
- J. Lee, L. Xiao, et al., “Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.” NeurIPS 2019.
- S. Arora, et al., “On Exact Computation with an Infinitely Wide Neural Net.” NeurIPS 2019.
Lazy Training:
- L. Chizat, E. Oyallon, F. Bach, “On Lazy Training in Differentiable Programming.” NeurIPS 2019
Double Descent Curve:
- S. Mei, A. Montanari, “The generalization error of random features regression: Precise asymptotics and double descent curve”
- A tutorial on “Deep Double Descent“
Over-Parametrized Regime:
- M. Geiger, A. Jacot, et al., “Scaling description of generalization with number of parameters in deep learning”
Reverse Engineering the Neural Tangent Kernel:
- J. Simon, S. Anand, M. DeWeese, “Reverse Engineering the Neural Tangent Kernel”
- Blog article: “Reverse engineering the NTK“
Regime 3:
Mean Field Theory Applied to Neural Networks:
- S. Mei, A. Montanari, P.-M. Nguyen, “A Mean Field View of the Landscape of Two-Layers Neural Networks”
- P.-M. Nguyen, “Mean field limit of the learning dynamics of multilayer neural networks”
- P.-M. Nguyen, H. Pham, “A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks”
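For intuition about what “mean field” means here, the following is a sketch in my own notation, following the general setup of these papers rather than their exact statements: a two-layer network is viewed as an average over its neurons, and as the number of neurons grows the network is described by a probability measure over neuron parameters, with SGD becoming a gradient flow on that measure.

```latex
% Mean-field view of a two-layer network with N neurons -- a sketch, notation mine
f_N(x;\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i)

% As N grows, the empirical distribution of the neurons converges to a measure rho, and
f_N(x;\theta) \;\longrightarrow\; f(x;\rho) \;=\; \int \sigma_*(x;\theta)\, \rho(\mathrm{d}\theta)

% SGD on the neuron parameters then corresponds, in the limit, to a gradient flow on rho
% (the "distributional dynamics" studied in the papers above).
```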
Reference:
- The blog and the book of Francis Bach have some excellent materials for studying the mathematics of machine learning.
Note: The picture above is Amélie-les-Bains in the French Pyrénées-Orientales.
Copyright © 2005-2022 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com
Categories: Algorithms, Artificial Intelligence, Deep Learning