(T) Last month I attended a presentation by Ilya Sutskever of Google Research about Deep Learning, part of the SF Big Analytics meetup in San Francisco. Ilya, along with Alex Krizhevsky and Professor Geoffrey Hinton from the University of Toronto, made major contributions to both academia and industry in popularizing deep learning. All three now work for Google.
Following are my notes from the lecture; the video of the lecture is below:
Defining deep learning

The modern reincarnation of Artificial Neural Networks from the 1980s and 90s

A collection of simple trainable mathematical units, called neurons, which collaborate to compute a highly complex function
Learning algorithm

You pick a random training case for an input layer x = {x1, x2,…xn} and output layer y = {y1, y2,…yn}. The initial input layer is the data; the final output is the result.

The processing units between the initial inputs and the final outputs are the neurons; the intermediate layers they form are called the hidden layers.

For each neuron: y = F(∑ wixi); the wi are the weights/coefficients between the inputs xi and the outputs yi of each layer

You activate the neural nets and optimize the connections for each layer until y is close to the desired result

The optimization minimizes, with gradient descent, the cost function between the predicted and expected outputs
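As an illustrative sketch (not code from the talk), the learning loop above — activate the neuron, compare its output to the desired one, and adjust the weights by gradient descent on the cost — can be written out for a single neuron. For simplicity the activation F is taken as the identity, and the data, learning rate, and iteration count are made up:

```python
import numpy as np

# Minimal sketch of the learning loop for one neuron y = F(sum_i w_i * x_i),
# taking F as the identity for simplicity. All data here is illustrative.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 random training cases, 3 inputs each
true_w = np.array([1.5, -2.0, 0.5])    # weights we hope to recover
y_target = X @ true_w                  # desired outputs

w = np.zeros(3)                        # initial weights
lr = 0.1                               # learning rate
for _ in range(500):
    y_pred = X @ w                     # activate the neuron on every training case
    grad = X.T @ (y_pred - y_target) / len(X)  # gradient of the mean squared error cost
    w -= lr * grad                     # optimize the connections: step downhill

print(np.round(w, 3))                  # close to true_w
```

After enough steps the predicted output matches the desired one and the learned weights recover the true ones.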
Useful Neural Nets applications
Modest-sized neural nets with two hidden layers can sort N N-bit numbers, while boolean circuits cannot. Neural nets can:
Recognize objects

Recognize speech

Recognize emotion

Instantly see how to solve some problems
Recent Deep learning research @ Google
How Google Translate squeezes deep learning onto a phone
DeepDream – a code example for visualizing Neural Networks
Inceptionism: Going Deeper into Neural Networks
Beyond Short Snippets: Deep Networks for Video Classification
From Pixels to Actions: Human-level control through Deep Reinforcement Learning
Object recognition with convolutional layers
Convolutional neural nets consist of multiple layers of small neuron collections which look at small portions of the input image, called receptive fields. The results of these collections are then tiled so that they overlap to obtain a better representation of the original image:
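As a hedged sketch of the receptive-field idea (not code from the talk), a hand-rolled 2-D convolution shows how each output neuron sees only a small patch of the image, and how neighboring patches overlap to tile it; the image and kernel below are made up:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small kernel over the image; each output value is computed
    from one receptive field (a patch the same size as the kernel)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # the receptive field
            out[r, c] = np.sum(patch * kernel)  # one neuron's response
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0]])           # difference filter: left pixel minus right pixel
response = conv2d_valid(image, edge_kernel)
print(response.shape)                           # overlapping fields tile the 5x5 image into 5x4 outputs
```

In a real convolutional layer the kernel weights are learned, and many kernels run in parallel to produce multiple feature maps.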
Learning with sparse input data
How? The raw sparse inputs are given to an embedding function that delivers floating-point vectors to the neural nets
Example: source word/nearby words – new_york/new_york_city, brooklyn, long_island, syracuse, manhattan, bronx…
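A minimal sketch of that embedding step, with a made-up vocabulary and dimensions: each sparse symbolic input (a word id) is mapped by table lookup to a dense floating-point vector, which is what the neural net actually consumes.

```python
import numpy as np

# Illustrative embedding lookup; the vocabulary, the 8-dimensional size,
# and the random vectors are all placeholders, not values from the talk.

vocab = {"new_york": 0, "brooklyn": 1, "long_island": 2, "syracuse": 3}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))  # one dense 8-dim vector per word

def embed(word):
    # Sparse symbolic input -> dense floating-point vector
    return embeddings[vocab[word]]

v = embed("brooklyn")
print(v.shape)
```

During training, the embedding table is adjusted by gradient descent just like any other layer, which is how nearby words end up with nearby vectors.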
Sequence prediction
Deep recurrent neural network (LSTM)
Example: "Hello how are you?" → "Bonjour comment allez-vous?"
Example: "I cannot connect to the VPN" – Neural Net: "When did you last connect to the VPN…?"
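The examples above can be sketched with a plain recurrent cell (a tanh RNN rather than a full LSTM, to keep it short): the hidden state is updated at every time step and carries the context of everything read so far, which is what lets such a model map an input sequence to a reply or a translation. All weights and inputs here are random placeholders:

```python
import numpy as np

# Illustrative recurrent forward pass; sizes and weights are made up,
# and this is a simple tanh cell, not the LSTM used in practice.

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 6
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))   # input-to-hidden weights
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # hidden-to-hidden weights

def run_sequence(xs):
    h = np.zeros(n_hidden)                  # initial state
    for x in xs:                            # one step per input symbol
        h = np.tanh(W_xh @ x + W_hh @ h)    # state summarizes the prefix read so far
    return h                                # final state encodes the whole sequence

sequence = [rng.normal(size=n_in) for _ in range(5)]
h_final = run_sequence(sequence)
print(h_final.shape)
```

An LSTM replaces the tanh update with gated cell-state updates, which is what makes long-range dependencies trainable in practice.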
Combining modalities e.g. vision and language
Example: given a photograph, automatically generate a text caption
Reinforcement Learning
Playing Games with Reinforcement Learning
Attention models
Concept: extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution
Hard attention: deciding where to look
Soft attention: using a differentiable approximation
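The soft variant can be sketched in a few lines (an illustrative example, not the model from the talk): instead of a hard choice of one location, the model computes a differentiable softmax weighting over region scores and takes a weighted average of the region features. The scores and features below are made up:

```python
import numpy as np

def soft_attention(query, regions):
    # Score each region's relevance to the query, then blend the regions
    # with softmax weights -- a differentiable approximation of "looking"
    # at one location.
    scores = regions @ query                 # one relevance score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax: positive, sums to 1
    return weights @ regions, weights        # blended representation + the weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 3))            # 5 candidate regions, 3 features each
query = rng.normal(size=3)
context, weights = soft_attention(query, regions)
print(np.round(weights, 3))
```

Because every step is differentiable, the attention weights can be trained end to end with gradient descent, unlike the hard-attention decision.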
Huge potential for many applications:

State of the art machine translation

State of the art syntactic parsing

Soon-to-be state of the art for speech recognition and for visual recognition and detection
Reference: A Silicon Valley Insider, Deep Dive into Deep Learning
Note: The picture above is from the talk.
Copyright © 2005-2015 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com.
Categories: Artificial Intelligence, Deep Learning