Google Research in Deep Learning


(T) Last month I attended a presentation on deep learning by Ilya Sutskever of Google Research, part of the SF Big Analytics meet-up in San Francisco. Ilya, along with Alex Krizhevsky and Professor Geoffrey Hinton of the University of Toronto, made major contributions to both academia and industry in popularizing deep learning. All three now work for Google.

Here are my notes from the lecture, followed by the video of the lecture:

Defining deep learning

  • The modern reincarnation of Artificial Neural Networks from the 1980s and 90s

  • A collection of simple, trainable mathematical units called neurons, which collaborate to compute a highly complex function


Learning algorithm

  • You pick a random training case for an input layer x = {x1, x2,…xn} and output layer y = {y1, y2,…yn}. The initial input layer is the data; the final output is the result.

  • The processing units between the initial inputs and the final outputs are the neurons; together they form the hidden layers.

  • For each neuron: y = F(∑ wi xi), where F is the neuron's activation function and the wi are the weights (coefficients) between the inputs xi and the outputs yi of each layer.


  • You activate the neural nets and optimize the connections for each layer until y is close to the desired result

  • The optimization uses gradient descent to minimize a cost function that measures the difference between the predicted and expected output
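
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under my own assumptions, not the setup shown in the talk: a single neuron with no hidden layer, a sigmoid for F, a squared-error cost, an added bias term, and the boolean OR function as toy training data. The names `predict`, `train` and `OR_CASES` are made up for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """One neuron: y = F(sum of w_i * x_i + bias), with F = sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(cases, steps=10000, learning_rate=0.5, seed=0):
    """Stochastic gradient descent on the cost 0.5 * (y - target) ** 2."""
    rng = random.Random(seed)
    weights = [0.0] * len(cases[0][0])
    bias = 0.0  # assumption: the bias is not part of the formula in the notes
    for _ in range(steps):
        x, target = rng.choice(cases)   # pick a random training case
        y = predict(weights, bias, x)   # activate the neuron
        # Derivative of the cost with respect to the neuron's weighted sum
        delta = (y - target) * y * (1.0 - y)
        # Move each weight a small step against its gradient
        weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * delta
    return weights, bias

# Toy data (assumption): the boolean OR function
OR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_CASES)
for x, target in OR_CASES:
    print(x, target, round(predict(weights, bias, x), 3))
```

A real deep network stacks many such neurons in hidden layers and uses backpropagation to pass the gradient from the output back through each layer, but the update rule per weight has the same shape.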

Useful Neural Nets applications

Modest-sized neural nets with two hidden layers can sort N N-bit numbers, while modest-sized boolean circuits of the same depth cannot. Neural nets can:

  • Recognize objects
  • Recognize speech

  • Recognize emotion

  • Instantly see how to solve some problems

Recent Deep learning research @ Google

Object recognition with convolutional layers

Convolutional neural nets consist of multiple layers of small collections of neurons, each of which looks at a small portion of the input image called a receptive field. The outputs of these collections are then tiled so that they overlap, to obtain a better representation of the original image.
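
To make the receptive-field idea concrete, here is a small sketch in plain Python of one filter sliding over an image. It assumes stride 1, no padding and a single channel, and, like most deep learning libraries, it does not flip the kernel. The function name `convolve2d`, the toy image and the edge kernel are illustrative choices of mine, not material from the talk.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Receptive field: the kh x kw patch whose top-left corner is (r, c).
            # Neighbouring patches overlap because the window moves one pixel at a time.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy 3x4 image (assumption): dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds to a dark-to-bright vertical edge
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The filter responds only in the column where the image goes from dark to bright. In a trained network the kernel values are learned rather than written by hand.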


Learning with sparse input data

How? The raw sparse inputs are given to an embedding function that delivers floating-point vectors to the neural nets

Example: source word/nearby words – new_york/new_york_city, brooklyn, long_island, syracuse, manhattan, bronx…
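
As a rough sketch of what "nearby words" means, the snippet below maps each word to a floating-point vector and ranks the other words by cosine similarity. The vectors are invented for illustration (real embeddings are learned from data), the word "paris" is added only as a contrast, and the names `EMBEDDINGS`, `embed` and `nearest_words` are hypothetical.

```python
import math

# Toy embedding table: sparse input (a word) -> dense floating-point vector.
# All numbers are made up for this example; real embeddings are learned.
EMBEDDINGS = {
    "new_york":      [0.90, 0.80, 0.10],
    "new_york_city": [0.85, 0.82, 0.12],
    "brooklyn":      [0.80, 0.70, 0.20],
    "paris":         [0.10, 0.20, 0.90],  # contrast word, not from the talk
}

def embed(word):
    """The embedding function: look up the vector for a word."""
    return EMBEDDINGS[word]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_words(word, k=2):
    """Return the k other words whose vectors point in the most similar direction."""
    query = embed(word)
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine_similarity(query, embed(w)), reverse=True)
    return others[:k]

print(nearest_words("new_york"))  # ['new_york_city', 'brooklyn']
```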

Sequence prediction

Deep recurrent neural network (LSTM)

Example (translation): Hello how are you? → Bonjour comment allez-vous?

Example: I cannot connect to the VPN – Neural Net: When did you connect for the last time to the VPN…?
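
For readers who have not seen an LSTM, here is a minimal sketch of one time step of the standard LSTM cell, reduced to a single unit with scalar input and state so it fits in plain Python. The parameters in `PARAMS` are arbitrary, untrained values I picked for illustration, and the function names are my own. The models discussed in the talk are deep, vector-valued and trained on large datasets.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (scalar input, scalar state)."""
    def gate(name, squash):
        w, u, b = params[name]  # input weight, recurrent weight, bias
        return squash(w * x + u * h_prev + b)

    i = gate("input", sigmoid)         # how much new information to write
    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # proposed new content
    c = f * c_prev + i * g             # updated cell state (the memory)
    h = o * math.tanh(c)               # updated hidden state (the output)
    return h, c

def run_sequence(xs, params):
    """Feed a sequence through the cell, carrying h and c from step to step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

# Hypothetical, untrained parameters: (input weight, recurrent weight, bias)
PARAMS = {
    "input":     (0.5, 0.1, 0.0),
    "forget":    (0.3, 0.2, 1.0),
    "output":    (0.4, 0.1, 0.0),
    "candidate": (0.8, 0.3, 0.0),
}

states = run_sequence([1.0, 0.5, -1.0], PARAMS)
print(states)
```

The cell state c is what lets the network carry information across many time steps, which is why LSTMs suit sequence tasks such as the translation and dialogue examples above.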

Combining modalities e.g. vision and language

Example: given a photograph, automatically generate a text caption

Reinforcement Learning

Playing Games with Reinforcement Learning

Attention models

Concept: extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution

Hard attention: making a discrete decision about where to look

Soft attention: using a differentiable approximation of that decision
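
The difference between the two can be sketched as follows. This is a simplified illustration with assumptions of my own: each region is a short feature vector, the relevance scores are given rather than computed by a network, and hard attention is shown as a plain argmax, whereas published hard-attention models usually sample the location. The names `soft_attention` and `hard_attention` are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(scores, regions):
    """Differentiable: blend all regions, weighted by softmax(scores)."""
    weights = softmax(scores)
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]

def hard_attention(scores, regions):
    """Discrete: process only the single highest-scoring region."""
    best = max(range(len(scores)), key=lambda idx: scores[idx])
    return regions[best]

# Toy feature vectors for three image regions (assumption)
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical relevance scores; a real model would compute these
scores = [0.1, 2.0, 0.1]

print(hard_attention(scores, regions))  # [0.0, 1.0]
print(soft_attention(scores, regions))
```

Because the soft version is a smooth function of the scores, it can be trained with ordinary gradient descent, while the hard version needs other techniques since a discrete choice has no useful gradient.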

Huge potential for many applications:

  • State-of-the-art machine translation

  • State-of-the-art syntactic parsing

  • Soon-to-be state of the art for speech recognition and for visual recognition and detection

Reference: A Silicon Valley Insider, Deep Dive into Deep Learning

Note: The picture above is from the talk.

Copyright © 2005-2015 by Serge-Paul Carrasco. All rights reserved.