Artificial Intelligence: How Reinforcement Learning Enables a New Generation of Robots
I had the pleasure of listening to several great talks on reinforcement learning by Professor Pieter Abbeel of UC Berkeley at the ODSC West Conference last fall.
I follow, in particular, Professor Abbeel's research in deep reinforcement learning, multi-task learning, and meta-learning.
Professor Abbeel started Covariant with a few colleagues from UC Berkeley.
Following is a quick description of Covariant's underlying robotic technology, and a video about its partnership with ABB…
“Covariant’s approach, which uses a single deep learning system for all objects, enables an arm equipped with a camera and suction gripper to manipulate around 10,000 different items (and counting). The system can share skills with other arms, including those made by other companies.
Training starts with attempts at a few-shot adaptation. In many cases, the robot can learn from a limited number of attempts. For more intensive training, an engineer wearing virtual reality gear uses hand-tracking hardware to control the arm in a simulated environment. The model learns to mimic the motion.
The model stores basic movements then hones them using reinforcement learning in a variety of simulated situations. The team then uses behavioral cloning to transfer the robot’s learned skills into the real world.”
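The imitation step described above, behavioral cloning, is just supervised learning from demonstrations: a policy is fit to map observed states to the actions the demonstrator took. Here is a minimal, self-contained sketch of that idea (my own illustration, not Covariant's code), using a one-dimensional linear policy trained by gradient descent on (state, action) pairs:

```python
# Minimal behavioral-cloning sketch (illustrative, not Covariant's code).
# A policy a = w * s is fit by supervised learning to imitate the actions
# an expert demonstrated in each state.

def behavioral_cloning(demos, lr=0.1, epochs=200):
    """Fit w by minimizing mean squared error between w*s and the demonstrated a."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared imitation loss with respect to w.
        grad = sum(2 * (w * s - a) * s for s, a in demos) / len(demos)
        w -= lr * grad
    return w

# Demonstrations generated by a hypothetical "expert" policy a = 2 * s.
demos = [(s, 2 * s) for s in [0.5, 1.0, 1.5, 2.0]]
w = behavioral_cloning(demos)
# w converges toward the expert's coefficient, approximately 2.0
```

In a real system the policy is a deep network over camera images and the demonstrations come from VR teleoperation, but the training objective is the same: match the demonstrator's actions, then refine the cloned policy with reinforcement learning.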
Covariant leverages key research in reinforcement learning that Professor Abbeel started as a Ph.D. student of Professor Andrew Ng at Stanford University:
“We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using “inverse reinforcement learning” to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations and that even though we may never recover the expert’s reward function, the policy output by the algorithm will attain performance close to that of the expert, where here performance is measured with respect to the expert’s unknown reward function.”
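The key quantity in that paper is a policy's vector of expected feature counts: the algorithm alternates between picking reward weights that separate the expert's feature expectations from the learner's, and solving the resulting RL problem. The toy sketch below (my own simplification, in the spirit of the paper's projection variant) abstracts each policy to its feature-expectation vector and replaces the inner RL solver with a search over a small candidate set:

```python
# Toy sketch of apprenticeship learning via inverse RL (projection variant,
# in the spirit of Abbeel & Ng, 2004). Illustrative only: policies are
# abstracted to their feature-expectation vectors, and the "RL step" simply
# picks the candidate policy that maximizes the current linear reward.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def apprenticeship(mu_expert, candidates, eps=1e-6, max_iter=100):
    mu_bar = list(candidates[0])        # feature expectations of an initial policy
    for _ in range(max_iter):
        w = sub(mu_expert, mu_bar)      # reward weights = still-unmatched features
        if dot(w, w) ** 0.5 <= eps:
            break                       # expert's features matched: done
        # "RL step": best candidate policy under the reward w . phi
        mu = max(candidates, key=lambda m: dot(w, m))
        d = sub(mu, mu_bar)
        denom = dot(d, d)
        if denom == 0:
            break
        # Project the expert's feature expectations onto the segment
        # between mu_bar and mu (a convex mixture of policies).
        alpha = max(0.0, min(1.0, dot(d, w) / denom))
        mu_bar = [b + alpha * di for b, di in zip(mu_bar, d)]
    return mu_bar

# If the expert's feature expectations lie in the convex hull of the
# candidates, the mixed policy's features converge to the expert's.
candidates = [[1.0, 0.0], [0.0, 1.0]]
mu = apprenticeship([0.5, 0.5], candidates)
```

The guarantee quoted above shows up here: the loop drives the learner's feature expectations toward the expert's, so performance under the expert's (unknown) linear reward is matched even though that reward itself is never recovered.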
And the research that he has pursued on meta-learning:
“Learning quickly is a hallmark of human intelligence, whether it involves recognizing objects from a few examples or quickly learning new skills after just minutes of experience. Our artificial agents should be able to do the same, learning and adapting quickly from only a few examples, and continuing to adapt as more data becomes available. This kind of fast and flexible learning is challenging since the agent must integrate its prior experience with a small amount of new information, while avoiding overfitting to the new data. Furthermore, the form of prior experience and new data will depend on the task. As such, for the greatest applicability, the mechanism for learning to learn (or meta-learning) should be general to the task and the form of computation required to complete the task.
In this work, we propose a meta-learning algorithm that is general and model-agnostic, in the sense that it can be directly applied to any learning problem and model that is trained with a gradient descent procedure. Our focus is on deep neural network models, but we illustrate how our approach can easily handle different architectures and different problem settings, including classification, regression, and policy gradient reinforcement learning, with minimal modification. In meta-learning, the goal of the trained model is to quickly learn a new task from a small amount of new data, and the model is trained by the meta-learner to be able to learn on a large number of different tasks. The key idea underlying our method is to train the model’s initial parameters such that the model has maximal performance on a new task after the parameters have been updated through one or more gradient steps computed with a small amount of data from that new task.”
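The "key idea" in that last sentence can be sketched in a few lines. The toy example below is a first-order simplification of MAML (it drops the second-order meta-gradient of Finn et al.'s full algorithm), applied to one-parameter linear models f(x) = theta * x, where each task is defined by a target slope:

```python
# First-order MAML-style sketch (a simplification, not Finn et al.'s full
# second-order algorithm). We learn an initialization theta for the 1-D
# model f(x) = theta * x such that ONE inner gradient step on a new task's
# data already adapts the model well to that task.

def loss_grad(theta, task_slope, xs):
    """Gradient of the mean squared error between theta*x and task_slope*x."""
    return sum(2 * (theta - task_slope) * x * x for x in xs) / len(xs)

def maml(tasks, xs, inner_lr=0.1, outer_lr=0.05, meta_steps=500):
    theta = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for a in tasks:
            # Inner loop: one gradient step of task-specific adaptation.
            adapted = theta - inner_lr * loss_grad(theta, a, xs)
            # Outer loop: evaluate post-adaptation loss (first-order approx:
            # the meta-gradient is taken at the adapted parameters).
            meta_grad += loss_grad(adapted, a, xs)
        theta -= outer_lr * meta_grad / len(tasks)
    return theta

xs = [0.5, 1.0, 1.5]
theta0 = maml(tasks=[1.0, 3.0], xs=xs)
# theta0 settles between the two task slopes, so a single gradient step on
# a few examples from either task moves the model close to that task.
```

The design choice this illustrates is exactly the quoted one: the meta-learner does not optimize for performance at the initialization itself, but for performance *after* a gradient step or two of adaptation, which is what makes the learned initialization fast to fine-tune.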
Note: The picture above is from Covariant.
Copyright © 2005-2020 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com.