
The Interface of Reinforcement Learning and Planning

17th December 2020
00:00 - 11:00

Online

Abstract

In reinforcement learning (RL), an agent learns to solve sequential decision-making problems through trial and error. When combined with deep networks, RL has the potential to learn domain representations that are important for making good decisions, such as patterns in board games (Go, Chess), or image features in video games. Based on this observation, there has been recent interest in applying RL to more general planning problems, such as combinatorial optimization, autonomous driving, or protein folding. The key idea is that when encountering many similar problems, some structure of the solution could be learned by RL to speed up the planning computation. In this talk, I will discuss the algorithmic interface between planning and RL, focusing on both the representation architecture and the learning algorithms.

We start by asking about the capability of deep networks to learn planning computations, and introduce value iteration networks, a type of differentiable planner that can be used within model-free RL to obtain better generalization. Next, we consider a practical robotic assembly problem, and show that motion planning, based on readily available CAD data, can be combined with RL to quickly learn policies for assembling tight-fitting objects. I will then question the suitability of the conventional RL formulation for learning to plan. Most planning problems are goal-based in nature, and a planner should provide a solution for every possible goal. Standard RL, on the other hand, is based on a single-goal formulation (the reward function), and making RL work in a multi-goal setting is challenging. In recent work we developed sub-goal trees (SGTs), a new RL formulation derived from a different first principle: the all-pairs shortest path problem on graphs. We show that for multi-goal problems, SGTs are provably better at handling approximation errors than conventional RL (O(N log N) vs. O(N^2)), and we propose several novel RL algorithms based on the SGT formalism. We demonstrate results on learning motion planning for a 7-DoF robot.
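The divide-and-conquer idea behind sub-goal trees can be illustrated with a minimal sketch (this is not the actual SGT algorithm from the talk; `predict_midpoint` is a hypothetical stand-in for the learned sub-goal predictor, implemented here as plain interpolation just to show the recursion structure):

```python
def predict_midpoint(start, goal):
    # Placeholder for a learned sub-goal predictor (assumption):
    # a real SGT model would predict a state roughly halfway along
    # an optimal trajectory between start and goal.
    return tuple((s + g) / 2 for s, g in zip(start, goal))

def sgt_trajectory(start, goal, depth):
    """Recursively build a trajectory with 2**depth segments.

    Instead of predicting states one step at a time, each level
    halves the remaining gap, so a trajectory with N segments needs
    only O(log N) sequential predictions, and per-prediction errors
    accumulate over O(log N) steps rather than N.
    """
    if depth == 0:
        return [start, goal]
    mid = predict_midpoint(start, goal)
    left = sgt_trajectory(start, mid, depth - 1)
    right = sgt_trajectory(mid, goal, depth - 1)
    return left[:-1] + right  # drop duplicated midpoint

traj = sgt_trajectory((0.0, 0.0), (8.0, 8.0), depth=3)
# depth=3 gives 2**3 = 8 segments, i.e. 9 states from start to goal
```

The contrast with a step-by-step (autoregressive) rollout is the point: in the sequential formulation every prediction conditions on the previous one, so errors can compound over all N steps, whereas the tree structure bounds the depth of that dependence logarithmically.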

About the speaker

Aviv Tamar is currently an assistant professor at Technion – Israel Institute of Technology, where he holds the Robert J. Shillman Career Advancement Chair.

Register for this event

Please register for this event via Eventbrite.

About the seminar series

This is the first in this year's series of seminars. The Jean Golding Institute has teamed up with the Heilbronn Institute for Mathematical Research to showcase the latest research in Data Science – methodology with roots in Mathematics and Computer Science with important applied implications.

Our seminar series features a range of internationally recognized, high-profile speakers on topics that will be relevant to a broad audience.