OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

6 CONCLUSION

The reinforcement learning problem, of maximizing reward in a POMDP, is extremely general and lies at the core of artificial intelligence. Historically, most work in reinforcement learning has used function approximators with limited expressivity, but recent work in deep reinforcement learning (including this thesis) studies how to use expressive function approximators such as deep neural networks. These function approximators are capable of performing multi-step computations, but they are also tractable to learn with gradient-based optimization. Nevertheless, deep reinforcement learning brings many challenges concerning how to develop reinforcement learning algorithms that are reliable, scalable, and reasonably sample-efficient.

This thesis is mostly concerned with developing deep reinforcement learning algorithms that are more reliable and sample-efficient than the algorithms that were available previously. In this work, we focus on using stochastic policies, for which it is possible to obtain estimators of the gradient of performance. We developed an algorithm called trust region policy optimization (TRPO), which is theoretically justified and empirically performs well in the challenging domains of Atari and 2D simulated robotic locomotion. Recently, Duan et al. [Dua+16] found TRPO to perform the best overall out of the algorithms considered on a benchmark of continuous control problems. We also studied variance reduction for policy gradient methods, unifying and expanding on several previous statements of this idea, and obtaining strong empirical results in the domain of 3D simulated robotic locomotion, which exceed previous results obtained with reinforcement learning.

The last work discussed, on stochastic computation graphs, makes the point that policy gradient methods for reinforcement learning are an instance of a more general class of techniques for optimizing objectives defined as expectations. We expect this to be useful for deriving optimization procedures in reinforcement learning or other probabilistic modeling problems; also, the unifying view motivates using RL algorithms like TRPO in
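To make the phrase "optimizing objectives defined as expectations" concrete: the score-function (likelihood-ratio) estimator gives grad_theta E_{x ~ p_theta}[f(x)] = E[f(x) grad_theta log p_theta(x)], and in practice one differentiates a surrogate loss whose ordinary gradient equals this estimator. Below is a minimal sketch of that recipe, not code from the thesis; the Gaussian family, the toy objective f(x) = -(x - 3)^2, the baseline choice, and all hyperparameters are illustrative assumptions.

```python
import torch

# Optimize E_{x ~ N(mu, sigma^2)}[f(x)] with the score-function estimator.
mu = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

def f(x):
    return -(x - 3.0) ** 2   # toy objective, maximized at x = 3

for step in range(2000):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    x = dist.sample((256,))          # sample() blocks gradient flow through x
    fx = f(x)
    baseline = fx.mean()             # constant baseline for variance reduction

    # Surrogate loss: its ordinary gradient is the score-function estimator
    # of -grad E[f(x)] (negated because the optimizer minimizes).
    surrogate = -(dist.log_prob(x) * (fx - baseline)).mean()

    opt.zero_grad()
    surrogate.backward()
    opt.step()

print(f"mu = {mu.item():.2f}")       # should approach 3.0
```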
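In the policy gradient setting the same estimator becomes REINFORCE, and the variance-reduction ideas summarized above appear as a baseline subtracted from the return. A hedged sketch on a toy three-armed bandit with a softmax policy; the bandit, its reward means, and all hyperparameters are assumptions for illustration, not an experiment from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bandit: pulling arm a yields reward ~ Normal(true_means[a], 1).
true_means = np.array([0.2, 0.5, 1.0])

theta = np.zeros(3)          # logits of a softmax policy pi_theta(a)
alpha = 0.05                 # step size
batch = 64                   # samples per gradient estimate

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

for it in range(500):
    pi = softmax(theta)
    actions = rng.choice(3, size=batch, p=pi)
    rewards = true_means[actions] + rng.normal(size=batch)

    # Score-function estimator: grad E[R] = E[(R - b) * grad log pi(a)],
    # where for a softmax grad_theta log pi(a) = onehot(a) - pi.
    baseline = rewards.mean()        # baseline b reduces variance
    grad = np.zeros(3)
    for a, r in zip(actions, rewards):
        score = -pi.copy()
        score[a] += 1.0
        grad += (r - baseline) * score
    grad /= batch

    theta += alpha * grad            # stochastic gradient ascent

print("learned policy:", softmax(theta))   # should concentrate on arm 2
```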
