
6 CONCLUSION

The reinforcement learning problem, of maximizing reward in a POMDP, is extremely general and lies at the core of artificial intelligence. Historically, most work in reinforcement learning has used function approximators with limited expressivity, but recent work in deep reinforcement learning (including this thesis) studies how to use expressive function approximators such as deep neural networks. These function approximators are capable of performing multi-step computations, yet they remain tractable to learn with gradient-based optimization. Nevertheless, deep reinforcement learning brings many challenges in developing reinforcement learning algorithms that are reliable, scalable, and reasonably sample-efficient.

This thesis is mostly concerned with developing deep reinforcement learning algorithms that are more reliable and sample-efficient than the algorithms that were previously available. In this work, we focus on using stochastic policies, for which it is possible to obtain estimators of the gradient of performance. We developed an algorithm called trust region policy optimization (TRPO), which is theoretically justified and empirically performs well in the challenging domains of Atari and 2D simulated robotic locomotion. Recently, Duan et al. [Dua+16] found TRPO to perform best overall among the algorithms considered on a benchmark of continuous control problems. We also studied variance reduction for policy gradient methods, unifying and expanding on several previous statements of this idea, and obtaining strong empirical results in the domain of 3D simulated robotic locomotion, which exceed previous results obtained with reinforcement learning.

The last work discussed, on stochastic computation graphs, makes the point that policy gradient methods for reinforcement learning are an instance of a more general class of techniques for optimizing objectives defined as expectations. We expect this to be useful for deriving optimization procedures in reinforcement learning or other probabilistic modeling problems; also, the unifying view motivates using RL algorithms like TRPO in
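For concreteness, the gradient estimator for stochastic policies referred to above is the score-function (likelihood-ratio) estimator. A compact statement in standard notation (the symbols here are our gloss on the usual formulation, not quoted from this page: pi_theta is the policy, tau a trajectory, R(tau) its total reward):

\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \Big( \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big) R(\tau) \right]

The right-hand side can be estimated from sampled trajectories, which is what makes gradient-based policy optimization possible without differentiating through the environment dynamics.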
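The TRPO update summarized above can be stated compactly as a KL-constrained surrogate problem. This is our paraphrase of the standard formulation rather than a quotation from this page; delta denotes the trust-region radius:

\underset{\theta}{\text{maximize}} \ \ \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, A^{\pi_{\theta_{\text{old}}}}(s,a) \right]
\quad \text{subject to} \quad
\mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \big) \right] \le \delta

The constraint bounds how far each update can move the policy, which is the source of the reliability claimed above.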
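The variance-reduction work mentioned above centers on the generalized advantage estimator. A compact statement, again in our notation, with V an approximate value function and gamma, lambda in [0, 1] trading bias against variance:

\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\qquad
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma \lambda)^l \, \delta_{t+l}

Setting lambda = 0 recovers the one-step temporal-difference advantage (low variance, higher bias), while lambda = 1 recovers the empirical-return advantage (unbiased given V, higher variance).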
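To illustrate the "optimizing expectations" view in the final paragraph, here is a minimal, self-contained sketch (ours, not from the thesis) of the score-function trick that stochastic computation graphs generalize. The function name and the unit-variance Gaussian example are assumptions chosen for illustration:

# Sketch: estimate d/dtheta E_{x ~ N(theta, 1)}[f(x)] via the score-function
# identity grad = E[ f(x) * d/dtheta log p(x; theta) ].
import numpy as np

def score_function_gradient(f, theta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[f(x)]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    # For a unit-variance Gaussian, d/dtheta log p(x; theta) = x - theta.
    return np.mean(f(x) * (x - theta))

if __name__ == "__main__":
    theta = 0.5
    # f(x) = x^2, so E[f(x)] = theta^2 + 1 and the true gradient is 2*theta.
    estimate = score_function_gradient(lambda x: x**2, theta)
    print(f"MC estimate: {estimate:.3f}   analytic: {2 * theta:.3f}")

The same identity, applied node by node, is what lets a stochastic computation graph produce an unbiased gradient estimate for any objective written as an expectation.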