OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

4 GENERALIZED ADVANTAGE ESTIMATION

4.1 Overview

The two main challenges with policy gradient methods are the large number of samples typically required, and the difficulty of obtaining monotonic improvement despite the nonstationarity of the incoming data. The previous chapter addressed the monotonicity issue and provided some improvement in sample complexity through the theoretically justified natural gradient step. This chapter further addresses the sample complexity issue by reducing the variance of the policy gradient estimates; the techniques of this chapter are equally applicable to other policy gradient methods, such as the vanilla policy gradient algorithm.

In this chapter, we propose a family of policy gradient estimators that significantly reduce the variance of the policy gradient estimate while maintaining a tolerable level of bias. We call this estimation scheme, parameterized by γ ∈ [0, 1] and λ ∈ [0, 1], the generalized advantage estimator (GAE). Related methods have been proposed in the context of online actor-critic methods [KK98; Waw09]. We provide a more general analysis, which is applicable in both the online and batch settings, and discuss an interpretation of our method as an instance of reward shaping [NHR99], where the approximate value function is used to shape the reward. A brief sketch of the estimator appears after the list of contributions below.

We present experimental results on a number of highly challenging 3D locomotion tasks, where we show that our approach can learn complex gaits using high-dimensional, general-purpose neural network function approximators for both the policy and the value function, each with over 10^4 parameters. The policies perform torque-level control of simulated 3D robots with up to 33 state dimensions and 10 actuators.

The contributions of this chapter are summarized as follows:

1. We provide justification and intuition for an effective variance reduction scheme for policy gradients, which we call generalized advantage estimation (GAE). While
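The GAE estimator itself is defined formally later in the chapter. As a preview of the γ, λ parameterization introduced above, the sketch below computes advantage estimates for a single finite trajectory from per-step rewards and approximate value-function predictions. The function name compute_gae, the variable names, and the default parameter values are illustrative assumptions for this sketch, not code from the thesis.

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one finite trajectory.

    rewards: per-step rewards r_0 ... r_{T-1}
    values:  value-function predictions V(s_0) ... V(s_T)
             (one more entry than rewards, for the final state).
    Returns advantage estimates A_0 ... A_{T-1}.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Work backwards so each step reuses the discounted sum from the future.
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Example: a short trajectory with a crude value estimate.
rewards = np.array([1.0, 0.5, 0.0, 2.0])
values = np.array([0.8, 0.9, 0.6, 1.0, 0.0])  # includes V(s_T)
print(compute_gae(rewards, values, gamma=0.99, lam=0.95))
```

The backward recursion is equivalent to an exponentially weighted sum of TD residuals δ_t = r_t + γV(s_{t+1}) − V(s_t): γ discounts future rewards, while λ trades bias against variance in the resulting advantage estimate.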
