OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


Text from PDF Page: 015

1.6 Contributions of this Thesis

policy update but doesn't necessarily use the natural gradient step direction. This work was previously published as [Sch+15c].

Policy gradient methods, including TRPO, often require a large number of samples to learn. They work by trying to determine which actions were good, and then increasing the probability of the good actions. Determining which actions were good is called the credit assignment problem (e.g., see [SB98]): when the agent receives a reward, we need to determine which preceding actions deserve credit for it and should be reinforced. The next line of work described in this thesis analyzes this credit assignment problem and shows how we can reduce the variance of policy gradient estimation through the use of value functions. By combining the proposed technique, which we call generalized advantage estimation, with TRPO, we are able to obtain state-of-the-art results on simulated 3D robotic tasks. 3D locomotion has long been considered a challenging problem for all methods; yet our method is able to automatically obtain stable walking controllers for a 3D humanoid and quadruped, as well as a policy that enables a 3D humanoid to stand up off the ground, all using the same algorithm and hyperparameters. This work was previously published as [Sch+15b].

When optimizing stochastic policies, the reinforcement learning problem turns into a problem of optimizing an expectation, defined on a stochastic process with many sampled random variables. Problems with similar structure occur outside of reinforcement learning as well; for example, in variational inference, and in models that use "hard decisions" for memory and attention. The last contribution of this thesis is the formalism of stochastic computation graphs, which aims to unify reinforcement learning and these other problems that involve optimizing expectations. Stochastic computation graphs allow one to automatically derive gradient estimators and variance-reduction schemes for a variety of different objectives that have been used in reinforcement learning and probabilistic modeling, reproducing the special-purpose estimators that were previously derived for these objectives. The formalism of stochastic computation graphs could assist researchers in developing intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions, and also in creating software that automatically computes these gradients given a model definition, as with automatic differentiation software. This work was previously published as [Sch+15a].
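As a brief illustration, the following equations do not appear on this page of the thesis; they are a sketch, in the standard notation of the policy gradient literature (policy pi_theta, advantage estimate A-hat, discount gamma, value function V, TD error delta), of the estimators the paragraphs above describe in words. The policy gradient approach reinforces each action in proportion to an estimate of its advantage,

    \nabla_\theta \, \mathbb{E}\Big[\sum_t r_t\Big] \approx \hat{\mathbb{E}}\Big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t\Big],

and generalized advantage estimation, as published in [Sch+15b], forms the advantage estimate from a learned value function V through exponentially weighted temporal-difference errors,

    \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),

where the parameter lambda trades bias against variance in the credit assignment. Likewise, the elementary building block that the stochastic computation graph formalism generalizes is the score-function (likelihood-ratio) estimator for the gradient of an expectation over a parameterized distribution,

    \nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\big[f(x)\big] = \mathbb{E}_{x \sim p_\theta}\big[f(x)\, \nabla_\theta \log p_\theta(x)\big],

which applies whenever the objective f depends on the parameters theta only through the sampling distribution.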
