6.1 frontiers

exploration work in the deep RL setting includes methods based on Thompson sampling [Osb+16] and exploration bonuses [Hou+16].

4. Using learned models: model-based reinforcement learning methods seek to speed up learning by fitting a dynamics model and using it for planning or for accelerating the learning of a policy. It is known that in certain low-dimensional continuous control problems, good controllers can be learned from an extremely small number of samples (e.g., [DR11; Mol+15]); however, this success has not yet been extended to problems with high-dimensional state spaces. More generally, many have found that, when they work, model-based methods learn faster (in fewer samples) than model-free methods such as policy gradients and Q-learning; however, no model-based method has yet emerged that performs as well as model-free methods on challenging high-dimensional tasks, such as the Atari and MuJoCo tasks considered in this thesis. Guided policy search, which uses a model for trajectory optimization [Lev+16], has been used to learn some behaviors efficiently on a physical robot, but these methods, too, have yet to be extended to problems that require controlling a high-dimensional state. (A minimal sketch of the fit-then-plan loop appears after this list.)

5. Finer-grained credit assignment: the policy gradient estimator performs credit assignment in a crude way, since it credits an action with all rewards that follow the action (this is made precise in the equation after this list). However, it is often possible to do better credit assignment using some knowledge of the system. For example, when one serves a tennis ball, the result does not depend on any action taken after the racket strikes the ball; yet that sort of inference is not performed by any of our reinforcement learning algorithms. It should be possible to do better credit assignment with the help of a model of the system. Heess et al. [Hee+15] tried model-based credit assignment and obtained a negative result; however, other instantiations of the idea might be more successful. Another variance reduction technique was proposed in [LCR02], but it provides only a moderate amount of variance reduction. If there were a generic method for approximating the unknown or non-differentiable components of a stochastic computation graph (e.g., the dynamics model in reinforcement learning) and using these approximations to obtain better gradient estimates, it could provide significant benefits in reinforcement learning and in probabilistic modeling problems that involve “hard” decisions.
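To make the discussion in item 4 concrete, the following is a minimal sketch of the fit-a-model-then-plan loop: a dynamics model is fit to observed transitions by ridge regression, and actions are chosen by random-shooting model-predictive control under that model. The names and interfaces here (fit_dynamics, plan_action, reward_fn, the action bounds, and all hyperparameters) are illustrative assumptions, not components of the algorithms developed in this thesis.

import numpy as np

def fit_dynamics(states, actions, next_states, reg=1e-3):
    # Fit a linear model s' ~ [s; a; 1] W by ridge regression
    # (a stand-in for whatever function approximator one would actually use).
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ next_states)

def predict(W, s, a):
    # One-step prediction of the next state under the learned model.
    return np.concatenate([s, a, [1.0]]) @ W

def plan_action(W, s, reward_fn, dim_a, horizon=10, n_candidates=500, seed=0):
    # Random-shooting MPC: sample candidate action sequences, roll each one
    # out in the learned model, and return the first action of the best one.
    rng = np.random.default_rng(seed)
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, dim_a))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(seqs):
        s_sim = np.array(s, dtype=float)
        for a in seq:
            returns[i] += reward_fn(s_sim, a)
            s_sim = predict(W, s_sim, a)
    return seqs[np.argmax(returns), 0]

In a full loop, the agent would alternate between collecting transitions with plan_action, refitting the model, and replanning; the sample-efficiency claims above correspond to the planner being able to query the cheap learned model rather than the real system.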
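To state precisely what “credits an action with all rewards that follow” means in item 5: ignoring baselines and discounting, the likelihood-ratio policy gradient estimator for a length-T episode has the form

\hat{g} = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \left( \sum_{t'=t}^{T-1} r_{t'} \right),

so each action a_t is weighted by every reward r_{t'} with t' \ge t, including rewards whose outcome the action could not have influenced (in the tennis example, the outcome of the serve is fixed once the racket strikes the ball); exploiting such structure is what finer-grained credit assignment would provide.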