6.1 frontiers

exploration work in the deep RL setting includes methods based on Thompson sampling [Osb+16] and exploration bonuses [Hou+16].

4. Using learned models: model-based reinforcement learning methods seek to speed up learning by fitting a dynamics model and using it for planning or for accelerating the learning of a policy. It is known that in certain low-dimensional continuous control problems, good controllers can be learned from an extremely small number of samples (e.g., [DR11; Mol+15]); however, this success has not yet been extended to problems with high-dimensional state spaces. More generally, many have found that, when they work, model-based methods learn faster (in fewer samples) than model-free methods such as policy gradients and Q-learning; however, no model-based method has yet emerged that performs as well as model-free methods on challenging high-dimensional tasks, such as the Atari and MuJoCo tasks considered in this thesis. Guided policy search, which uses a model for trajectory optimization [Lev+16], has been used to learn some behaviors efficiently on a physical robot, but these methods, too, have yet to be extended to problems that require controlling a high-dimensional state. (A minimal sketch of the fit-then-plan loop appears after this list.)

5. Finer-grained credit assignment: the policy gradient estimator performs credit assignment in a crude way, since it credits an action with all rewards that follow the action (this is made precise in the equation after this list). However, it is often possible to do better credit assignment using some knowledge of the system. For example, when one serves a tennis ball, the result does not depend on any action taken after the racket strikes the ball; yet that sort of inference is not performed by any of our reinforcement learning algorithms. It should be possible to do better credit assignment with the help of a model of the system. Heess et al. [Hee+15] tried model-based credit assignment and obtained a negative result; however, other instantiations of the idea might be more successful. Another variance reduction technique was proposed in [LCR02], but it provides only a moderate amount of variance reduction. If there were a generic method for approximating the unknown or non-differentiable components of a stochastic computation graph (e.g., the dynamics model in reinforcement learning) and using these approximations to obtain better gradient estimates, it could provide significant benefits in reinforcement learning and in probabilistic modeling problems that involve “hard” decisions.
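To make the discussion in item 4 concrete, the following is a minimal sketch of the fit-a-model-then-plan loop: a dynamics model is fit to observed transitions by ridge regression, and actions are chosen by random-shooting model-predictive control under that model. The names and interfaces here (fit_dynamics, plan_action, reward_fn, the action bounds, and all hyperparameters) are illustrative assumptions, not components of the algorithms developed in this thesis.

import numpy as np

def fit_dynamics(states, actions, next_states, reg=1e-3):
    # Fit a linear model s' ~ [s; a; 1] W by ridge regression
    # (a stand-in for whatever function approximator one would actually use).
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ next_states)

def predict(W, s, a):
    # One-step prediction of the next state under the learned model.
    return np.concatenate([s, a, [1.0]]) @ W

def plan_action(W, s, reward_fn, dim_a, horizon=10, n_candidates=500, seed=0):
    # Random-shooting MPC: sample candidate action sequences, roll each one
    # out in the learned model, and return the first action of the best one.
    rng = np.random.default_rng(seed)
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, dim_a))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(seqs):
        s_sim = np.array(s, dtype=float)
        for a in seq:
            returns[i] += reward_fn(s_sim, a)
            s_sim = predict(W, s_sim, a)
    return seqs[np.argmax(returns), 0]

In a full loop, the agent would alternate between collecting transitions with plan_action, refitting the model, and replanning; the sample-efficiency claims above correspond to the planner being able to query the cheap learned model rather than the real system.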
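To state precisely what “credits an action with all rewards that follow” means in item 5: ignoring baselines and discounting, the likelihood-ratio policy gradient estimator for a length-T episode has the form

\hat{g} = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \left( \sum_{t'=t}^{T-1} r_{t'} \right),

so each action a_t is weighted by every reward r_{t'} with t' \ge t, including rewards whose outcome the action could not have influenced (in the tennis example, the outcome of the serve is fixed once the racket strikes the ball); exploiting such structure is what finer-grained credit assignment would provide.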