OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

…data, to minimize prediction-error-plus-regularization on training data. The reduction from learning to optimization is less straightforward in reinforcement learning (RL) than it is in supervised learning. One difficulty is that we don't have full analytic access to the function we're trying to optimize, the agent's expected total reward, since this objective also depends on the unknown dynamics model and reward function. Another difficulty is that the agent's input data strongly depends on its behavior, which makes it hard to develop algorithms with monotonic improvement. Complicating the problem, there are several different functions that one might approximate, as we will discuss in Section 1.4.

1.3 deep reinforcement learning

Deep reinforcement learning is the study of reinforcement learning using neural networks as function approximators. The idea of combining reinforcement learning and neural networks is not new: Tesauro's TD-Gammon [Tes95], developed in the early 1990s, used a neural network value function and played at the level of top human players, and neural networks have long been used in system identification and control [NP90]. Lin's 1993 thesis [Lin93] explored the combination of various reinforcement learning algorithms with neural networks, with applications to robotics.

However, in the two decades following Tesauro's results, RL with nonlinear function approximation remained fairly obscure. At the time when this thesis work was beginning (2013), none of the existing RL textbooks (such as [SB98; Sze10]) devoted much attention to nonlinear function approximation. RL papers in leading machine learning conferences such as NIPS and ICML mostly focused on theoretical results and on toy problems where linear-in-features or tabular function approximators could be used.

In the early 2010s, the field of deep learning began to have groundbreaking empirical success, in speech recognition [Dah+12] and computer vision [KSH12]. The work described in this thesis began after the realization that similar breakthroughs were possible (and inevitable) in reinforcement learning, and would eventually dominate the special-purpose methods being used in domains like robotics. Whereas much work in reinforcement learning only applies in the case of linear or tabular function approximators, such methods are not applicable in settings where we need to learn functions that perform multi-step computation. Deep neural networks, on the other hand, can successfully approximate these functions, and their empirical success in supervised learning shows that it is tractable to optimize them.

An explosion of interest in deep reinforcement learning occurred following the re-
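To make the objective described in the opening paragraph concrete, the following display sketches the agent's expected total reward as an expectation over trajectories. The notation (η for the objective, ρ0 for the initial state distribution, P for the dynamics, r for the reward, T for the horizon) is a standard convention supplied here for illustration, not quoted from this page.

% Illustrative sketch in standard notation (assumed, not taken from this page):
% the objective is an expectation over trajectories, and the trajectory
% distribution depends on the unknown dynamics P and reward r.
\eta(\pi) = \mathbb{E}_{\tau \sim p_{\pi}}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right],
\qquad
p_{\pi}(\tau) = \rho_0(s_0) \prod_{t=0}^{T-1} P(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t).

Because P and r are unknown, η(π) and its gradient can only be estimated from sampled trajectories, which is the first difficulty noted above; and because p_π depends on the policy itself, the data distribution shifts as the agent learns, which is the second.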
