…computation graphs that we introduce. For these examples, the variance-reduced gradient estimators derived in prior work are special cases of the results in Sections 5.3 and 5.4.

The contributions of this chapter are as follows:

• We introduce a formalism of stochastic computation graphs, and in this general setting, we derive unbiased estimators for the gradient of the expected loss.

• We show how this estimator can be computed as the gradient of a certain differentiable function (which we call the surrogate loss); hence, it can be computed efficiently using the backpropagation algorithm. This observation enables a practitioner to write an efficient implementation using automatic differentiation software.

• We describe variance reduction techniques that can be applied to the setting of stochastic computation graphs, generalizing prior work from reinforcement learning and variational inference.

• We briefly describe how to generalize some other optimization techniques to this setting: majorization-minimization algorithms, by constructing an expression that bounds the loss function; and quasi-Newton / Hessian-free methods [Mar10], by computing estimates of Hessian-vector products.

The main practical result of this chapter is that to compute the gradient estimator, one only needs to make a simple modification to the backpropagation algorithm, in which extra gradient signals are introduced at the stochastic nodes. Equivalently, the resulting algorithm is the backpropagation algorithm applied to the surrogate loss function, which has extra terms introduced at the stochastic nodes. The modified backpropagation algorithm is presented in Section 5.5.

5.2 preliminaries

5.2.1 Gradient Estimators for a Single Random Variable

This section discusses computing the gradient of an expectation taken over a single random variable; the estimators described here will be the building blocks for more complex cases with multiple variables. Suppose that x is a random variable, f is a function (say, the cost), and we are interested in computing ∂/∂θ Ex[f(x)]. There are a few different ways that the process for generating x could be parameterized in terms of θ, which lead to different gradient estimators.

• We might be given a parameterized probability distribution x ∼ p(·; θ). In this case, the gradient can be written as an expectation over the same distribution, ∂/∂θ Ex[f(x)] = Ex[f(x) ∂/∂θ log p(x; θ)], which yields the score function estimator: sample x from p(·; θ) and average f(x) ∂/∂θ log p(x; θ) over the samples.
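To make the two preceding ideas concrete, the following is a minimal sketch, written in JAX and not taken from the thesis, of the score function estimator expressed as the gradient of a surrogate loss, so that an automatic differentiation system produces the estimator by backpropagation. The setup (x ∼ N(θ, 1), the example cost f, and the names surrogate_loss and estimate_gradient) is an illustrative assumption rather than the chapter's general construction.

import jax
import jax.numpy as jnp


def f(x):
    # Example cost; the score function estimator only needs evaluations of f
    # at the sampled points, not its derivatives.
    return (x - 3.0) ** 2


def gaussian_logpdf(x, theta):
    # log N(x; theta, 1), up to an additive constant that does not depend on theta.
    return -0.5 * (x - theta) ** 2


def surrogate_loss(theta, x):
    # Surrogate loss: the sampled costs f(x) multiply log p(x; theta).
    # Its gradient with respect to theta is the score function estimate
    #   mean( f(x) * d/dtheta log p(x; theta) ).
    return jnp.mean(f(x) * gaussian_logpdf(x, theta))


def estimate_gradient(theta, n_samples=100_000, seed=0):
    key = jax.random.PRNGKey(seed)
    x = theta + jax.random.normal(key, (n_samples,))  # samples x ~ N(theta, 1)
    return jax.grad(surrogate_loss)(theta, x)         # backprop through the surrogate


if __name__ == "__main__":
    theta = 1.0
    # Exact gradient of E[(x - 3)^2] with x ~ N(theta, 1) is 2 * (theta - 3) = -4.
    print("score function estimate:", estimate_gradient(theta))
    print("exact gradient:         ", 2.0 * (theta - 3.0))

Here f(x) enters the surrogate only as a fixed coefficient on log p(x; θ), so differentiating the surrogate with respect to θ reproduces Ex[f(x) ∂/∂θ log p(x; θ)] without requiring derivatives of f.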
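The variance reduction techniques listed among the contributions generalize devices such as baselines from the reinforcement learning and variational inference literature. As a hedged continuation of the sketch above (same toy Gaussian setup and assumed names, not the chapter's general results), subtracting a constant baseline b from f(x) leaves the score function estimator unbiased, because Ex[∂/∂θ log p(x; θ)] = 0, while it can substantially reduce the variance:

import jax
import jax.numpy as jnp


def f(x):
    # Same example cost as in the sketch above.
    return (x - 3.0) ** 2


def surrogate_with_baseline(theta, x, b):
    # Gradient w.r.t. theta is  mean( (f(x) - b) * d/dtheta log p(x; theta) ),
    # which has the same expectation as the plain score function estimator.
    logp = -0.5 * (x - theta) ** 2  # log N(x; theta, 1) up to a constant
    return jnp.mean((f(x) - b) * logp)


def per_sample_terms(theta, x, b):
    # Individual terms of the estimator, used to compare variances below.
    return (f(x) - b) * (x - theta)


if __name__ == "__main__":
    theta = 1.0
    key = jax.random.PRNGKey(0)
    x = theta + jax.random.normal(key, (100_000,))

    # b = 0 is the plain estimator; b = 5 is roughly E[f(x)] at theta = 1.
    for b in (0.0, 5.0):
        grad_estimate = jax.grad(surrogate_with_baseline)(theta, x, b)
        variance = jnp.var(per_sample_terms(theta, x, b))
        print(f"b = {b:4.1f}  gradient ~ {float(grad_estimate):+.3f}  "
              f"per-sample variance ~ {float(variance):.1f}")
    print("exact gradient =", 2.0 * (theta - 3.0))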