OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


There are several alternative ways to define the surrogate objective function that give the same gradient as L from Corollary 1. We could also write

\[
L(\Theta, \mathcal{S}) := \sum_{w \in \mathcal{S}} \frac{p(\hat{w} \mid \mathrm{deps}_w)}{\hat{P}_w}\, \hat{Q}_w + \sum_{c \in C} c(\mathrm{deps}_c),
\]

where \(\hat{P}_w\) is the probability \(p(\hat{w} \mid \mathrm{deps}_w)\) obtained during sampling, which is viewed as a constant (a code sketch of this construction is given at the end of this section).

The surrogate objective from Corollary 1 is actually an upper bound on the true objective in the case that (1) all costs \(c \in C\) are negative, and (2) the costs are not deterministically influenced by the parameters \(\Theta\). This construction allows majorization-minimization algorithms (similar to EM) to be applied to general stochastic computation graphs. See Section 5.9 for details.

[Figure 12: Deterministic computation graphs obtained as surrogate loss functions of the stochastic computation graphs from Figure 11. The surrogate losses shown are \(\log p(y \mid x)\hat{f}\), \(\log p(x; \theta)\hat{f}\) (three cases), and \(\log p(x_1 \mid x_0; \theta)(\hat{f}_1 + \hat{f}_2) + \log p(x_2 \mid x_1; \theta)\hat{f}_2\).]

5.3.3 Higher-Order Derivatives

The gradient estimator for a stochastic computation graph is itself a stochastic computation graph. Hence, it is possible to compute the gradient yet again (for each component of the gradient vector) and obtain an estimator of the Hessian. For most problems of interest, it is not efficient to compute this dense Hessian. On the other hand, one can instead differentiate a gradient-vector product to get a Hessian-vector product; this computation is usually not much more expensive than the gradient computation itself (see the sketch below). The Hessian-vector product can be used to implement a quasi-Newton algorithm via the conjugate gradient algorithm [WN99]. A variant of this technique, called Hessian-free optimization [Mar10], has been used to train large neural networks.

5.4 variance reduction

Consider estimating \(\frac{\partial}{\partial \theta}\, \mathbb{E}_{x \sim p(\cdot;\, \theta)}[f(x)]\). Clearly this expectation is unaffected by subtracting a constant \(b\) from the integrand, which gives \(\frac{\partial}{\partial \theta}\, \mathbb{E}_{x \sim p(\cdot;\, \theta)}[f(x) - b]\). Taking the score function estimator, we get

\[
\frac{\partial}{\partial \theta}\, \mathbb{E}_{x \sim p(\cdot;\, \theta)}[f(x)] = \mathbb{E}_{x \sim p(\cdot;\, \theta)}\!\left[ \frac{\partial}{\partial \theta} \log p(x; \theta)\, (f(x) - b) \right].
\]

Taking \(b = \mathbb{E}_x[f(x)]\) generally leads to substantial variance reduction; \(b\) is often called a baseline.
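As a concrete illustration of the alternative surrogate \(L(\Theta, \mathcal{S})\) above, the following is a minimal JAX sketch for a single Gaussian stochastic node with one downstream cost; the function names, the unit-variance Gaussian, and the quadratic cost are illustrative assumptions, not part of the text. The ratio \(p / \hat{P}_w\) is implemented by freezing the sampling-time probability with stop_gradient.

    import jax
    import jax.numpy as jnp

    def gaussian_logpdf(x, mean):
        # log N(x; mean, 1)
        return -0.5 * (x - mean) ** 2 - 0.5 * jnp.log(2.0 * jnp.pi)

    def surrogate(theta, x_hat, q_hat):
        # (p(x_hat; theta) / P_hat) * Q_hat, where P_hat is the probability
        # obtained during sampling, viewed as a constant via stop_gradient.
        p = jnp.exp(gaussian_logpdf(x_hat, theta))
        p_hat = jax.lax.stop_gradient(p)
        return (p / p_hat) * q_hat

    theta = 1.5
    x_hat = theta + jax.random.normal(jax.random.PRNGKey(0))  # x ~ p(.; theta)
    q_hat = (x_hat - 3.0) ** 2     # illustrative sum of downstream costs at x_hat
    g = jax.grad(surrogate)(theta, x_hat, q_hat)
    # At the sampled parameters p = P_hat, so d/dtheta (p / P_hat) reduces to
    # d/dtheta log p, and g matches the score-function surrogate's gradient:
    assert jnp.allclose(g, (x_hat - theta) * q_hat)

Because the ratio equals one at the parameters used for sampling, both surrogates yield the same gradient there; they differ only away from the sampling distribution, which is what the majorization-minimization interpretation exploits.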
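The Hessian-vector product trick from Section 5.3.3 is essentially one line in JAX: forward-mode differentiation through the reverse-mode gradient. This forward-over-reverse composition is a standard recipe; the test function f here is an illustrative assumption.

    import jax
    import jax.numpy as jnp

    def hvp(f, x, v):
        # Differentiate the gradient-vector product: this returns H(x) @ v at
        # roughly the cost of one extra gradient evaluation, without ever
        # forming the dense Hessian.
        return jax.jvp(jax.grad(f), (x,), (v,))[1]

    def f(x):
        return jnp.sum(x ** 4) + jnp.dot(x, x)   # Hessian = diag(12 x^2) + 2 I

    x = jnp.array([1.0, 2.0, 3.0])
    v = jnp.array([1.0, 0.0, 0.0])
    print(hvp(f, x, v))                # [14., 0., 0.]
    print(jax.hessian(f)(x) @ v)       # dense Hessian, for comparison only

This hvp is exactly the primitive needed by the conjugate gradient inner loop of the quasi-Newton scheme mentioned above [WN99].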
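To see the effect of the baseline \(b\) from Section 5.4 numerically, here is a hedged sketch with \(p(\cdot;\, \theta) = \mathcal{N}(\theta, 1)\) and an illustrative cost f; estimating b from an independent batch keeps the gradient estimator unbiased.

    import jax
    import jax.numpy as jnp

    def f(x):
        return (x - 2.0) ** 2   # illustrative cost; true gradient is 2*(theta - 2)

    def score_grad(key, theta, b, n=100_000):
        # Score-function estimate of d/dtheta E_{x ~ N(theta,1)}[f(x)],
        # with baseline b subtracted from the integrand.
        x = theta + jax.random.normal(key, (n,))
        score = x - theta                   # d/dtheta log N(x; theta, 1)
        samples = score * (f(x) - b)
        return samples.mean(), samples.std() / jnp.sqrt(n)  # estimate, std. error

    theta = 0.0
    # b ~ E[f(x)], estimated from an independent batch to avoid the bias
    # that comes from reusing the same samples for b and the gradient.
    b = f(theta + jax.random.normal(jax.random.PRNGKey(1), (100_000,))).mean()
    print(score_grad(jax.random.PRNGKey(0), theta, 0.0))  # no baseline
    print(score_grad(jax.random.PRNGKey(0), theta, b))    # with baseline

Both calls estimate the same gradient (here \(2(\theta - 2) = -4\)), but the baselined version should report a markedly smaller standard error.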
