OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

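computation graph, we can automatically obtain a gradient estimator, given that the graph satisfies the appropriate conditions on differentiability of the functions at its nodes. The gradient can be computed efficiently in a backwards traversal through the graph: one approach is to apply the standard backpropagation algorithm to one of the surrogate loss functions from Section 5.3; another approach (which is roughly equivalent) is to apply a modified backpropagation procedure shown in Algorithm 4. The results we have presented are sufficiently general to automatically reproduce a variety of gradient estimators that have been derived in prior work in reinforcement learning and probabilistic modeling, as we show in Section 5.10. We hope that this work will facilitate further development of interesting and expressive models.

5.8 proofs

Theorem 1

We will consider the case that all of the random variables are continuous-valued, thus the expectations can be written as integrals. For discrete random variables, the integrals should be changed to sums.

Recall that we seek to compute $\frac{\partial}{\partial\theta} \mathbb{E}\left[\sum_{c \in C} c\right]$. We will differentiate the expectation of a single cost term; summing over these terms yields Equation (39).

\[
\mathbb{E}_{v \in S,\, v \prec c}[c] = \int \prod_{v \in S,\, v \prec c} p(v \mid \mathrm{deps}_v)\, dv \; c(\mathrm{deps}_c)
\]

\begin{equation}
\begin{aligned}
\frac{\partial}{\partial\theta} \mathbb{E}_{v \in S,\, v \prec c}[c]
&= \frac{\partial}{\partial\theta} \int \prod_{v \in S,\, v \prec c} p(v \mid \mathrm{deps}_v)\, dv \; c(\mathrm{deps}_c) \\
&= \int \prod_{v \in S,\, v \prec c} p(v \mid \mathrm{deps}_v)\, dv \left[ \sum_{w \in S,\, w \prec c} \frac{\frac{\partial}{\partial\theta} p(w \mid \mathrm{deps}_w)}{p(w \mid \mathrm{deps}_w)}\, c(\mathrm{deps}_c) + \frac{\partial}{\partial\theta} c(\mathrm{deps}_c) \right] \\
&= \int \prod_{v \in S,\, v \prec c} p(v \mid \mathrm{deps}_v)\, dv \left[ \sum_{w \in S,\, w \prec c} \left( \frac{\partial}{\partial\theta} \log p(w \mid \mathrm{deps}_w) \right) c(\mathrm{deps}_c) + \frac{\partial}{\partial\theta} c(\mathrm{deps}_c) \right] \\
&= \mathbb{E}_{v \in S,\, v \prec c} \left[ \sum_{w \in S,\, w \prec c} \frac{\partial}{\partial\theta} \log p(w \mid \mathrm{deps}_w)\, \hat{c} + \frac{\partial}{\partial\theta} c(\mathrm{deps}_c) \right].
\end{aligned}
\tag{40}
\end{equation}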
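The final expression in Equation (40) can be checked numerically on a small example. The sketch below is an illustration under assumptions that are not taken from the text: a graph with a single stochastic node x ~ N(θ, 1) and a cost c(x, θ) = (x − 2)² + θ², chosen so that both the score-function term and the direct-derivative term are nonzero. The function names `score_function_gradient` and `exact_gradient` are hypothetical. For this toy cost, E[c] = (θ − 2)² + 1 + θ², so the exact gradient is 4θ − 4.

```python
import random


def score_function_gradient(theta, num_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[c(x, theta)].

    Toy setup (an assumption for illustration): one stochastic node x and
    cost c(x, theta) = (x - 2)^2 + theta^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(theta, 1.0)           # sample the stochastic node
        cost = (x - 2.0) ** 2 + theta ** 2  # sampled cost, held constant (c-hat)
        score = x - theta                   # d/dtheta log N(x; theta, 1)
        direct = 2.0 * theta                # d/dtheta c(x, theta) with x fixed
        total += score * cost + direct      # summand inside the expectation
    return total / num_samples


def exact_gradient(theta):
    # E[c] = (theta - 2)^2 + 1 + theta^2, whose derivative is 4*theta - 4.
    return 4.0 * theta - 4.0


if __name__ == "__main__":
    theta = 0.5
    estimate = score_function_gradient(theta)
    print(f"estimate={estimate:.3f}  exact={exact_gradient(theta):.3f}")
```

Each summand is the θ-derivative of log p(x | θ)·ĉ + c(x, θ) with the sampled cost ĉ held constant, which is why differentiating a surrogate of that form by ordinary backpropagation would give the same estimate. The estimate is noisy; with the sample count used here it should agree with the exact value to within roughly a few hundredths.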

Original File Name Searched:

thesis-optimizing-deep-learning.pdf
