
OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS



Equation (40) requires that the integrand is differentiable, which is satisfied if all of the PDFs and $c(\mathrm{deps}_c)$ are differentiable. Equation (39) follows by summing over all costs $c \in C$. Equation (38) follows from rearrangement of the terms in this equation.

Theorem 2. It suffices to show that for a particular node $v \in S$, the following expectation (taken over all variables) vanishes:

\[
\mathbb{E}\left[ \left( \frac{\partial}{\partial\theta} \log p(v \mid \mathrm{parents}_v) \right) b(\mathrm{NonInfluenced}(v)) \right].
\]

Analogously to $\mathrm{NonInfluenced}(v)$, define $\mathrm{Influenced}(v) := \{ w \mid w \succ v \}$. Note that the nodes can be ordered so that $\mathrm{NonInfluenced}(v)$ all come before $v$ in the ordering. Thus, we can write

\[
\begin{aligned}
&\mathbb{E}_{\mathrm{NonInfluenced}(v)}\left[ \mathbb{E}_{\mathrm{Influenced}(v)}\left[ \left( \frac{\partial}{\partial\theta} \log p(v \mid \mathrm{parents}_v) \right) b(\mathrm{NonInfluenced}(v)) \right] \right] \\
&\quad = \mathbb{E}_{\mathrm{NonInfluenced}(v)}\left[ \mathbb{E}_{\mathrm{Influenced}(v)}\left[ \frac{\partial}{\partial\theta} \log p(v \mid \mathrm{parents}_v) \right] b(\mathrm{NonInfluenced}(v)) \right] \\
&\quad = \mathbb{E}_{\mathrm{NonInfluenced}(v)}\left[ 0 \cdot b(\mathrm{NonInfluenced}(v)) \right] \\
&\quad = 0,
\end{aligned}
\]

where we used

\[
\mathbb{E}_{\mathrm{Influenced}(v)}\left[ \frac{\partial}{\partial\theta} \log p(v \mid \mathrm{parents}_v) \right] = \mathbb{E}_{v}\left[ \frac{\partial}{\partial\theta} \log p(v \mid \mathrm{parents}_v) \right] = 0.
\]

5.9 surrogate as an upper bound, and mm algorithms

$L$ has additional significance besides allowing us to estimate the gradient of the expected sum of costs. Under certain conditions, $L$ is an upper bound on the true objective (plus a constant).

We shall make two restrictions on the stochastic computation graph: (1) all costs $c \in C$ are negative, and (2) the costs are not deterministically influenced by the parameters $\Theta$.

First, let us use importance sampling to write down the expectation of a given cost node when the sampling distribution is different from the distribution we are evaluating: for parameter $\theta \in \Theta$, $\theta = \theta_{\mathrm{old}}$ is used for sampling, but we are evaluating at
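Two facts used on this page can be checked numerically: the baseline term in the proof of Theorem 2 has zero expectation, and an expectation under one parameter value can be estimated from samples drawn under another by importance sampling. The following is a minimal sketch under assumptions that are not in the source: a toy graph with a single Gaussian stochastic node, a hypothetical baseline b(u) = 3 + u^2, a hypothetical negative cost c(v) = -(1 + v^2), and illustrative function names.

```python
import math
import random


def baseline_term_mean(theta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(d/dtheta log p(v | u)) * b(u)].

    Toy graph (an assumption for illustration):
      u ~ N(0, 1)          plays the role of NonInfluenced(v)
      v ~ N(theta + u, 1)  stochastic node with parent u
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(theta + u, 1.0)
        score = v - (theta + u)   # d/dtheta log N(v; theta + u, 1)
        b = 3.0 + u ** 2          # hypothetical baseline, depends on u only
        total += score * b
    return total / n_samples


def cost(v):
    """Hypothetical cost, strictly negative as in restriction (1)."""
    return -(1.0 + v ** 2)


def importance_sampled_cost(theta, theta_old, n_samples=200_000, seed=1):
    """Estimate E_{v ~ N(theta, 1)}[cost(v)] from samples of N(theta_old, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.gauss(theta_old, 1.0)
        # log of the likelihood ratio p(v; theta) / p(v; theta_old)
        log_ratio = -0.5 * (v - theta) ** 2 + 0.5 * (v - theta_old) ** 2
        total += math.exp(log_ratio) * cost(v)
    return total / n_samples


if __name__ == "__main__":
    print("baseline term mean:", baseline_term_mean(0.3))
    # Closed form for this toy cost: E[-(1 + v^2)] = -(2 + theta^2)
    print("importance-sampled cost:", importance_sampled_cost(0.5, 0.0))
    print("closed form:", -(2.0 + 0.5 ** 2))
```

The first estimate should be close to zero for any baseline that depends only on u, which is the content of the inner-expectation step in the proof. The second should be close to the closed-form value for this toy cost, up to Monte Carlo error that grows as theta moves away from theta_old.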

Original File Name Searched:

thesis-optimizing-deep-learning.pdf
