
Figure 13: Stochastic computation graphs for NVIL (left) and VAE (right) models.

MDPs. In the MDP case, the expectation is taken with respect to the distribution over state ($s$) and action ($a$) sequences,

$$L(\theta) = \mathbb{E}_{\tau \sim p_\theta}\left[\sum_{t=1}^{T} r(s_t, a_t)\right],$$

where $\tau = (s_1, a_1, s_2, a_2, \dots)$ are trajectories and the distribution over trajectories is defined in terms of the environment dynamics $p_E(s_{t+1} \mid s_t, a_t)$ and the policy $\pi_\theta$:

$$p_\theta(\tau) = p_E(s_1) \prod_{t} \pi_\theta(a_t \mid s_t)\, p_E(s_{t+1} \mid s_t, a_t).$$

Here $r$ denotes rewards (negative costs in the terminology of the rest of the paper). The classic REINFORCE [Wil92] estimate of the gradient is given by

$$\frac{\partial}{\partial \theta} L = \mathbb{E}_{\tau \sim p_\theta}\left[\sum_{t=1}^{T} \frac{\partial}{\partial \theta} \log \pi_\theta(a_t \mid s_t) \left(\sum_{t'=t}^{T} r(s_{t'}, a_{t'}) - b_t(s_t)\right)\right], \qquad (44)$$

where $b_t(s_t)$ is an arbitrary baseline, which is often chosen to approximate $V_t(s_t) = \mathbb{E}_{\tau \sim p_\theta}\left[\sum_{t'=t}^{T} r(s_{t'}, a_{t'})\right]$, i.e. the state-value function. Note that the stochastic action nodes $a_t$ "block" the differentiable path from $\theta$ to the rewards, which eliminates the need to differentiate through the unknown environment dynamics. The stochastic computation graph is shown in Figure 14.
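To make Eq. (44) concrete, the following is a minimal single-trajectory sketch of the REINFORCE estimate, assuming a tabular softmax policy. The array layout, the `baseline` argument, and the function name `reinforce_gradient` are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector of logits.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta, states, actions, rewards, baseline):
    """Single-sample estimate of dL/dtheta from Eq. (44).

    theta    : (num_states, num_actions) logits of a tabular softmax policy.
    states, actions, rewards : length-T arrays from one sampled trajectory.
    baseline : length-T array approximating the state-value function V_t(s_t).
    """
    T = len(rewards)
    grad = np.zeros_like(theta)
    # Reward-to-go: sum_{t'=t}^{T} r(s_{t'}, a_{t'}).
    reward_to_go = np.cumsum(np.asarray(rewards)[::-1])[::-1]
    for t in range(T):
        s, a = states[t], actions[t]
        probs = softmax(theta[s])
        # d/dtheta log pi_theta(a_t | s_t) for a softmax policy:
        # one-hot(a_t) - pi(. | s_t), nonzero only in row s_t.
        grad_logp = -probs
        grad_logp[a] += 1.0
        grad[s] += grad_logp * (reward_to_go[t] - baseline[t])
    return grad
```

Averaging this quantity over many sampled trajectories gives an unbiased estimate of the policy gradient; note that the environment dynamics never need to be differentiated, only the log-probabilities of the sampled actions.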