OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


POMDPs. POMDPs differ from MDPs in that the state $s_t$ of the environment is not observed directly but, as in latent-variable time series models, only through stochastic observations $o_t$, which depend on the latent states $s_t$ via $p_E(o_t \mid s_t)$. The policy therefore has to be a function of the history of past observations, $\pi_\theta(a_t \mid o_1 \ldots o_t)$. Applying Theorem 2, we obtain a gradient estimator:

$$\frac{\partial}{\partial\theta} L = \mathbb{E}_{\tau \sim p_\theta}\left[ \sum_{t=1}^{T} \frac{\partial}{\partial\theta} \log \pi_\theta(a_t \mid o_1 \ldots o_t) \left( \sum_{t'=t}^{T} r(s_{t'}, a_{t'}) - b_t(o_1 \ldots o_t) \right) \right]. \tag{45}$$

Here, the baseline $b_t$ and the policy $\pi_\theta$ can depend on the observation history through time $t$, and these functions can be parameterized as recurrent neural networks [Wie+10; Mni+14]. The stochastic computation graph is shown in Figure 14.

[Figure 14: Stochastic Computation Graphs for MDPs (left) and POMDPs (right). Node labels: θ, s_t, a_t, r_t in the MDP graph; θ, s_t, o_t, m_t, a_t, r_t in the POMDP graph, for t = 1, ..., T.]
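As a rough illustration of estimator (45), the sketch below computes the single-trajectory term inside the expectation for a toy history-dependent policy. Everything in it is an assumption made for illustration and not taken from the thesis: the function names, the fixed (non-learned) memory $m_t = \text{decay} \cdot m_{t-1} + o_t$ standing in for a recurrent network, the softmax policy with logits $\theta_k m_t$, and the baseline values, which are passed in as precomputed numbers rather than produced by a learned function of the observation history.

```python
import math

def rewards_to_go(rewards):
    """Return the list of sums over t' >= t of r_{t'}, one entry per time step t."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]

def memory_states(observations, decay=0.5):
    """Hypothetical fixed memory summarizing the history o_1..o_t:
    m_t = decay * m_{t-1} + o_t, with m_0 = 0. A real recurrent policy
    would learn this update; here it is fixed to keep the gradient analytic."""
    m, out = 0.0, []
    for o in observations:
        m = decay * m + o
        out.append(m)
    return out

def action_probs(theta, m):
    """Softmax policy over actions with logits theta[k] * m (assumed form)."""
    logits = [th * m for th in theta]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(theta, m, action):
    """Analytic gradient of log pi_theta(action | m) with respect to theta."""
    probs = action_probs(theta, m)
    return [((1.0 if k == action else 0.0) - probs[k]) * m
            for k in range(len(theta))]

def pomdp_gradient_estimate(theta, observations, actions, rewards, baselines):
    """Single-trajectory estimate: sum over t of
    grad log pi(a_t | o_1..o_t) * (reward-to-go_t - b_t)."""
    grad = [0.0] * len(theta)
    mems = memory_states(observations)
    rtg = rewards_to_go(rewards)
    for m, a, g, b in zip(mems, actions, rtg, baselines):
        score = grad_log_prob(theta, m, a)
        for k in range(len(theta)):
            grad[k] += score[k] * (g - b)
    return grad

# Made-up trajectory with T = 3 steps and two actions.
theta = [0.1, -0.2]
observations = [1.0, 0.5, -1.0]
actions = [0, 1, 1]
rewards = [1.0, 0.0, 2.0]
baselines = [1.5, 1.0, 1.0]  # stand-ins for b_t(o_1..o_t)
grad = pomdp_gradient_estimate(theta, observations, actions, rewards, baselines)
```

Averaging such estimates over many sampled trajectories would approximate the expectation in (45). With a learned recurrent policy, the analytic `grad_log_prob` would typically be replaced by automatic differentiation through the memory update.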
