OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS



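5.10 examples

observed, $z$ is a latent variable, and $\theta$ is a parameter of the distribution. The generalized EM algorithm maximizes the variational lower bound, which is defined by an expectation over $z$ for each sample $x$:

$$L(\theta, q) = \mathbb{E}_{z \sim q}\left[ \log \frac{p(x, z; \theta)}{q(z)} \right].$$

Because parameters appear both in the probability density and inside the expectation, stochastic computation graphs provide a convenient route for deriving the gradient estimators.

Neural variational inference. Mnih and Gregor [MG14] propose a generalized EM algorithm for multi-layered latent variable models that employs an inference network, an explicit parameterization of the posterior $q_\phi(z \mid x) \approx p(z \mid x)$, to allow for fast approximate inference. The generative model and inference network take the form

$$p_\theta(x) = \sum_{h_1, h_2, h_3} p_{\theta_1}(x \mid h_1)\, p_{\theta_2}(h_1 \mid h_2)\, p_{\theta_3}(h_2 \mid h_3)\, p_{\theta_3}(h_3)$$

$$q_\phi(h_1, h_2, h_3 \mid x) = q_{\phi_1}(h_1 \mid x)\, q_{\phi_2}(h_2 \mid h_1)\, q_{\phi_3}(h_3 \mid h_2).$$

The inference model $q_\phi$ is used for sampling, i.e., we sample $h_1 \sim q_{\phi_1}(\cdot \mid x)$, $h_2 \sim q_{\phi_2}(\cdot \mid h_1)$, $h_3 \sim q_{\phi_3}(\cdot \mid h_2)$. The stochastic computation graph is shown above. The lower bound decomposes into three terms:

$$L(\theta, \phi) = \mathbb{E}_{h \sim q_\phi}\Bigg[ \underbrace{\log \frac{p_{\theta_1}(x \mid h_1)}{q_{\phi_1}(h_1 \mid x)}}_{= r_1} + \underbrace{\log \frac{p_{\theta_2}(h_1 \mid h_2)}{q_{\phi_2}(h_2 \mid h_1)}}_{= r_2} + \underbrace{\log \frac{p_{\theta_3}(h_2 \mid h_3)\, p_{\theta_3}(h_3)}{q_{\phi_3}(h_3 \mid h_2)}}_{= r_3} \Bigg].$$

Given a sample $h \sim q_\phi$, an unbiased estimate of the gradient is given by Theorem 2 as

$$\frac{\partial L}{\partial \theta} \approx \frac{\partial}{\partial \theta} \log p_{\theta_1}(x \mid h_1) + \frac{\partial}{\partial \theta} \log p_{\theta_2}(h_1 \mid h_2) + \frac{\partial}{\partial \theta} \log \big[ p_{\theta_3}(h_2 \mid h_3)\, p_{\theta_3}(h_3) \big] \tag{42}$$

$$\frac{\partial L}{\partial \phi} \approx \frac{\partial}{\partial \phi} \log q_{\phi_1}(h_1 \mid x)\, \big(\hat{Q}_1 - b_1(x)\big) + \frac{\partial}{\partial \phi} \log q_{\phi_2}(h_2 \mid h_1)\, \big(\hat{Q}_2 - b_2(h_1)\big) + \frac{\partial}{\partial \phi} \log q_{\phi_3}(h_3 \mid h_2)\, \big(\hat{Q}_3 - b_3(h_2)\big) \tag{43}$$

where $\hat{Q}_1 = r_1 + r_2 + r_3$; $\hat{Q}_2 = r_2 + r_3$; and $\hat{Q}_3 = r_3$, and $b_1, b_2, b_3$ are baseline functions. Each $\hat{Q}_i$ sums only the terms that depend on $h_i$: $r_1$ does not depend on $h_2$ or $h_3$, and $r_2$ does not depend on $h_3$. The stochastic computation graph is shown in Figure 13.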
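To make the score-function estimator for the inference parameters concrete, the following is a minimal numerical sketch, not the implementation from [MG14]. It assumes a toy model in which $x$, $h_1$, $h_2$, $h_3$ are each a single binary variable and every conditional is a Bernoulli with a logistic link. The parameter values in `THETA` and `phi`, the helper names, and the constant (zero by default) baselines are all hypothetical choices made for illustration. Because the toy model has only eight latent configurations, the lower bound can be computed exactly, which lets the Monte Carlo estimate be compared against a finite-difference gradient.

```python
import itertools
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bern_logp(value, logit):
    """Log-probability of a binary value under Bernoulli(sigmoid(logit))."""
    p = sigmoid(logit)
    return value * np.log(p) + (1 - value) * np.log(1 - p)

# Hypothetical generative parameters theta: (bias, weight) for each
# conditional, plus the logit of the prior over h3.
THETA = {"x|h1": (-0.5, 1.5), "h1|h2": (0.3, -1.0), "h2|h3": (-0.2, 0.8), "h3": 0.4}

def rewards(x, h1, h2, h3, phi):
    """Return the three terms (r1, r2, r3) of the bound and the logits of q."""
    a1, w1, a2, w2, a3, w3 = phi
    q_logits = (a1 + w1 * x, a2 + w2 * h1, a3 + w3 * h2)
    c, v = THETA["x|h1"]
    r1 = bern_logp(x, c + v * h1) - bern_logp(h1, q_logits[0])
    c, v = THETA["h1|h2"]
    r2 = bern_logp(h1, c + v * h2) - bern_logp(h2, q_logits[1])
    c, v = THETA["h2|h3"]
    r3 = (bern_logp(h2, c + v * h3) + bern_logp(h3, THETA["h3"])
          - bern_logp(h3, q_logits[2]))
    return (r1, r2, r3), q_logits

def exact_bound(x, phi):
    """Lower bound L computed by enumerating all 8 latent configurations."""
    total = 0.0
    for h1, h2, h3 in itertools.product([0, 1], repeat=3):
        (r1, r2, r3), ql = rewards(x, h1, h2, h3, phi)
        log_q = bern_logp(h1, ql[0]) + bern_logp(h2, ql[1]) + bern_logp(h3, ql[2])
        total += np.exp(log_q) * (r1 + r2 + r3)
    return total

def finite_difference_grad(x, phi, eps=1e-5):
    """Central finite differences of the exact bound, used as a reference."""
    phi = np.asarray(phi, dtype=float)
    grad = np.zeros_like(phi)
    for i in range(len(phi)):
        step = np.zeros_like(phi)
        step[i] = eps
        grad[i] = (exact_bound(x, phi + step) - exact_bound(x, phi - step)) / (2 * eps)
    return grad

def score_function_grad(x, phi, n_samples, rng, baselines=(0.0, 0.0, 0.0)):
    """Monte Carlo average of the estimator in Eq. (43) with constant baselines."""
    a1, w1, a2, w2, a3, w3 = phi
    # Ancestral sampling from the inference network q.
    h1 = (rng.random(n_samples) < sigmoid(a1 + w1 * x)).astype(float)
    h2 = (rng.random(n_samples) < sigmoid(a2 + w2 * h1)).astype(float)
    h3 = (rng.random(n_samples) < sigmoid(a3 + w3 * h2)).astype(float)
    (r1, r2, r3), ql = rewards(x, h1, h2, h3, phi)
    q_hat = (r1 + r2 + r3, r2 + r3, r3)  # terms downstream of h1, h2, h3
    grad = []
    for h, parent, logit, q, b in zip((h1, h2, h3), (x, h1, h2), ql, q_hat, baselines):
        dlogq_dlogit = h - sigmoid(logit)  # derivative of log Bernoulli w.r.t. its logit
        grad.append(np.mean(dlogq_dlogit * (q - b)))           # bias a_i
        grad.append(np.mean(dlogq_dlogit * parent * (q - b)))  # weight w_i
    return np.array(grad)

phi = np.array([0.2, -0.7, 0.1, 0.9, -0.3, 0.5])  # hypothetical (a1, w1, a2, w2, a3, w3)
rng = np.random.default_rng(0)
estimate = score_function_grad(1.0, phi, 200_000, rng)
reference = finite_difference_grad(1.0, phi)
print("score-function estimate:", np.round(estimate, 3))
print("finite-difference grad: ", np.round(reference, 3))
```

The two printed vectors should agree up to Monte Carlo noise. The estimator ignores the direct dependence of each $r_i$ on $\phi$ through $-\log q_{\phi_i}$, which is valid because the expected score $\mathbb{E}[\partial \log q / \partial \phi]$ is zero. Non-zero baselines leave the expectation unchanged and would typically be used to reduce variance.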

