OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS
5.10 Examples

5.10.1 Variational Autoencoder, Deep Latent Gaussian Models, and Reparameterization

Here we note that in some cases, the stochastic computation graph can be transformed to give the same probability distribution for the observed variables, while yielding a different gradient estimator. Kingma and Welling [KW13] and Rezende et al. [RMW14] consider a model similar to the one proposed by Mnih and Gregor [MG14], but with continuous latent variables, and they re-parameterize their inference network to enable the use of the PD estimator. The original objective, the variational lower bound, is

$$\mathcal{L}_{\mathrm{orig}}(\theta, \phi) = \mathbb{E}_{h \sim q_\phi}\!\left[\log \frac{p_\theta(x \mid h)\, p_\theta(h)}{q_\phi(h \mid x)}\right].$$

The second term, the entropy of $q_\phi$, can be computed analytically for the parametric forms of $q$ considered in the paper (Gaussians). For $q_\phi$ being conditionally Gaussian, i.e. $q_\phi(h \mid x) = \mathcal{N}(h \mid \mu_\phi(x), \sigma_\phi(x))$, re-parameterizing leads to $h = h_\phi(\epsilon; x) = \mu_\phi(x) + \epsilon \sigma_\phi(x)$, giving

$$\mathcal{L}_{\mathrm{re}}(\theta, \phi) = \mathbb{E}_{\epsilon \sim \rho}\!\left[\log p_\theta(x \mid h_\phi(\epsilon, x)) + \log p_\theta(h_\phi(\epsilon, x))\right] + H[q_\phi(\cdot \mid x)].$$

The stochastic computation graph before and after reparameterization is shown in Figure 13. Given $\epsilon \sim \rho$, an estimate of the gradient is obtained as

$$\frac{\partial \mathcal{L}_{\mathrm{re}}}{\partial \theta} \approx \frac{\partial}{\partial \theta}\left[\log p_\theta(x \mid h_\phi(\epsilon, x)) + \log p_\theta(h_\phi(\epsilon, x))\right],$$

$$\frac{\partial \mathcal{L}_{\mathrm{re}}}{\partial \phi} \approx \left[\frac{\partial}{\partial h}\log p_\theta(x \mid h_\phi(\epsilon, x)) + \frac{\partial}{\partial h}\log p_\theta(h_\phi(\epsilon, x))\right]\frac{\partial h}{\partial \phi} + \frac{\partial}{\partial \phi} H[q_\phi(\cdot \mid x)].$$

5.10.2 Policy Gradients in Reinforcement Learning

In reinforcement learning, an agent interacts with an environment according to its policy $\pi$, and the goal is to maximize the expected sum of rewards, called the return. Policy gradient methods seek to directly estimate the gradient of the expected return with respect to the policy parameters [Wil92; BB01; Sut+99]. In reinforcement learning, we typically assume that the environment dynamics are not available analytically and can only be sampled.
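As a concrete illustration of the reparameterization trick discussed in the variational autoencoder example, the following NumPy sketch estimates the gradient of $\mathbb{E}_{h \sim \mathcal{N}(\mu, \sigma^2)}[f(h)]$ with respect to $\mu$ for a toy objective $f(h) = h^2$, standing in for the log-likelihood terms. The objective, sample count, and function names are illustrative assumptions, not from the text.

```python
import numpy as np

def reparam_grad(mu, sigma, n_samples=100_000, seed=0):
    """Estimate d/d(mu) of E_{h ~ N(mu, sigma^2)}[f(h)] for f(h) = h^2
    via the reparameterization h = mu + sigma * eps, eps ~ N(0, 1).

    Because h is a deterministic function of mu given eps, the gradient
    passes through the sample: df/dh * dh/dmu = 2h * 1 per sample.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)   # eps ~ rho = N(0, 1)
    h = mu + sigma * eps                   # deterministic in mu given eps
    return np.mean(2.0 * h)                # Monte Carlo estimate of the gradient

# The estimate converges to the analytic gradient 2 * mu:
print(reparam_grad(mu=1.5, sigma=0.7))    # close to 3.0
```

The key point mirrors the derivation above: sampling the noise $\epsilon$ first and expressing $h$ as a deterministic function of the parameters lets the derivative flow through $h$, rather than requiring a score-function term.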
Below we distinguish two important cases: the Markov decision process (MDP) and the partially observable Markov decision process (POMDP).
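Since the environment can only be sampled, policy gradient methods rely on score-function estimates computed from rollouts. As a minimal sketch, the following NumPy example estimates the policy gradient for a one-step bandit with a Bernoulli action; the environment, sigmoid parameterization, and function names are illustrative assumptions, not from the text.

```python
import numpy as np

def score_function_policy_grad(theta, n_episodes=50_000, seed=0):
    """Score-function (REINFORCE) estimate of d/d(theta) E[r] for a
    one-step bandit: action a ~ Bernoulli(p) with p = sigmoid(theta),
    and reward r = a.

    The estimator is E[r * d log pi(a)/d theta]; for the sigmoid
    parameterization, d log pi(a)/d theta = a - p, and the analytic
    gradient of expected reward is p * (1 - p).
    """
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))                  # action probability
    a = (rng.random(n_episodes) < p).astype(float)    # sampled actions
    r = a                                             # reward: 1 iff a == 1
    return np.mean(r * (a - p))                       # Monte Carlo estimate
```

Note that only sampled actions and rewards are used: the estimator never differentiates through the environment, matching the assumption that the dynamics can only be sampled.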