OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


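θ = θ_new.

$$
\begin{aligned}
\mathbb{E}_{v \prec c \mid \theta_{\mathrm{new}}}[\hat{c}]
&= \mathbb{E}_{v \prec c \mid \theta_{\mathrm{old}}}\left[\hat{c} \prod_{\substack{v \prec c,\\ \theta \prec^D v}} \frac{P_v(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{new}})}{P_v(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{old}})}\right] \\
&\le \mathbb{E}_{v \prec c \mid \theta_{\mathrm{old}}}\left[\hat{c}\left(\log \prod_{\substack{v \prec c,\\ \theta \prec^D v}} \frac{P_v(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{new}})}{P_v(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{old}})} + 1\right)\right]
\end{aligned}
$$

The second line uses the inequality x ≥ log x + 1, which is valid for all x > 0. The direction of the inequality is reversed because ĉ is negative. Summing over c ∈ C and rearranging, we get

$$
\begin{aligned}
\mathbb{E}_{S \mid \theta_{\mathrm{new}}}\Big[\sum_{c \in C} \hat{c}\Big]
&\le \mathbb{E}_{S \mid \theta_{\mathrm{old}}}\Big[\sum_{c \in C} \hat{c} + \sum_{v \in S} \log\Big(\frac{p(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{new}})}{p(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{old}})}\Big) \hat{Q}_v\Big] \\
&= \mathbb{E}_{S \mid \theta_{\mathrm{old}}}\Big[\sum_{v \in S} \log p(v \mid \mathrm{deps}_v \setminus \theta, \theta_{\mathrm{new}})\, \hat{Q}_v\Big] + \mathrm{const.}
\end{aligned}
\tag{41}
$$

The right-hand side of Equation (41) is an upper bound on the expected total cost, and it holds with equality at θ_new = θ_old. This is why Equation (41) allows majorization-minimization algorithms (like the EM algorithm) to be used to optimize with respect to θ. In fact, similar equations have been derived by interpreting rewards (negative costs) as probabilities and then taking the variational lower bound on log-probability (e.g., [Vla+09]).

5.10 Examples

This section considers two settings where the formalism of stochastic computation graphs can be applied. First, we consider the generalized EM algorithm for maximum likelihood estimation in probabilistic models with latent variables. Second, we consider reinforcement learning in Markov Decision Processes. In both cases, the objective function is given by an expectation; writing it out as a composition of stochastic and deterministic steps yields a stochastic computation graph.

5.10.1 Generalized EM Algorithm and Variational Inference

The generalized EM algorithm maximizes likelihood in a probabilistic model with latent variables [NH98]. We start with a parameterized probability density p(x, z; θ) where x is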
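The bound in Equation (41) can be sanity-checked numerically on a toy graph. The sketch below is not taken from the text. It assumes a graph with a single categorical stochastic node v with p(v; θ) = softmax(θ), feeding a single negative cost c(v), so that Q̂_v = c(v). The cost values, the parameters, and the helper names (`softmax`, `expected_cost`, `surrogate_bound`) are hypothetical. For this graph, the sketch evaluates both sides of the bound exactly and checks that the bound holds for randomly drawn θ_new and is tight at θ_new = θ_old.

```python
import math
import random

def softmax(theta):
    """Categorical probabilities p(v; theta) = exp(theta_v) / sum_u exp(theta_u)."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected_cost(p, cost):
    """Exact E_{v ~ p}[c(v)] for a single categorical stochastic node."""
    return sum(pv * cv for pv, cv in zip(p, cost))

def surrogate_bound(p_old, p_new, cost):
    """Right-hand side of the bound for one node v with Q_v = c(v):
    E_{v ~ p_old}[c(v) + log(p_new(v) / p_old(v)) * c(v)]."""
    return sum(po * (c + math.log(pn / po) * c)
               for po, pn, c in zip(p_old, p_new, cost))

random.seed(0)
cost = [-3.0, -1.0, -0.5, -2.0]    # hypothetical negative costs, one per outcome of v
theta_old = [0.2, -0.1, 0.5, 0.0]  # hypothetical current parameters
p_old = softmax(theta_old)

for _ in range(1000):
    theta_new = [t + random.gauss(0.0, 1.0) for t in theta_old]
    p_new = softmax(theta_new)
    # The bound should hold for every theta_new, up to float round-off.
    assert expected_cost(p_new, cost) <= surrogate_bound(p_old, p_new, cost) + 1e-12

# At theta_new = theta_old the log-ratio is zero, so both sides coincide.
print(expected_cost(p_old, cost), surrogate_bound(p_old, p_old, cost))
```

The check relies only on the pointwise inequality ĉx ≤ ĉ(log x + 1) with x = p_new(v)/p_old(v), averaged under p_old. It therefore needs p_old(v) > 0 for every outcome, which the softmax guarantees.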

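To show how Equation (41) supports majorization-minimization, the following sketch uses a toy graph with one categorical node v, one negative cost c(v), and Q̂_v = c(v). As a simplifying assumption, it parameterizes p(v) directly by a probability vector instead of through θ. Under that assumption, minimizing the right-hand side of (41) over the probability simplex has a closed form: p_new(v) ∝ p_old(v)(−c(v)). The names `mm_step` and `expected_cost` and all numbers are hypothetical. This closed-form update is specific to the tabular toy case and is not presented as an algorithm from the text.

```python
def expected_cost(p, cost):
    """Exact E_{v ~ p}[c(v)] for one categorical stochastic node."""
    return sum(pv * cv for pv, cv in zip(p, cost))

def mm_step(p_old, cost):
    """One majorization-minimization step on the bound in Equation (41).

    With Q_v = c(v) and p(v) parameterized directly by a probability vector,
    minimizing E_{v ~ p_old}[c(v) * log p_new(v)] over the simplex gives
    p_new(v) proportional to p_old(v) * (-c(v)). The weights are positive
    because every cost is assumed negative.
    """
    weights = [po * (-c) for po, c in zip(p_old, cost)]
    total = sum(weights)
    return [w / total for w in weights]

cost = [-3.0, -1.0, -0.5, -2.0]  # hypothetical negative costs
p = [0.25, 0.25, 0.25, 0.25]     # hypothetical initial distribution
history = [expected_cost(p, cost)]
for _ in range(20):
    p = mm_step(p, cost)
    history.append(expected_cost(p, cost))

# The expected cost should never increase from one MM step to the next.
assert all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print(history[0], history[-1])  # the expected cost moves toward min(cost) = -3.0
```

The monotone decrease follows from a chain of three relations:

- E_new[c] ≤ bound(p_new) holds by (41).
- bound(p_new) ≤ bound(p_old) holds because p_new minimizes the bound.
- bound(p_old) = E_old[c] holds because the bound is tight at the current parameters.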