OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

Algorithm 3 Approximate policy iteration algorithm guaranteeing non-decreasing expected return η

Initialize $\pi_0$.
for $i = 0, 1, 2, \dots$ until convergence do
    Compute all advantage values $A_{\pi_i}(s, a)$.
    Solve the penalized optimization problem
    $$\pi_{i+1} = \arg\max_{\pi}\left[L_{\pi_i}(\pi) - \left(\frac{2\epsilon'\gamma}{(1-\gamma)^2}\right) D_{\mathrm{KL}}^{\max}(\pi_i, \pi)\right]$$
    where $\epsilon' = \max_{\pi}\max_{s,a}\lvert A_{\pi}(s, a)\rvert$ and
    $$L_{\pi_i}(\pi) = \eta(\pi_i) + \sum_s \rho_{\pi_i}(s) \sum_a \pi(a \mid s)\, A_{\pi_i}(s, a)$$
end for

… equality at $\pi_i$. This algorithm is also reminiscent of proximal gradient methods and mirror descent.

Trust region policy optimization, which we propose in the following section, is an approximation to Algorithm 3 that uses a constraint on the KL divergence rather than a penalty, so as to robustly allow large updates.

3.4 Optimization of Parameterized Policies

In the previous section, we considered the policy optimization problem independently of the parameterization of π and under the assumption that the policy can be evaluated at all states. We now describe how to derive a practical algorithm from these theoretical foundations, under finite sample counts and arbitrary parameterizations.

Since we consider parameterized policies $\pi_\theta(a \mid s)$ with parameter vector $\theta$, we will overload our previous notation to use functions of $\theta$ rather than $\pi$, e.g., $\eta(\theta) := \eta(\pi_\theta)$, $L_\theta(\tilde\theta) := L_{\pi_\theta}(\pi_{\tilde\theta})$, and $D_{\mathrm{KL}}(\theta \,\|\, \tilde\theta) := D_{\mathrm{KL}}(\pi_\theta \,\|\, \pi_{\tilde\theta})$. We will use $\theta_{\text{old}}$ to denote the previous policy parameters that we want to improve upon.

The preceding section showed that $\eta(\theta) \ge L_{\theta_{\text{old}}}(\theta) - C\,D_{\mathrm{KL}}^{\max}(\theta_{\text{old}}, \theta)$, with equality at $\theta = \theta_{\text{old}}$. Thus, by performing the following maximization, we are guaranteed not to decrease the true objective η, and to strictly improve it unless $\theta_{\text{old}}$ already attains the maximum:
$$\underset{\theta}{\text{maximize}}\;\left[L_{\theta_{\text{old}}}(\theta) - C\,D_{\mathrm{KL}}^{\max}(\theta_{\text{old}}, \theta)\right].$$
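The guarantee in the final sentence is the minorization-maximization argument specialized to θ. Writing $M(\theta)$ for the bracketed objective (a label introduced here only for this derivation) and $\theta_{\text{new}}$ for its maximizer:

$$M(\theta) := L_{\theta_{\text{old}}}(\theta) - C\,D_{\mathrm{KL}}^{\max}(\theta_{\text{old}}, \theta)$$
$$\eta(\theta_{\text{new}}) \;\ge\; M(\theta_{\text{new}}) \;\ge\; M(\theta_{\text{old}}) \;=\; L_{\theta_{\text{old}}}(\theta_{\text{old}}) \;=\; \eta(\theta_{\text{old}})$$

The first inequality is the lower bound, the second holds because $\theta_{\text{new}}$ maximizes $M$ and $\theta_{\text{old}}$ is one of the candidates, and the equalities use $D_{\mathrm{KL}}^{\max}(\theta_{\text{old}}, \theta_{\text{old}}) = 0$ together with the equality case of the bound at $\theta = \theta_{\text{old}}$. The same chain shows that any θ with $M(\theta) > M(\theta_{\text{old}})$, not only the exact maximizer, yields $\eta(\theta) > \eta(\theta_{\text{old}})$.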
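To make the quantities in Algorithm 3 and the parameterized objective concrete, the following is a small pure-Python sketch on a hypothetical tabular MDP with state rewards r(s) and softmax policies π_θ(a|s) ∝ exp(θ[s][a]). It computes $A_\pi$, the unnormalized discounted visitation $\rho_\pi$, the surrogate $L$, and $D_{\mathrm{KL}}^{\max}$, then replaces the exact argmax with a backtracking gradient step that is accepted only if it increases the penalized objective; by the lower bound, such a step cannot decrease η. This is a partial-maximization variant of the penalty scheme, not trust region policy optimization, and every function name (`softmax_policy`, `penalized_update`, and so on) is illustrative. It also assumes the bound may be applied with ε taken from the current policy's advantages rather than the text's ε′, which maximizes over all policies and would only make C larger.

```python
import math
import random

def softmax_policy(theta):
    """Map tabular logits theta[s][a] to probabilities pi(a|s) (hypothetical parameterization)."""
    pi = []
    for row in theta:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        pi.append([x / z for x in e])
    return pi

def values(pi, P, r, gamma, iters=600):
    """Exact policy evaluation by iterating V(s) = r(s) + gamma * E[V(s')]."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(pi[s][a] * sum(P[s][a][t] * V[t] for t in range(n))
                                for a in range(len(P[s]))) for s in range(n)]
    return V

def advantages(pi, P, r, gamma):
    """Return V_pi and A_pi(s, a) = Q_pi(s, a) - V_pi(s), with Q_pi(s, a) = r(s) + gamma * E[V_pi(s')]."""
    V = values(pi, P, r, gamma)
    n = len(P)
    A = [[r[s] + gamma * sum(P[s][a][t] * V[t] for t in range(n)) - V[s]
          for a in range(len(P[s]))] for s in range(n)]
    return V, A

def visitation(pi, P, rho0, gamma, iters=600):
    """Unnormalized discounted visitation rho_pi(s) = sum_t gamma^t Pr(s_t = s)."""
    n = len(P)
    rho = list(rho0)
    for _ in range(iters):
        rho = [rho0[t] + gamma * sum(rho[s] * pi[s][a] * P[s][a][t]
                                     for s in range(n) for a in range(len(P[s])))
               for t in range(n)]
    return rho

def eta(pi, P, r, rho0, gamma):
    """Expected discounted return eta(pi) = sum_s rho0(s) V_pi(s)."""
    return sum(p * v for p, v in zip(rho0, values(pi, P, r, gamma)))

def max_kl(pi, pi_new):
    """D_KL^max(pi, pi_new): largest per-state KL(pi(.|s) || pi_new(.|s))."""
    return max(sum(p * math.log(p / q) for p, q in zip(ps, qs) if p > 0)
               for ps, qs in zip(pi, pi_new))

def penalized_update(theta_old, P, r, rho0, gamma, step=1.0, tries=30):
    """Backtracking ascent on M(theta) = L_old(theta) - C * D_KL^max(theta_old, theta).
    An accepted step satisfies M(theta_new) > M(theta_old) = eta(theta_old)."""
    pi_old = softmax_policy(theta_old)
    V, A = advantages(pi_old, P, r, gamma)
    rho = visitation(pi_old, P, rho0, gamma)
    eta_old = sum(p * v for p, v in zip(rho0, V))
    # Assumption: eps from the current policy's advantages (eps' in the text is >= this).
    eps = max(abs(x) for row in A for x in row)
    C = 2 * eps * gamma / (1 - gamma) ** 2

    def M(theta):
        pi = softmax_policy(theta)
        L = eta_old + sum(rho[s] * sum(p * a for p, a in zip(pi[s], A[s])) for s in range(len(A)))
        return L - C * max_kl(pi_old, pi)

    # Softmax gradient of L at theta_old (uses sum_a pi_old(a|s) A(s,a) = 0);
    # the KL penalty has zero gradient at theta_old.
    grad = [[rho[s] * pi_old[s][b] * A[s][b] for b in range(len(A[s]))] for s in range(len(A))]
    m_old = M(theta_old)
    for _ in range(tries):
        cand = [[t + step * g for t, g in zip(ts, gs)] for ts, gs in zip(theta_old, grad)]
        if M(cand) > m_old:
            return cand
        step *= 0.5
    return theta_old  # no improving step found; keep the old parameters

# Usage on a hypothetical random MDP: 4 states, 3 actions, gamma = 0.9.
random.seed(0)
nS, nA, gamma = 4, 3, 0.9
P = []
for s in range(nS):
    rows = []
    for a in range(nA):
        w = [random.random() for _ in range(nS)]
        rows.append([x / sum(w) for x in w])
    P.append(rows)
r = [random.random() for _ in range(nS)]
rho0 = [1.0 / nS] * nS
theta = [[0.0] * nA for _ in range(nS)]
history = [eta(softmax_policy(theta), P, r, rho0, gamma)]
for _ in range(5):
    theta = penalized_update(theta, P, r, rho0, gamma)
    history.append(eta(softmax_policy(theta), P, r, rho0, gamma))
print(history)
```

Exact policy evaluation stands in here for the finite-sample estimates a practical algorithm must use; keeping everything deterministic makes it easy to check that the returns recorded in `history` never decrease from one update to the next.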