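Next let us bound the $O(\Delta^2)$ term $\gamma^2 r G \Delta G \Delta \tilde{G} \rho$. First we consider the product $\gamma r G \Delta = \gamma v \Delta$. Consider the component $s$ of this dual vector:

$$(\gamma v \Delta)_s = \sum_a \big(\tilde{\pi}(s,a) - \pi(s,a)\big)\, Q_\pi(s,a).$$

Bounding this sum in terms of $\sum_a |\tilde{\pi}(a \mid s) - \pi(a \mid s)|$ gives $|(\gamma v \Delta)_s| \le \alpha\epsilon$ for every state $s$, that is, $\|\gamma r G \Delta\|_\infty \le \alpha\epsilon$.

We bound the other portion $G \Delta \tilde{G} \rho$ using the $\ell_1$ operator norm

$$\|A\|_1 = \sup_{\rho} \left\{ \frac{\|A\rho\|_1}{\|\rho\|_1} \right\},$$

where we have that $\|G\|_1 = \|\tilde{G}\|_1 = 1/(1-\gamma)$ and $\|\Delta\|_1 = 2\alpha$. That gives

$$\|G \Delta \tilde{G} \rho\|_1 \le \|G\|_1 \, \|\Delta\|_1 \, \|\tilde{G}\|_1 \, \|\rho\|_1 = \frac{1}{1-\gamma} \cdot 2\alpha \cdot \frac{1}{1-\gamma} \cdot 1.$$

So we have that

$$\gamma^2 \, |r G \Delta G \Delta \tilde{G} \rho| \le \gamma \, \|\gamma r G \Delta\|_\infty \, \|G \Delta \tilde{G} \rho\|_1 \le \gamma \cdot \alpha\epsilon \cdot \frac{2\alpha}{(1-\gamma)^2} = \alpha^2 \, \frac{2\gamma\epsilon}{(1-\gamma)^2}.$$

3.12 efficiently solving the trust-region constrained optimization problem

This section describes how to efficiently approximately solve the following constrained optimization problem, which we must solve at each iteration of TRPO:

$$\text{maximize } L(\theta) \quad \text{subject to } D_{KL}(\theta_{\mathrm{old}}, \theta) \le \delta.$$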
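To make the shape of this problem concrete, the following is a minimal sketch, assuming a single-state softmax policy over three actions with made-up advantage values. The helper names (`surrogate`, `kl`, `constrained_step`) and the numbers are hypothetical. The step rule is a plain backtracking line search along the gradient of $L$, chosen only to illustrate how the constraint $D_{KL}(\theta_{\mathrm{old}}, \theta) \le \delta$ limits the update; it is not necessarily the procedure this section goes on to describe.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def surrogate(theta, advantages):
    # L(theta): expected advantage under the softmax policy pi_theta.
    return sum(p * a for p, a in zip(softmax(theta), advantages))

def kl(theta_old, theta):
    # D_KL(pi_old || pi_theta) between two softmax policies.
    p_old, p_new = softmax(theta_old), softmax(theta)
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new))

def surrogate_grad(theta, advantages):
    # Gradient of L with respect to the logits: pi_a * (A_a - sum_b pi_b A_b).
    p = softmax(theta)
    baseline = sum(pi * a for pi, a in zip(p, advantages))
    return [pi * (a - baseline) for pi, a in zip(p, advantages)]

def constrained_step(theta_old, advantages, delta,
                     step=10.0, shrink=0.5, max_backtracks=30):
    # Shrink the step until the KL constraint holds and L improves.
    g = surrogate_grad(theta_old, advantages)
    l_old = surrogate(theta_old, advantages)
    for _ in range(max_backtracks):
        theta = [t + step * gi for t, gi in zip(theta_old, g)]
        if kl(theta_old, theta) <= delta and surrogate(theta, advantages) > l_old:
            return theta
        step *= shrink
    return list(theta_old)  # no acceptable step found; keep the old policy

# Hypothetical example values (not taken from the source).
theta_old = [0.0, 0.0, 0.0]
advantages = [1.0, 0.0, -1.0]
delta = 0.01
theta_new = constrained_step(theta_old, advantages, delta)
print(kl(theta_old, theta_new), surrogate(theta_new, advantages))
```

The returned `theta_new` increases the surrogate while keeping the KL divergence from the old policy within `delta`; a larger `delta` admits a larger step.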
Title: OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS
Original file name: thesis-optimizing-deep-learning.pdf