OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

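Next, let us bound the $O(\Delta^2)$ term $\gamma^2 rG\Delta G\Delta\tilde{G}\rho$. First we consider the product $\gamma rG\Delta = \gamma v\Delta$. Consider the component $s$ of this dual vector:

$$|(\gamma v\Delta)_s| = \Big|\sum_a \big(\tilde{\pi}(a|s) - \pi(a|s)\big)\, Q_\pi(s,a)\Big| \le \alpha\epsilon.$$

We bound the other portion, $G\Delta\tilde{G}\rho$, using the $\ell_1$ operator norm

$$\|A\|_1 = \sup_{\rho} \left\{ \frac{\|A\rho\|_1}{\|\rho\|_1} \right\},$$

where $\|G\|_1 = \|\tilde{G}\|_1 = 1/(1-\gamma)$ and $\|\Delta\|_1 = 2\alpha$. That gives

$$\|G\Delta\tilde{G}\rho\|_1 \le \|G\|_1\, \|\Delta\|_1\, \|\tilde{G}\|_1\, \|\rho\|_1 = \frac{1}{1-\gamma} \cdot 2\alpha \cdot \frac{1}{1-\gamma} \cdot 1.$$

So we have that

$$\gamma^2 |rG\Delta G\Delta\tilde{G}\rho| \le \gamma\, \|\gamma rG\Delta\|_\infty\, \|G\Delta\tilde{G}\rho\|_1 \le \gamma \cdot \alpha\epsilon \cdot \frac{2\alpha}{(1-\gamma)^2} = \alpha^2 \frac{2\gamma\epsilon}{(1-\gamma)^2}.$$

3.12 Efficiently solving the trust-region constrained optimization problem

This section describes how to efficiently compute an approximate solution to the following constrained optimization problem, which we must solve at each iteration of TRPO:

$$\underset{\theta}{\text{maximize}}\ L(\theta) \quad \text{subject to} \quad D_{\mathrm{KL}}(\theta_{\text{old}}, \theta) \le \delta.$$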
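The page states the problem but does not yet say how it is solved. For illustration only, here is a minimal sketch that assumes a local model not given on this page.

- The objective is replaced by its linearization $g^\top d$, where $d = \theta - \theta_{\text{old}}$ and $g$ is the gradient of $L$ at $\theta_{\text{old}}$.
- The KL constraint is replaced by its quadratic approximation $\tfrac{1}{2} d^\top F d \le \delta$. Here $F$ is the Hessian of the KL divergence at $\theta_{\text{old}}$, that is, the Fisher information matrix.

Under these assumptions, the model problem has the closed-form solution $d = \sqrt{2\delta / (g^\top F^{-1} g)}\, F^{-1} g$. The helper names (`solve_linear`, `quad_form`, `trust_region_step`) and the toy values below are hypothetical. The dense solve is only a stand-in for whatever a real implementation uses on large parameter vectors.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def quad_form(F, x):
    """Return x^T F x for a square matrix F given as a list of rows."""
    return dot(x, [dot(row, x) for row in F])


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Dense and O(n^3): adequate only for tiny toy problems like this one.
    """
    n = len(A)
    # Augmented matrix [A | b], copied so the caller's data is untouched.
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def trust_region_step(g, F, delta):
    """Maximize g^T d subject to 0.5 * d^T F d <= delta.

    Assumption (hypothetical sketch): g is the gradient of L at theta_old and
    F (symmetric positive definite) is the Hessian of the KL divergence there.
    This solves only the local linear/quadratic model, not the original
    nonlinear problem.
    """
    s = solve_linear(F, g)                      # s = F^{-1} g
    scale = math.sqrt(2.0 * delta / dot(g, s))  # puts d on the constraint boundary
    return [scale * s_i for s_i in s]
```

Try it with `g = [1.0, 0.5]`, `F = [[2.0, 0.3], [0.3, 1.0]]`, and `delta = 0.01`. The returned step lies exactly on the boundary $\tfrac{1}{2} d^\top F d = \delta$. No other direction, scaled to reach that boundary, gives a larger $g^\top d$.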
