OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


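Proof.

\begin{align*}
\eta(\tilde\pi) - L_\pi(\tilde\pi)
  &= \mathbb{E}_{\tau \sim \tilde\pi}\left[\sum_{t=0}^{\infty} \gamma^t \bar{A}^{\pi,\tilde\pi}(s_t)\right]
   - \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t \bar{A}^{\pi,\tilde\pi}(s_t)\right] \\
  &= \sum_{t=0}^{\infty} \gamma^t \left( \mathbb{E}_{s_t \sim \tilde\pi}\left[\bar{A}^{\pi,\tilde\pi}(s_t)\right]
   - \mathbb{E}_{s_t \sim \pi}\left[\bar{A}^{\pi,\tilde\pi}(s_t)\right] \right) \\
\left|\eta(\tilde\pi) - L_\pi(\tilde\pi)\right|
  &\le \sum_{t=0}^{\infty} \gamma^t \left| \mathbb{E}_{s_t \sim \tilde\pi}\left[\bar{A}^{\pi,\tilde\pi}(s_t)\right]
   - \mathbb{E}_{s_t \sim \pi}\left[\bar{A}^{\pi,\tilde\pi}(s_t)\right] \right| \\
  &\le \sum_{t=0}^{\infty} \gamma^t \cdot 2\epsilon \cdot \left(1 - (1-\alpha)^t\right) \\
  &= \frac{2\epsilon\gamma\alpha}{(1-\gamma)\left(1-\gamma(1-\alpha)\right)}
\end{align*}

Last, we need to use the correspondence between total variation divergence and coupled random variables: Suppose $p_X$ and $p_Y$ are distributions with $D_{TV}(p_X \,\|\, p_Y) = \alpha$. Then there exists a joint distribution $(X, Y)$ whose marginals are $p_X, p_Y$, for which $X = Y$ with probability $1 - \alpha$. See [LPW09], Proposition 4.7. It follows that if we have two policies $\pi$ and $\tilde\pi$ such that $\max_s D_{TV}(\pi(\cdot \mid s) \,\|\, \tilde\pi(\cdot \mid s)) \le \alpha$, then we can define an $\alpha$-coupled policy pair $(\pi, \tilde\pi)$ with appropriate marginals. Proposition 1 follows.

3.11 perturbation theory proof of policy improvement bound

We also provide a different proof of Proposition 1 using perturbation theory. This method makes it possible to provide slightly stronger bounds.

Proposition 1a. Let $\alpha$ denote the maximum total variation divergence between stochastic policies $\pi$ and $\tilde\pi$, as defined in Equation (10), and let $L$ be defined as in Equation (5). Then

\[
\eta(\tilde\pi) \ge L(\tilde\pi) - \alpha^2 \frac{2\gamma\epsilon}{(1-\gamma)^2}
\]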
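The coupling proof that closes just before Section 3.11 relies on two ingredients that are easy to check numerically: the closed form of the series $\sum_t \gamma^t \cdot 2\epsilon \cdot (1 - (1-\alpha)^t)$, and the existence of a joint distribution in which $X = Y$ with probability $1 - \alpha$. The following is a minimal sketch, not code from the thesis. The function names (`series_bound`, `closed_form`, `maximal_coupling`) and the example distributions are hypothetical, and the coupling shown is one standard construction for finite distributions, assumed here for illustration.

```python
def series_bound(eps, gamma, alpha, horizon=2000):
    """Truncated sum over t of gamma^t * 2*eps * (1 - (1 - alpha)^t)."""
    return sum(
        gamma**t * 2.0 * eps * (1.0 - (1.0 - alpha) ** t) for t in range(horizon)
    )


def closed_form(eps, gamma, alpha):
    """Closed form 2*eps*gamma*alpha / ((1 - gamma) * (1 - gamma*(1 - alpha)))."""
    return 2.0 * eps * gamma * alpha / ((1.0 - gamma) * (1.0 - gamma * (1.0 - alpha)))


def maximal_coupling(p, q):
    """Build a joint table with marginals p and q that puts mass 1 - TV on X = Y.

    p and q are lists of probabilities over the same finite set.
    Returns (joint, tv), where joint[i][j] = P(X = i, Y = j).
    """
    n = len(p)
    overlap = [min(pi, qi) for pi, qi in zip(p, q)]  # mass shared by p and q
    tv = 1.0 - sum(overlap)  # equals 0.5 * sum_i |p_i - q_i|
    joint = [[0.0] * n for _ in range(n)]
    for i in range(n):
        joint[i][i] = overlap[i]  # on this mass, X and Y agree
    if tv > 0.0:
        # The residuals p - overlap and q - overlap have disjoint support,
        # so their normalized product only adds off-diagonal mass.
        for i in range(n):
            for j in range(n):
                joint[i][j] += (p[i] - overlap[i]) * (q[j] - overlap[j]) / tv
    return joint, tv


if __name__ == "__main__":
    # Series versus closed form for illustrative values of eps, gamma, alpha.
    print(series_bound(1.0, 0.9, 0.1), closed_form(1.0, 0.9, 0.1))

    # Coupling two action distributions at a single state.
    joint, tv = maximal_coupling([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
    agree = sum(joint[i][i] for i in range(3))
    print(tv, agree)  # agree should match 1 - tv up to rounding
```

Applying such a coupling independently at every state is what yields an $\alpha$-coupled policy pair when the per-state total variation divergence is at most $\alpha$. The sketch only illustrates the single-state construction.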


