3.10 Proof of Policy Improvement Bound

Proof. First note that $A_\pi(s, a) = \mathbb{E}_{s' \sim P(s' \mid s, a)}\left[ r(s) + \gamma V_\pi(s') - V_\pi(s) \right]$. Therefore,

$$
\begin{aligned}
\mathbb{E}_{\tau \mid \tilde{\pi}}\left[ \sum_{t=0}^{\infty} \gamma^t A_\pi(s_t, a_t) \right]
&= \mathbb{E}_{\tau \mid \tilde{\pi}}\left[ \sum_{t=0}^{\infty} \gamma^t \bigl( r(s_t) + \gamma V_\pi(s_{t+1}) - V_\pi(s_t) \bigr) \right] \\
&= \mathbb{E}_{\tau \mid \tilde{\pi}}\left[ -V_\pi(s_0) + \sum_{t=0}^{\infty} \gamma^t r(s_t) \right] \\
&= -\mathbb{E}_{s_0}\left[ V_\pi(s_0) \right] + \mathbb{E}_{\tau \mid \tilde{\pi}}\left[ \sum_{t=0}^{\infty} \gamma^t r(s_t) \right] \\
&= -\eta(\pi) + \eta(\tilde{\pi}),
\end{aligned}
$$

where the second equality holds because the value terms telescope. Rearranging, the result follows.

Define $\bar{A}_{\pi, \tilde{\pi}}(s)$ to be the expected advantage of $\tilde{\pi}$ over $\pi$ at state $s$:

$$\bar{A}_{\pi, \tilde{\pi}}(s) = \mathbb{E}_{a \sim \tilde{\pi}(\cdot \mid s)}\left[ A_\pi(s, a) \right].$$

Now Lemma 1 can be written as follows:

$$\eta(\tilde{\pi}) = \eta(\pi) + \mathbb{E}_{\tau \sim \tilde{\pi}}\left[ \sum_{t=0}^{\infty} \gamma^t \bar{A}_{\pi, \tilde{\pi}}(s_t) \right].$$

Note that $L_\pi$ can be written as

$$L_\pi(\tilde{\pi}) = \eta(\pi) + \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^t \bar{A}_{\pi, \tilde{\pi}}(s_t) \right]. \tag{19}$$

The difference in these equations is whether the states are sampled using $\pi$ or $\tilde{\pi}$. To bound the difference between $\eta(\tilde{\pi})$ and $L_\pi(\tilde{\pi})$, we will bound the difference arising from each timestep. To do this, we first need to introduce a measure of how much $\pi$ and $\tilde{\pi}$ agree. Specifically, we'll couple the policies, so that they define a joint distribution over pairs of actions.
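As a concrete sanity check (not part of the original thesis), the sketch below verifies Lemma 1 numerically on a small randomly generated MDP. All names (P, r, mu, value_and_kernel, ...) are illustrative assumptions. It checks the identity $\eta(\tilde{\pi}) = \eta(\pi) + \mathbb{E}_{\tau \sim \tilde{\pi}}\bigl[\sum_t \gamma^t \bar{A}_{\pi,\tilde{\pi}}(s_t)\bigr]$, and the last line also evaluates $L_\pi(\tilde{\pi})$ from Equation (19) to show how it differs when states are sampled from $\pi$ instead of $\tilde{\pi}$.

```python
# Numerical check of Lemma 1 on a small random MDP.
# Everything here is an illustrative sketch, not code from the thesis.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9

P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # transition kernel P(s' | s, a)
r = rng.random(S)                   # state-dependent reward r(s), as in the proof
mu = np.full(S, 1.0 / S)            # start-state distribution

def random_policy():
    pi = rng.random((S, A))
    return pi / pi.sum(axis=1, keepdims=True)

pi, pi_tilde = random_policy(), random_policy()

def value_and_kernel(policy):
    # State-to-state kernel under the policy, then solve V = r + gamma * P_pol @ V.
    P_pol = np.einsum('sa,sat->st', policy, P)
    V = np.linalg.solve(np.eye(S) - gamma * P_pol, r)
    return V, P_pol

V_pi, P_pi = value_and_kernel(pi)
V_tilde, P_tilde = value_and_kernel(pi_tilde)

Q_pi = r[:, None] + gamma * np.einsum('sat,t->sa', P, V_pi)  # Q_pi(s, a)
A_pi = Q_pi - V_pi[:, None]                                  # advantage A_pi(s, a)
Abar = (pi_tilde * A_pi).sum(axis=1)                         # expected advantage of pi~ over pi

eta, eta_tilde = mu @ V_pi, mu @ V_tilde

# Discounted state-visitation frequencies: rho(s) = sum_t gamma^t Pr(s_t = s).
rho_tilde = np.linalg.solve(np.eye(S) - gamma * P_tilde.T, mu)
rho_pi = np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu)

print("eta(pi~)              =", eta_tilde)
print("eta(pi) + E_pi~[Abar] =", eta + rho_tilde @ Abar)  # Lemma 1: matches exactly
print("L_pi(pi~)             =", eta + rho_pi @ Abar)     # Eq. (19): generally differs
```

The first two printed values agree to numerical precision, while $L_\pi(\tilde{\pi})$ generally differs because its states are visited under $\pi$; bounding that discrepancy is exactly what the remainder of the proof does.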
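The coupling of $\pi$ and $\tilde{\pi}$ described above can be made concrete with a maximal coupling of the two action distributions at a state: sample the same action from the overlap of $\pi(\cdot \mid s)$ and $\tilde{\pi}(\cdot \mid s)$ when possible, and otherwise sample each action from the leftover mass, in which case the actions necessarily differ. Under this construction the actions disagree with probability exactly the total variation distance between the two distributions. The sketch below is an assumed illustration of such a coupling, not a construction taken from the thesis.

```python
# Maximal coupling of two discrete action distributions p = pi(.|s), q = pi~(.|s):
# the sampled pair (a, a_tilde) has the correct marginals and disagrees with
# probability exactly D_TV(p, q). Illustrative sketch, not thesis code.
import numpy as np

rng = np.random.default_rng(1)

def maximal_coupling(p, q):
    m = np.minimum(p, q)          # overlapping probability mass
    overlap = m.sum()             # = 1 - D_TV(p, q)
    if rng.random() < overlap:
        a = rng.choice(len(p), p=m / overlap)
        return a, a               # agree: both actions drawn from the overlap
    # The residuals (p - m) and (q - m) have disjoint supports, so a != a_tilde.
    a = rng.choice(len(p), p=(p - m) / (1.0 - overlap))
    a_tilde = rng.choice(len(q), p=(q - m) / (1.0 - overlap))
    return a, a_tilde

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.2, 0.4])
tv = 0.5 * np.abs(p - q).sum()
pairs = [maximal_coupling(p, q) for _ in range(100_000)]
disagree = np.mean([a != b for a, b in pairs])
print(f"D_TV = {tv:.3f}, empirical P(a != a_tilde) = {disagree:.3f}")  # ~equal
```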