OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS



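3.10 Proof of Policy Improvement Bound

Definition 1. $(\pi, \tilde{\pi})$ is an $\alpha$-coupled policy pair if it defines a joint distribution $(a, \tilde{a}) \mid s$, such that $P(a \neq \tilde{a} \mid s) \le \alpha$ for all $s$. $\pi$ and $\tilde{\pi}$ will denote the marginal distributions of $a$ and $\tilde{a}$, respectively.

In words, this means that at each state, $(\pi, \tilde{\pi})$ gives us a pair of actions, and these actions differ with probability at most $\alpha$.

Lemma 2. Let $(\pi, \tilde{\pi})$ be an $\alpha$-coupled policy pair. Then

$$\left| \mathbb{E}_{s_t \sim \tilde{\pi}}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] - \mathbb{E}_{s_t \sim \pi}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] \right| \le 2\epsilon\left(1 - (1-\alpha)^t\right), \quad \text{where } \epsilon = \max_s \left|\bar{A}^{\pi,\tilde{\pi}}(s)\right|.$$

Proof. Consider generating a trajectory using $\tilde{\pi}$, i.e., at each timestep $i$ we sample $(a_i, \tilde{a}_i) \mid s_i$, and we choose the action $\tilde{a}_i$ and ignore $a_i$. Let $n_t$ denote the number of times that $a_i \neq \tilde{a}_i$ for $i < t$, i.e., the number of times that $\pi$ and $\tilde{\pi}$ disagree before arriving at state $s_t$.

$$\mathbb{E}_{s_t \sim \tilde{\pi}}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] = P(n_t = 0)\, \mathbb{E}_{s_t \sim \tilde{\pi} \mid n_t = 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] + P(n_t > 0)\, \mathbb{E}_{s_t \sim \tilde{\pi} \mid n_t > 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right]$$

$P(n_t = 0) = (1-\alpha)^t$, and $\mathbb{E}_{s_t \sim \tilde{\pi} \mid n_t = 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] = \mathbb{E}_{s_t \sim \pi \mid n_t = 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right]$, because $n_t = 0$ indicates that $\pi$ and $\tilde{\pi}$ agreed on all timesteps less than $t$. Therefore, we have

$$\mathbb{E}_{s_t \sim \tilde{\pi}}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] = (1-\alpha)^t\, \mathbb{E}_{s_t \sim \pi \mid n_t = 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] + \left(1 - (1-\alpha)^t\right) \mathbb{E}_{s_t \sim \tilde{\pi} \mid n_t > 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right]$$

Subtracting $\mathbb{E}_{s_t \sim \pi \mid n_t = 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right]$ from both sides,

$$\mathbb{E}_{s_t \sim \tilde{\pi}}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] - \mathbb{E}_{s_t \sim \pi}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] = \left(1 - (1-\alpha)^t\right)\left(-\mathbb{E}_{s_t \sim \pi \mid n_t = 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] + \mathbb{E}_{s_t \sim \tilde{\pi} \mid n_t > 0}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right]\right)$$

$$\left| \mathbb{E}_{s_t \sim \tilde{\pi}}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] - \mathbb{E}_{s_t \sim \pi}\left[\bar{A}^{\pi,\tilde{\pi}}(s_t)\right] \right| \le \left(1 - (1-\alpha)^t\right)(\epsilon + \epsilon)$$

Now we can sum over time to bound the error of $L_\pi$.

Lemma 3. Suppose $(\pi, \tilde{\pi})$ is an $\alpha$-coupled policy pair. Then

$$\left|\eta(\tilde{\pi}) - L_\pi(\tilde{\pi})\right| \le \frac{2\epsilon\gamma\alpha}{(1-\gamma)(1-\gamma(1-\alpha))}$$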
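
The step from Lemma 2 to Lemma 3 can be checked numerically. The sketch below assumes that "sum over time" means a $\gamma$-discounted sum of the per-timestep bound from Lemma 2 over $t = 0, 1, 2, \ldots$; this page does not spell out that step, so treat the assumption as illustrative. Under it, the geometric series $\sum_t \gamma^t \cdot 2\epsilon(1 - (1-\alpha)^t)$ equals $2\epsilon\left(\frac{1}{1-\gamma} - \frac{1}{1-\gamma(1-\alpha)}\right)$, which simplifies to the right-hand side of Lemma 3. The function names and the truncation horizon are hypothetical and chosen only for this example.

```python
def lemma2_bound(eps, alpha, t):
    """Per-timestep bound from Lemma 2: 2 * eps * (1 - (1 - alpha)**t)."""
    return 2.0 * eps * (1.0 - (1.0 - alpha) ** t)


def discounted_sum(eps, alpha, gamma, horizon=5000):
    """Truncated discounted sum of the Lemma 2 bound over t = 0..horizon-1.

    Assumption: the error is accumulated with weight gamma**t per timestep.
    The horizon is a numerical truncation of the infinite sum.
    """
    return sum(gamma ** t * lemma2_bound(eps, alpha, t) for t in range(horizon))


def lemma3_bound(eps, alpha, gamma):
    """Closed-form right-hand side of Lemma 3."""
    return 2.0 * eps * gamma * alpha / ((1.0 - gamma) * (1.0 - gamma * (1.0 - alpha)))


if __name__ == "__main__":
    # Arbitrary illustrative values, not taken from the source.
    eps, alpha, gamma = 1.0, 0.1, 0.9
    print("truncated sum:", discounted_sum(eps, alpha, gamma))
    print("closed form:  ", lemma3_bound(eps, alpha, gamma))
```

The two printed values should agree up to floating-point and truncation error, which is what one would expect if Lemma 3 follows from summing Lemma 2 over discounted timesteps.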


Original File Name Searched:

thesis-optimizing-deep-learning.pdf
