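3.10 Proof of policy improvement bound

Definition 1. $(\pi, \tilde\pi)$ is an $\alpha$-coupled policy pair if it defines a joint distribution $(a, \tilde a) \mid s$ such that $P(a \ne \tilde a \mid s) \le \alpha$ for all $s$. $\pi$ and $\tilde\pi$ will denote the marginal distributions of $a$ and $\tilde a$, respectively.

In words, this means that at each state, $(\pi, \tilde\pi)$ gives us a pair of actions, and these actions differ with probability at most $\alpha$.

Lemma 2. Let $(\pi, \tilde\pi)$ be an $\alpha$-coupled policy pair. Then

$$\left| E_{s_t \sim \tilde\pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] - E_{s_t \sim \pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] \right| \le 2\epsilon\left(1 - (1-\alpha)^t\right), \quad \text{where } \epsilon = \max_s \left|\bar A^{\pi,\tilde\pi}(s)\right|.$$

Proof. Consider generating a trajectory using $\tilde\pi$, i.e., at each timestep $i$ we sample $(a_i, \tilde a_i) \mid s_i$, and we choose the action $\tilde a_i$ and ignore $a_i$. Let $n_t$ denote the number of times that $a_i \ne \tilde a_i$ for $i < t$, i.e., the number of times that $\pi$ and $\tilde\pi$ disagree before arriving at state $s_t$. Then

$$E_{s_t \sim \tilde\pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] = P(n_t = 0)\, E_{s_t \sim \tilde\pi \mid n_t = 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] + P(n_t > 0)\, E_{s_t \sim \tilde\pi \mid n_t > 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right].$$

$P(n_t = 0) = (1-\alpha)^t$, and $E_{s_t \sim \tilde\pi \mid n_t = 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] = E_{s_t \sim \pi \mid n_t = 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right]$, because $n_t = 0$ indicates that $\pi$ and $\tilde\pi$ agreed on all timesteps less than $t$. Therefore, we have

$$E_{s_t \sim \tilde\pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] = (1-\alpha)^t\, E_{s_t \sim \pi \mid n_t = 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] + \left(1 - (1-\alpha)^t\right) E_{s_t \sim \tilde\pi \mid n_t > 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right].$$

Subtracting $E_{s_t \sim \pi \mid n_t = 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right]$ from both sides,

$$E_{s_t \sim \tilde\pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] - E_{s_t \sim \pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] = \left(1 - (1-\alpha)^t\right)\left(-E_{s_t \sim \pi \mid n_t = 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] + E_{s_t \sim \tilde\pi \mid n_t > 0}\left[\bar A^{\pi,\tilde\pi}(s_t)\right]\right).$$

Each expectation on the right-hand side is bounded in absolute value by $\epsilon$, so

$$\left| E_{s_t \sim \tilde\pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] - E_{s_t \sim \pi}\left[\bar A^{\pi,\tilde\pi}(s_t)\right] \right| \le \left(1 - (1-\alpha)^t\right)(\epsilon + \epsilon).$$

Now we can sum over time to bound the error of $L_\pi$.

Lemma 3. Suppose $(\pi, \tilde\pi)$ is an $\alpha$-coupled policy pair. Then

$$\left|\eta(\tilde\pi) - L_\pi(\tilde\pi)\right| \le \frac{2\epsilon\gamma\alpha}{(1-\gamma)\left(1-\gamma(1-\alpha)\right)}.$$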
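The constant in Lemma 3 can be checked numerically. The sketch below assumes that the error $|\eta(\tilde\pi) - L_\pi(\tilde\pi)|$ is bounded by the discounted sum $\sum_{t \ge 0} \gamma^t \cdot 2\epsilon(1 - (1-\alpha)^t)$ of the per-timestep bounds from Lemma 2, and compares a truncated version of that sum with the closed form $2\epsilon\gamma\alpha / ((1-\gamma)(1-\gamma(1-\alpha)))$. The function names `partial_sum` and `closed_form` and the parameter values are illustrative choices, not taken from the source.

```python
def partial_sum(eps, gamma, alpha, horizon):
    """Truncated sum over t = 0..horizon-1 of gamma^t * 2*eps*(1 - (1-alpha)^t)."""
    return sum(
        gamma**t * 2.0 * eps * (1.0 - (1.0 - alpha) ** t)
        for t in range(horizon)
    )


def closed_form(eps, gamma, alpha):
    """Closed-form bound 2*eps*gamma*alpha / ((1-gamma) * (1-gamma*(1-alpha)))."""
    return 2.0 * eps * gamma * alpha / ((1.0 - gamma) * (1.0 - gamma * (1.0 - alpha)))


if __name__ == "__main__":
    # Illustrative (assumed) values for eps, gamma, and alpha.
    eps, gamma, alpha = 1.0, 0.9, 0.1
    print(partial_sum(eps, gamma, alpha, horizon=2000))
    print(closed_form(eps, gamma, alpha))
```

The two printed values should agree closely, because the sum splits into two geometric series, $2\epsilon\left(\frac{1}{1-\gamma} - \frac{1}{1-\gamma(1-\alpha)}\right)$, whose difference has numerator $\gamma\alpha$.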
PDF Search Title: OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS
Original File Name Searched: thesis-optimizing-deep-learning.pdf