4.3 Advantage Function Estimation

since the term $\gamma^k V(s_{t+k})$ becomes more heavily discounted, and the term $-V(s_t)$ does not affect the bias. Taking $k \to \infty$, we get

$$\hat{A}_t^{(\infty)} = \sum_{l=0}^{\infty} \gamma^l \delta^V_{t+l} = -V(s_t) + \sum_{l=0}^{\infty} \gamma^l r_{t+l},$$

which is simply the empirical returns minus the value function baseline.

The generalized advantage estimator GAE($\gamma$, $\lambda$) is defined as the exponentially-weighted average of these $k$-step estimators:

$$
\begin{aligned}
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)}
&:= (1-\lambda)\left(\hat{A}_t^{(1)} + \lambda \hat{A}_t^{(2)} + \lambda^2 \hat{A}_t^{(3)} + \dots\right) \\
&= (1-\lambda)\left(\delta^V_t + \lambda\left(\delta^V_t + \gamma\delta^V_{t+1}\right) + \lambda^2\left(\delta^V_t + \gamma\delta^V_{t+1} + \gamma^2\delta^V_{t+2}\right) + \dots\right) \\
&= (1-\lambda)\left(\delta^V_t\left(1 + \lambda + \lambda^2 + \dots\right) + \gamma\delta^V_{t+1}\left(\lambda + \lambda^2 + \lambda^3 + \dots\right) + \gamma^2\delta^V_{t+2}\left(\lambda^2 + \lambda^3 + \lambda^4 + \dots\right) + \dots\right) \\
&= (1-\lambda)\left(\delta^V_t\,\frac{1}{1-\lambda} + \gamma\delta^V_{t+1}\,\frac{\lambda}{1-\lambda} + \gamma^2\delta^V_{t+2}\,\frac{\lambda^2}{1-\lambda} + \dots\right) \\
&= \sum_{l=0}^{\infty} (\gamma\lambda)^l\, \delta^V_{t+l}
\end{aligned}
\tag{26}
$$

From Equation (26), we see that the advantage estimator has a remarkably simple formula involving a discounted sum of Bellman residual terms. Section 4.4 discusses an interpretation of this formula as the returns in an MDP with a modified reward function. The construction we used above is closely analogous to the one used to define TD($\lambda$) [SB98], however TD($\lambda$) is an estimator of the value function, whereas here we are estimating the advantage function.

There are two notable special cases of this formula, obtained by setting $\lambda = 0$ and $\lambda = 1$.

$$\mathrm{GAE}(\gamma, 0):\quad \hat{A}_t := \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \tag{27}$$

$$\mathrm{GAE}(\gamma, 1):\quad \hat{A}_t := \sum_{l=0}^{\infty} \gamma^l \delta_{t+l} = \sum_{l=0}^{\infty} \gamma^l r_{t+l} - V(s_t) \tag{28}$$

GAE($\gamma$, 1) is $\gamma$-just regardless of the accuracy of $V$, but it has high variance due to the sum of terms. GAE($\gamma$, 0) is $\gamma$-just for $V = V^{\pi,\gamma}$ and otherwise induces bias, but it typically has much lower variance. The generalized advantage estimator for $0 < \lambda < 1$ makes a compromise between bias and variance, controlled by parameter $\lambda$.
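The discounted sum of Bellman residuals in Equation (26) can be evaluated with a single backward pass over a trajectory, using the recursion Â_t = δ_t + γλ Â_{t+1}. The following is a minimal sketch rather than the author's implementation. The function name `gae_advantages`, the NumPy array layout, and the truncation to a finite trajectory of length T are assumptions made for illustration. The infinite sum is cut off at T, which matches Equation (26) only if all residuals beyond the trajectory are zero (for example, a terminal state with V(s_T) = 0).

```python
import numpy as np


def gae_advantages(rewards, values, gamma, lam):
    """Finite-horizon sketch of GAE(gamma, lambda).

    rewards: r_0 .. r_{T-1} (length T).
    values:  V(s_0) .. V(s_T) (length T + 1); the last entry is the
             value of the state after the final step (assumed to be
             0.0 when the trajectory ends in a terminal state).
    Returns the advantage estimates A_0 .. A_{T-1}.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    if values.shape[0] != rewards.shape[0] + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    # Bellman residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]

    # Backward recursion A_t = delta_t + (gamma * lam) * A_{t+1},
    # which unrolls to sum_l (gamma * lam)^l * delta_{t+l}.
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # Hypothetical three-step trajectory ending in a terminal state.
    rewards = [1.0, 0.0, 2.0]
    values = [0.5, 0.4, 0.3, 0.0]
    for lam in (0.0, 0.5, 1.0):
        print(lam, gae_advantages(rewards, values, gamma=0.9, lam=lam))
```

With `lam=0.0` the result reduces to the one-step residuals of Equation (27). With `lam=1.0` and a zero final value it reduces to the discounted empirical returns minus the baseline, as in Equation (28).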
Source: OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS (original file name: thesis-optimizing-deep-learning.pdf)