4.3 advantage function estimation

This section will be concerned with producing an accurate estimate $\hat{A}_t$ of the discounted advantage function $A^{\pi,\gamma}(s_t, a_t)$, which will then be used to construct a policy gradient estimator of the following form:

\[
\hat{g} = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{\infty} \hat{A}_t^n \nabla_\theta \log \pi_\theta(a_t^n \mid s_t^n) \tag{25}
\]

where $n$ indexes over a batch of episodes.

Let $V$ be an approximate value function. Define $\delta_t^V = r_t + \gamma V(s_{t+1}) - V(s_t)$, i.e., the TD residual of $V$ with discount $\gamma$ [SB98]. Note that $\delta_t^V$ can be considered as an estimate of the advantage of the action $a_t$. In fact, if we have the correct value function $V = V^{\pi,\gamma}$, then it is a $\gamma$-just advantage estimator, and in fact, an unbiased estimator of $A^{\pi,\gamma}$:

\[
\begin{aligned}
\mathbb{E}_{s_{t+1}}\left[\delta_t^{V^{\pi,\gamma}}\right]
&= \mathbb{E}_{s_{t+1}}\left[r_t + \gamma V^{\pi,\gamma}(s_{t+1}) - V^{\pi,\gamma}(s_t)\right] \\
&= \mathbb{E}_{s_{t+1}}\left[Q^{\pi,\gamma}(s_t, a_t) - V^{\pi,\gamma}(s_t)\right] = A^{\pi,\gamma}(s_t, a_t).
\end{aligned}
\]

However, this estimator is only $\gamma$-just for $V = V^{\pi,\gamma}$, otherwise it will yield biased policy gradient estimates.

Next, let us consider taking the sum of $k$ of these $\delta$ terms, which we will denote by $\hat{A}_t^{(k)}$:

\[
\begin{aligned}
\hat{A}_t^{(1)} &:= \delta_t^V &&= -V(s_t) + r_t + \gamma V(s_{t+1}) \\
\hat{A}_t^{(2)} &:= \delta_t^V + \gamma \delta_{t+1}^V &&= -V(s_t) + r_t + \gamma r_{t+1} + \gamma^2 V(s_{t+2}) \\
\hat{A}_t^{(3)} &:= \delta_t^V + \gamma \delta_{t+1}^V + \gamma^2 \delta_{t+2}^V &&= -V(s_t) + r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 V(s_{t+3}) \\
\hat{A}_t^{(k)} &:= \sum_{l=0}^{k-1} \gamma^l \delta_{t+l}^V &&= -V(s_t) + r_t + \gamma r_{t+1} + \cdots + \gamma^{k-1} r_{t+k-1} + \gamma^k V(s_{t+k})
\end{aligned}
\]

These equations result from a telescoping sum, and we see that $\hat{A}_t^{(k)}$ involves a $k$-step estimate of the returns, minus a baseline term $-V(s_t)$. Analogously to the case of $\delta_t^V = \hat{A}_t^{(1)}$, we can consider $\hat{A}_t^{(k)}$ to be an estimator of the advantage function, which is only $\gamma$-just when $V = V^{\pi,\gamma}$. However, note that the bias generally becomes smaller as $k \to \infty$,
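The telescoping identity behind the k-step estimators can be checked numerically. The following is a minimal sketch in plain Python, not code from the thesis: the function names (`td_residuals`, `k_step_advantage_from_deltas`, `k_step_advantage_from_returns`) are hypothetical, and it assumes a single finite trajectory in which `values` holds one more entry than `rewards`, so that V(s_{t+k}) is available for bootstrapping.

```python
from typing import List, Sequence


def td_residuals(rewards: Sequence[float], values: Sequence[float],
                 gamma: float) -> List[float]:
    """Return delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) for t = 0..T-1.

    Assumption: `values` has T + 1 entries, V(s_0), ..., V(s_T).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def k_step_advantage_from_deltas(deltas: Sequence[float], gamma: float,
                                 t: int, k: int) -> float:
    """Left-hand form: sum_{l=0}^{k-1} gamma^l * delta_{t+l}."""
    if k < 1 or t < 0 or t + k > len(deltas):
        raise ValueError("need k >= 1 and t + k within the trajectory")
    return sum(gamma ** l * deltas[t + l] for l in range(k))


def k_step_advantage_from_returns(rewards: Sequence[float],
                                  values: Sequence[float],
                                  gamma: float, t: int, k: int) -> float:
    """Right-hand form: -V(s_t) + k-step discounted rewards + gamma^k * V(s_{t+k})."""
    if k < 1 or t < 0 or t + k > len(rewards):
        raise ValueError("need k >= 1 and t + k within the trajectory")
    discounted_rewards = sum(gamma ** l * rewards[t + l] for l in range(k))
    return -values[t] + discounted_rewards + gamma ** k * values[t + k]


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the thesis.
    rewards = [1.0, 0.0, 2.0]
    values = [0.5, 1.0, -0.5, 0.0]
    gamma = 0.9
    deltas = td_residuals(rewards, values, gamma)
    for k in (1, 2, 3):
        lhs = k_step_advantage_from_deltas(deltas, gamma, t=0, k=k)
        rhs = k_step_advantage_from_returns(rewards, values, gamma, t=0, k=k)
        print(k, lhs, rhs)  # the two forms agree up to floating-point rounding
```

Both forms agree for any `values`, accurate or not, because the intermediate V terms cancel in the sum. The accuracy of V only affects how biased the resulting estimate is, not the identity itself.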
Source: OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS (original file: thesis-optimizing-deep-learning.pdf)