OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

3.2 Preliminaries

Consider an infinite-horizon discounted Markov decision process (MDP), defined by the tuple $(S, A, P, r, \rho_0, \gamma)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $P : S \times A \times S \to \mathbb{R}$ is the transition probability distribution, $r : S \to \mathbb{R}$ is the reward function, $\rho_0 : S \to \mathbb{R}$ is the distribution of the initial state $s_0$, and $\gamma \in (0, 1)$ is the discount factor. Note that this setup differs from that of Chapter 2 because of the discount, which is necessary for the theoretical analysis.

Let $\pi$ denote a stochastic policy $\pi : S \times A \to [0, 1]$, and let $\eta(\pi)$ denote its expected discounted reward:

$$\eta(\pi) = \mathbb{E}_{s_0, a_0, \dots}\left[\sum_{t=0}^{\infty} \gamma^t r(s_t)\right], \quad \text{where } s_0 \sim \rho_0(s_0),\ a_t \sim \pi(a_t \mid s_t),\ s_{t+1} \sim P(s_{t+1} \mid s_t, a_t).$$

We will use the following standard definitions of the state-action value function $Q_\pi$, the value function $V_\pi$, and the advantage function $A_\pi$:

$$Q_\pi(s_t, a_t) = \mathbb{E}_{s_{t+1}, a_{t+1}, \dots}\left[\sum_{l=0}^{\infty} \gamma^l r(s_{t+l})\right],$$

$$V_\pi(s_t) = \mathbb{E}_{a_t, s_{t+1}, \dots}\left[\sum_{l=0}^{\infty} \gamma^l r(s_{t+l})\right],$$

$$A_\pi(s, a) = Q_\pi(s, a) - V_\pi(s),$$

where $a_t \sim \pi(a_t \mid s_t)$ and $s_{t+1} \sim P(s_{t+1} \mid s_t, a_t)$ for $t \ge 0$.

The following useful identity expresses the expected return of another policy $\tilde{\pi}$ in terms of the advantage over $\pi$, accumulated over timesteps (see Kakade and Langford [KL02] or Appendix 3.10 for proof):

$$\eta(\tilde{\pi}) = \eta(\pi) + \mathbb{E}_{s_0, a_0, \dots \sim \tilde{\pi}}\left[\sum_{t=0}^{\infty} \gamma^t A_\pi(s_t, a_t)\right] \tag{3}$$

where the notation $\mathbb{E}_{s_0, a_0, \dots \sim \tilde{\pi}}[\dots]$ indicates that actions are sampled $a_t \sim \tilde{\pi}(\cdot \mid s_t)$. Let $\rho_\pi$ be the (unnormalized) discounted visitation frequencies

$$\rho_\pi(s) = P(s_0 = s) + \gamma P(s_1 = s) + \gamma^2 P(s_2 = s) + \dots.$$
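As a numerical sanity check on identity (3), the following sketch builds a small random tabular MDP and compares both sides of the identity. It is an illustration under stated assumptions, not code from the thesis: the helper names (`random_mdp`, `random_policy`, `value_functions`, `visitation`, `expected_return`) and the use of NumPy are hypothetical choices. Instead of sampling trajectories, the expectation in (3) is evaluated exactly by weighting the expected advantage under $\tilde{\pi}$ with the discounted visitation frequencies of $\tilde{\pi}$, which follows from the definition of $\rho_\pi$ given above.

```python
import numpy as np

def random_mdp(n_states, n_actions, rng):
    """Sample a small random MDP (hypothetical helper, illustration only)."""
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)   # P[s, a, s'] = P(s' | s, a)
    r = rng.random(n_states)            # r[s]: reward depends on the state only
    rho0 = rng.random(n_states)
    rho0 /= rho0.sum()                  # initial-state distribution
    return P, r, rho0

def random_policy(n_states, n_actions, rng):
    """Sample a stochastic policy with pi[s, a] = pi(a | s)."""
    pi = rng.random((n_states, n_actions))
    return pi / pi.sum(axis=1, keepdims=True)

def state_transition(P, pi):
    """P_pi[s, s'] = sum_a pi(a | s) P(s' | s, a)."""
    return np.einsum("sa,sat->st", pi, P)

def value_functions(P, r, pi, gamma):
    """Return V_pi, Q_pi, A_pi by solving the linear Bellman equations."""
    n = len(r)
    P_pi = state_transition(P, pi)
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, r)  # V = r + gamma P_pi V
    Q = r[:, None] + gamma * (P @ V)                  # Q[s, a] = r(s) + gamma E[V(s')]
    A = Q - V[:, None]
    return V, Q, A

def visitation(P, rho0, pi, gamma):
    """Unnormalized discounted visitation: rho_pi(s) = sum_t gamma^t P(s_t = s)."""
    n = len(rho0)
    P_pi = state_transition(P, pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi.T, rho0)

def expected_return(P, r, rho0, pi, gamma):
    """eta(pi) = E_{s0 ~ rho0}[V_pi(s0)]."""
    V, _, _ = value_functions(P, r, pi, gamma)
    return rho0 @ V

rng = np.random.default_rng(0)
gamma = 0.9
P, r, rho0 = random_mdp(5, 3, rng)
pi = random_policy(5, 3, rng)        # reference policy pi
pi_new = random_policy(5, 3, rng)    # other policy, written pi-tilde in the text

_, _, A_pi = value_functions(P, r, pi, gamma)
rho_new = visitation(P, rho0, pi_new, gamma)

# Exact value of E_{pi_new}[sum_t gamma^t A_pi(s_t, a_t)].
advantage_term = rho_new @ (pi_new * A_pi).sum(axis=1)

lhs = expected_return(P, r, rho0, pi_new, gamma)
rhs = expected_return(P, r, rho0, pi, gamma) + advantage_term
assert np.isclose(lhs, rhs)
```

Two side facts fall out of the same definitions and can be checked with these helpers: the advantage of $\pi$ averaged over its own actions is zero in every state, and the unnormalized visitation frequencies sum to $1/(1-\gamma)$.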