we can use the score function (SF) estimator [Fu06]:

$$
\frac{\partial}{\partial\theta} \mathbb{E}_x\left[f(x)\right] = \mathbb{E}_x\left[f(x)\,\frac{\partial}{\partial\theta}\log p(x;\theta)\right]. \qquad (36)
$$

This classic equation is derived as follows:

$$
\begin{aligned}
\frac{\partial}{\partial\theta} \mathbb{E}_x\left[f(x)\right]
&= \frac{\partial}{\partial\theta} \int \mathrm{d}x\; p(x;\theta)\, f(x) \\
&= \int \mathrm{d}x\; \frac{\partial}{\partial\theta} p(x;\theta)\, f(x) \\
&= \int \mathrm{d}x\; p(x;\theta)\,\frac{\partial}{\partial\theta}\log p(x;\theta)\, f(x) \\
&= \mathbb{E}_x\left[f(x)\,\frac{\partial}{\partial\theta}\log p(x;\theta)\right]. \qquad (37)
\end{aligned}
$$

This equation is valid if and only if p(x; θ) is a continuous function of θ; however, it does not need to be a continuous function of x [Gla03].

• x may be a deterministic, differentiable function of θ and another random variable z, i.e., we can write x(z, θ). Then, we can use the pathwise derivative (PD) estimator, defined as follows:

$$
\frac{\partial}{\partial\theta} \mathbb{E}_z\left[f(x(z,\theta))\right] = \mathbb{E}_z\left[\frac{\partial}{\partial\theta} f(x(z,\theta))\right].
$$

This equation, which merely swaps the derivative and expectation, is valid if and only if f(x(z, θ)) is a continuous function of θ for all z [Gla03].¹ That is not true if, for example, f is a step function.

• Finally, θ might appear both in the probability distribution and inside the expectation, e.g., in $\frac{\partial}{\partial\theta} \mathbb{E}_{z \sim p(\cdot;\theta)}\left[f(x(z,\theta))\right]$. Then the gradient estimator has two terms:

$$
\frac{\partial}{\partial\theta} \mathbb{E}_{z \sim p(\cdot;\theta)}\left[f(x(z,\theta))\right] = \mathbb{E}_{z \sim p(\cdot;\theta)}\left[\frac{\partial}{\partial\theta} f(x(z,\theta)) + \left(\frac{\partial}{\partial\theta}\log p(z;\theta)\right) f(x(z,\theta))\right].
$$

This formula can be derived by writing the expectation as an integral and differentiating, as in Equation (37).

In some cases, it is possible to reparameterize a probabilistic model, moving θ from the distribution to inside the expectation or vice versa. See [Fu06] for a general discussion, and see [KW13; RMW14] for a recent application of this idea to variational inference.

The SF and PD estimators are applicable in different scenarios and have different properties.

¹ Note that for the pathwise derivative estimator, f(x(z, θ)) merely needs to be a continuous function of θ; it is sufficient that this function is almost-everywhere differentiable. A similar statement can be made about p(x; θ) and the score function estimator. See Glasserman [Gla03] for a detailed discussion of the technical requirements for these gradient estimators to be valid.
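The two estimators can be checked numerically against an analytic gradient. The following is a minimal sketch (not from the thesis), assuming x ∼ N(θ, 1), so the reparameterization is x(z, θ) = θ + z with z ∼ N(0, 1), and taking f(x) = x²; then E[f(x)] = θ² + 1 and the true gradient is 2θ:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5
n = 200_000

# Reparameterized samples: x(z, theta) = theta + z, so x ~ N(theta, 1)
z = rng.standard_normal(n)
x = theta + z

def f(x):
    return x ** 2  # E[f(x)] = theta^2 + 1, so d/dtheta E[f(x)] = 2*theta

# Score function estimator: E[f(x) * d/dtheta log p(x; theta)].
# For N(theta, 1), log p(x; theta) = -(x - theta)^2 / 2 + const,
# so d/dtheta log p(x; theta) = (x - theta) = z.
sf_grad = np.mean(f(x) * z)

# Pathwise derivative estimator: E[d/dtheta f(x(z, theta))]
# = E[2 * x(z, theta) * dx/dtheta] = E[2 * x], since dx/dtheta = 1.
pd_grad = np.mean(2.0 * x)

true_grad = 2.0 * theta
print(sf_grad, pd_grad, true_grad)
```

Both estimators are unbiased here, but with this choice of f and p the SF estimate has noticeably higher variance per sample than the PD estimate, which is one reason to prefer the pathwise derivative when f is differentiable.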
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Section 5.2 (Preliminaries)