OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


3.12 efficiently solving the trust-region constrained optimization problem

which parameterizes the distribution π(u | x). (For example, for a Gaussian distribution, μ could be the mean and standard deviation concatenated; for a categorical distribution, it could be the vector of probabilities or log-probabilities.) Now the KL divergence for a given input x can be written as follows:

D_KL(π_θold(· | x) ∥ π_θ(· | x)) = kl(μ_θ(x), μ_old(x))

where kl is the KL divergence between the distributions corresponding to the two mean parameter vectors. Let us assume we can compute kl analytically in terms of its arguments. Differentiating kl twice with respect to θ, we obtain

\[
\underbrace{\frac{\partial \mu_a(x)}{\partial \theta_i}\,\mathrm{kl}''_{ab}(\mu_\theta(x), \mu_{\text{old}}(x))\,\frac{\partial \mu_b(x)}{\partial \theta_j}}_{J^T M J}
\;+\;
\underbrace{\frac{\partial^2 \mu_a(x)}{\partial \theta_i\,\partial \theta_j}\,\mathrm{kl}'_{a}(\mu_\theta(x), \mu_{\text{old}}(x))}_{=\,0\ \text{at}\ \mu_\theta = \mu_{\text{old}}}
\tag{21}
\]

where the primes (′) indicate differentiation with respect to the first argument, and there is an implied summation over the indices a, b. The second term vanishes because the KL divergence is minimized at μ_θ = μ_old, and the derivative is zero at a minimum. Let J := ∂μ_a(x)/∂θ_i (the Jacobian); then the Fisher information matrix can be written in matrix form as J^T M J, where M_ab = kl''_ab(μ_θ(x), μ_old) is the Fisher information matrix of the distribution in terms of the mean parameter μ (as opposed to the parameter θ). M has a simple form for most parameterized distributions of interest.

The Fisher-vector product can now be written as a function y → J^T M J y. Multiplication by J^T and J can be performed by automatic differentiation software such as Theano [Ber+10], and the matrix M (the Fisher matrix with respect to μ) can be computed analytically for the distribution of interest. Note that multiplication by J^T is the well-known backpropagation operation, whereas multiplication by J is tangent propagation [Gri+89], or the R-Op (in Theano).

There is a simpler but (slightly) less efficient way to calculate the Fisher-vector products using only reverse-mode automatic differentiation. This technique is described in [WN99], Chapter 8. Let f(θ) = kl(μ_θ(x), μ_old); then we want to compute the Hessian-vector product Hy, where y is a vector and H is the Hessian of f(θ). We can first form the expression for the gradient-vector product ∇_θ f(θ) · y, then differentiate this expression to get the Hessian-vector product. This method is slightly less efficient than the one above, as it does not exploit the fact that the second derivatives of μ(x) (i.e., the second term in Equation (21)) can be ignored, but it may be substantially easier to implement.

We have described a procedure for computing the Fisher-vector product y → Ay, where the Fisher information matrix is averaged over a set of inputs to the function μ.
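To make the first construction concrete, the following minimal sketch (an illustration, not from the original text) implements y → J^T M J y with JAX in place of Theano: multiplication by J uses forward mode (jax.jvp, the R-op), multiplication by J^T uses reverse mode (jax.vjp, i.e. backpropagation), and M is the analytic Fisher matrix of a diagonal Gaussian in the (mean, log-std) parameterization, namely diag(1/σ², ..., 2, ...). The policy mu, its shapes, and the function names are hypothetical choices made only to keep the example runnable.

```python
import jax
import jax.numpy as jnp


def mu(theta, x):
    # Hypothetical mean-parameter function: a linear diagonal-Gaussian policy
    # on 3-D observations and 2-D actions, with mu = [mean(x); log_std].
    W = theta[:6].reshape(2, 3)
    b = theta[6:8]
    log_std = theta[8:10]
    return jnp.concatenate([W @ x + b, log_std])


def gaussian_fisher_M(mu_x):
    # Analytic M = kl''(mu, mu_old) evaluated at mu = mu_old for the
    # (mean, log_std) parameterization: diag([1/sigma^2, ..., 2, ...]),
    # returned here as the vector of diagonal entries.
    d = mu_x.shape[0] // 2
    return jnp.concatenate([jnp.exp(-2.0 * mu_x[d:]), 2.0 * jnp.ones(d)])


def fisher_vector_product(theta, xs, y):
    # Average of J^T M J y over the batch xs, without ever forming J or M J.
    def single(x):
        mu_x, Jy = jax.jvp(lambda th: mu(th, x), (theta,), (y,))  # J y (R-op)
        MJy = gaussian_fisher_M(mu_x) * Jy                        # M J y (analytic)
        _, vjp_fn = jax.vjp(lambda th: mu(th, x), theta)
        return vjp_fn(MJy)[0]                                     # J^T M J y (backprop)

    return jnp.mean(jax.vmap(single)(xs), axis=0)
```

Calling fisher_vector_product(theta, xs, y) with theta and y of shape (10,) and a batch xs of shape (N, 3) returns the averaged Fisher-vector product of shape (10,); the full matrix A is never formed, which is the point of the construction.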
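The simpler reverse-mode-only alternative from [WN99] can be sketched in the same assumed setting (it reuses mu and the imports from the sketch above): write f(θ) as the KL averaged over inputs, form the scalar ∇_θ f(θ) · y, and differentiate that scalar again to obtain Hy. The closed-form diagonal-Gaussian kl below is likewise an assumption made to keep the example self-contained.

```python
def kl(mu_new, mu_old):
    # kl(mu_new, mu_old) = D_KL(N(m_old, s_old) || N(m_new, s_new)) for
    # diagonal Gaussians, matching the kl(mu_theta(x), mu_old(x)) convention
    # in the text; it is differentiated with respect to its first argument.
    d = mu_new.shape[0] // 2
    m1, ls1 = mu_new[:d], mu_new[d:]
    m0, ls0 = mu_old[:d], mu_old[d:]
    return jnp.sum(ls1 - ls0
                   + (jnp.exp(2 * ls0) + (m0 - m1) ** 2) / (2 * jnp.exp(2 * ls1))
                   - 0.5)


def hessian_vector_product(theta, theta_old, xs, y):
    # H y for f(theta) = average KL over xs, using reverse mode only:
    # differentiate the gradient-vector product grad f(theta) . y.
    def f(th):
        return jnp.mean(jax.vmap(lambda x: kl(mu(th, x), mu(theta_old, x)))(xs))

    def grad_dot_y(th):
        return jnp.dot(jax.grad(f)(th), y)

    return jax.grad(grad_dot_y)(theta)
```

Evaluated at θ = θ_old, hessian_vector_product agrees with fisher_vector_product up to numerical error, since the second term of Equation (21) vanishes there; the extra cost is that μ(x) is effectively differentiated twice, which the first construction avoids.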
