OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

deterministic computation graph representing classification loss for a two-layer neural network, which has four parameters (W1, b1, W2, b2) (weights and biases). Of course, this deterministic computation graph is a special type of stochastic computation graph.

5.3 main results on stochastic computation graphs

5.3.1 Gradient Estimators

This section will consider a general stochastic computation graph, in which a certain set of nodes are designated as costs, and we would like to compute the gradient of the sum of costs with respect to some input node θ. In brief, the main results of this section are as follows:

1. We derive a gradient estimator for an expected sum of costs in a stochastic computation graph. This estimator contains two parts: (1) a score function part, which is a sum of terms of the form (gradient of log-probability of a variable) × (sum of costs influenced by that variable); and (2) a pathwise derivative term, which propagates the dependence through differentiable functions. (The estimator is written out as an equation after the notation glossary below.)

2. This gradient estimator can be computed efficiently by differentiating an appropriate "surrogate" objective function (sketched in code below).

Let Θ denote the set of input nodes, D the set of deterministic nodes, and S the set of stochastic nodes. Further, we will designate a set of cost nodes C, which are scalar-valued and deterministic. (Note that there is no loss of generality in assuming that the costs are deterministic: if a cost is stochastic, we can simply append a deterministic node that applies the identity function to it.) We will use θ to denote an input node (θ ∈ Θ) that we differentiate with respect to. In the context of machine learning, we will usually be most concerned with differentiating with respect to a parameter vector (or tensor); however, the theory we present does not make any assumptions about what θ represents.

Notation Glossary

Θ: input nodes
D: deterministic nodes
S: stochastic nodes
C: cost nodes
v ≺ w: v influences w
v ≺D w: v deterministically influences w
deps_v: "dependencies" of v, {w ∈ Θ ∪ S | w ≺D v}
Q̂_v: sum of cost nodes influenced by v
v̂: the sampled value of the node v
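In the notation of the glossary above, the two-part estimator of point 1 can be transcribed as the following equation; regularity conditions (differentiability of the relevant functions and distributions) are left implicit here:

    \frac{\partial}{\partial \theta}\, \mathbb{E}\Big[\sum_{c \in C} c\Big]
      = \mathbb{E}\Big[\; \sum_{\substack{w \in S \\ \theta \prec_D w}}
          \Big(\frac{\partial}{\partial \theta} \log p(w \mid \text{deps}_w)\Big)\, \hat{Q}_w
        \;+\; \sum_{\substack{c \in C \\ \theta \prec_D c}}
          \frac{\partial}{\partial \theta}\, c(\text{deps}_c) \;\Big]

The first sum is the score function part (one term per stochastic node deterministically influenced by θ, each weighted by the sampled downstream cost Q̂_w), and the second sum is the pathwise derivative part (one term per cost reachable from θ along a purely deterministic path).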
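To make point 2 concrete, here is a minimal sketch of the surrogate-objective idea. PyTorch is an assumption on our part (the text does not tie the construction to any particular library), and all names (theta, cost_after_z, surrogate) are illustrative: one input θ, one stochastic node z, one cost downstream of z, and one cost reached from θ along a purely deterministic path.

    import torch

    # Input node theta; one stochastic node z; two cost nodes.
    theta = torch.randn(4, requires_grad=True)

    # Stochastic node: z ~ Categorical(softmax(theta)).
    dist = torch.distributions.Categorical(logits=theta)
    z = dist.sample()

    # Cost nodes: one influenced by z (through the sampled value),
    # one influenced by theta along a purely deterministic path.
    cost_after_z = (z.float() - 2.0) ** 2   # downstream of the stochastic node
    cost_pathwise = (theta ** 2).sum()      # deterministic path from theta

    # Q̂_z: sampled sum of costs influenced by z, treated as a constant.
    # Sampling already breaks the autodiff graph here, but detaching makes
    # the intent explicit: Q̂_z must contribute no pathwise gradient.
    Q_z = cost_after_z.detach()

    # Surrogate objective: score-function term + pathwise term.
    # Differentiating it yields a single-sample estimate of the gradient
    # of the expected sum of costs with respect to theta.
    surrogate = dist.log_prob(z) * Q_z + cost_pathwise
    surrogate.backward()

    print(theta.grad)  # single-sample gradient estimate

Detaching Q̂_z is what makes the surrogate's gradient match the estimator above: the score-function term accounts for the influence of θ on the distribution of z, while the pathwise term accounts for the direct differentiable dependence of the costs on θ.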
