5.3 main results on stochastic computation graphs

For the results that follow, we need to define the notion of "influence", for which we will introduce two relations ≺ and ≺_D. The relation v ≺ w ("v influences w") means that there exists a sequence of nodes a_1, a_2, ..., a_K, with K ≥ 0, such that (v, a_1), (a_1, a_2), ..., (a_{K−1}, a_K), (a_K, w) are edges in the graph. The relation v ≺_D w ("v deterministically influences w") is defined similarly, except that now we require each a_k to be a deterministic node. For example, in Figure 11, diagram (5) above, θ influences {x_1, x_2, f_1, f_2}, but it only deterministically influences {x_1, x_2}.

Next, we will establish a condition that is sufficient for the existence of the gradient. Namely, we will stipulate that every edge (v, w) with w lying in the "influenced" set of θ corresponds to a differentiable dependency: if w is deterministic, then the Jacobian ∂w/∂v must exist; if w is stochastic, then the probability mass function p(w | v, ...) must be differentiable with respect to v. More formally:

Condition 1 (Differentiability Requirements). Given input node θ ∈ Θ, for all edges (v, w) which satisfy θ ≺_D v and θ ≺_D w, the following holds: if w is deterministic, the Jacobian ∂w/∂v exists, and if w is stochastic, then the derivative of the probability mass function, ∂/∂v p(w | parents_w), exists.

Note that Condition 1 does not require that all the functions in the graph be differentiable. If the path from an input θ to a deterministic node v is blocked by stochastic nodes, then v may be a nondifferentiable function of its parents. If a path from input θ to a stochastic node v is blocked by other stochastic nodes, the likelihood of v given its parents need not be differentiable; in fact, it does not need to be known.²

We need a few more definitions to state the main theorems. Let deps_v := {w ∈ Θ ∪ S | w ≺_D v}, the "dependencies" of node v, i.e., the set of nodes that deterministically influence it. Note the following:

• If v ∈ S, the probability mass function of v is a function of deps_v, i.e., we can write p(v | deps_v).

• If v ∈ D, v is a deterministic function of deps_v, so we can write v(deps_v).

Let Q̂_v := Σ_{c ∈ C, c ≻ v} ĉ, i.e., the sum of costs downstream of node v. These costs will be treated as constant, fixed to the values obtained during sampling. In general, we will use the hat symbol v̂ to denote a sample value of variable v, which will be treated as constant in the gradient formulae.

² This fact is particularly important for reinforcement learning, allowing us to compute policy gradient estimates despite having a discontinuous dynamics function or reward function.
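To make these relations concrete, the following is a minimal Python sketch (not part of the thesis) that computes v ≺ w, v ≺_D w, deps_v, and Q̂_v by graph traversal on a toy graph loosely modeled on the θ, x_1, x_2, f_1, f_2 example above. The dict-of-lists edge representation, the exact edge set, the sampled cost values, and the function names influences / det_influences are all assumptions made for illustration.

```python
# Minimal sketch (not from the thesis): computing the influence relations and
# the derived quantities deps_v and Q_hat_v by graph traversal on a small
# stochastic computation graph. Node names, edges, and cost values are
# illustrative assumptions only.

# Toy graph: theta -> x1, theta -> x2, x1 -> x2, x1 -> f1, x2 -> f2,
# with x1, x2 stochastic nodes and f1, f2 cost nodes.
edges = {"theta": ["x1", "x2"], "x1": ["x2", "f1"], "x2": ["f2"], "f1": [], "f2": []}
deterministic = set()              # no deterministic intermediate nodes in this toy graph
stochastic = {"x1", "x2"}
inputs = {"theta"}
costs = {"f1": 1.3, "f2": 0.7}     # sampled cost values c_hat (made up)

def influences(v, w):
    """v ≺ w: w is reachable from v along directed edges."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child == w:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False

def det_influences(v, w):
    """v ≺_D w: w is reachable from v with every intermediate node deterministic."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child == w:
                return True
            # only continue the walk through deterministic intermediate nodes
            if child in deterministic and child not in seen:
                seen.add(child)
                stack.append(child)
    return False

all_nodes = set(edges) | {c for children in edges.values() for c in children}

# deps_v: input and stochastic nodes that deterministically influence v
deps = {v: {w for w in inputs | stochastic if det_influences(w, v)} for v in all_nodes}

# Q_hat_v: sum of sampled costs downstream of v, treated as constants
q_hat = {v: sum(val for c, val in costs.items() if influences(v, c)) for v in all_nodes}

print(influences("theta", "f2"))      # True:  theta ≺ f2
print(det_influences("theta", "x2"))  # True:  direct edge, so theta ≺_D x2
print(det_influences("theta", "f2"))  # False: every path passes through a stochastic node
print(deps["x2"])                     # {'theta', 'x1'} (set order may vary)
print(q_hat["x1"])                    # 2.0 = f1 + f2
```

The only difference between the two traversals is that det_influences refuses to continue through non-deterministic intermediate nodes, mirroring the requirement that each a_k in the definition of ≺_D be deterministic.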