5.3 main results on stochastic computation graphs

For the results that follow, we need to define the notion of "influence", for which we will introduce two relations ≺ and ≺_D. The relation v ≺ w ("v influences w") means that there exists a sequence of nodes a_1, a_2, ..., a_K, with K ≥ 0, such that (v, a_1), (a_1, a_2), ..., (a_{K−1}, a_K), (a_K, w) are edges in the graph. The relation v ≺_D w ("v deterministically influences w") is defined similarly, except that now we require each a_k to be a deterministic node. For example, in Figure 11, diagram (5) above, θ influences {x_1, x_2, f_1, f_2}, but it only deterministically influences {x_1, x_2}.

Next, we will establish a condition that is sufficient for the existence of the gradient. Namely, we will stipulate that every edge (v, w) with w lying in the "influenced" set of θ corresponds to a differentiable dependency: if w is deterministic, then the Jacobian ∂w/∂v must exist; if w is stochastic, then the probability mass function p(w | v, ...) must be differentiable with respect to v. More formally:

Condition 1 (Differentiability Requirements). Given input node θ ∈ Θ, for all edges (v, w) which satisfy θ ≺_D v and θ ≺_D w, the following holds: if w is deterministic, the Jacobian ∂w/∂v exists, and if w is stochastic, then the derivative of the probability mass function, ∂/∂v p(w | parents_w), exists.

Note that Condition 1 does not require that all the functions in the graph be differentiable. If the path from an input θ to a deterministic node v is blocked by stochastic nodes, then v may be a nondifferentiable function of its parents. If a path from input θ to a stochastic node v is blocked by other stochastic nodes, the likelihood of v given its parents need not be differentiable; in fact, it does not need to be known.²

We need a few more definitions to state the main theorems. Let deps_v := {w ∈ Θ ∪ S | w ≺_D v}, the "dependencies" of node v, i.e., the set of nodes that deterministically influence it. Note the following:

• If v ∈ S, the probability mass function of v is a function of deps_v, i.e., we can write p(v | deps_v).

• If v ∈ D, v is a deterministic function of deps_v, so we can write v(deps_v).

Let Q̂_v := Σ_{c ∈ C, c ≻ v} ĉ, i.e., the sum of costs downstream of node v. These costs will be treated as constant, fixed to the values obtained during sampling. In general, we will use the hat symbol v̂ to denote a sample value of variable v, which will be treated as constant in the gradient formulae.

² This fact is particularly important for reinforcement learning, allowing us to compute policy gradient estimates despite having a discontinuous dynamics function or reward function.
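To make these relations concrete, the following is a minimal Python sketch (not part of the thesis) that computes v ≺ w, v ≺_D w, deps_v, and Q̂_v by graph traversal on a toy graph loosely modeled on the θ, x_1, x_2, f_1, f_2 example above. The dict-of-lists edge representation, the exact edge set, the sampled cost values, and the function names influences / det_influences are all assumptions made for illustration.

```python
# Minimal sketch (not from the thesis): computing the influence relations and
# the derived quantities deps_v and Q_hat_v by graph traversal on a small
# stochastic computation graph. Node names, edges, and cost values are
# illustrative assumptions only.

# Toy graph: theta -> x1, theta -> x2, x1 -> x2, x1 -> f1, x2 -> f2,
# with x1, x2 stochastic nodes and f1, f2 cost nodes.
edges = {"theta": ["x1", "x2"], "x1": ["x2", "f1"], "x2": ["f2"], "f1": [], "f2": []}
deterministic = set()              # no deterministic intermediate nodes in this toy graph
stochastic = {"x1", "x2"}
inputs = {"theta"}
costs = {"f1": 1.3, "f2": 0.7}     # sampled cost values c_hat (made up)

def influences(v, w):
    """v ≺ w: w is reachable from v along directed edges."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child == w:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False

def det_influences(v, w):
    """v ≺_D w: w is reachable from v with every intermediate node deterministic."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child == w:
                return True
            # only continue the walk through deterministic intermediate nodes
            if child in deterministic and child not in seen:
                seen.add(child)
                stack.append(child)
    return False

all_nodes = set(edges) | {c for children in edges.values() for c in children}

# deps_v: input and stochastic nodes that deterministically influence v
deps = {v: {w for w in inputs | stochastic if det_influences(w, v)} for v in all_nodes}

# Q_hat_v: sum of sampled costs downstream of v, treated as constants
q_hat = {v: sum(val for c, val in costs.items() if influences(v, c)) for v in all_nodes}

print(influences("theta", "f2"))      # True:  theta ≺ f2
print(det_influences("theta", "x2"))  # True:  direct edge, so theta ≺_D x2
print(det_influences("theta", "f2"))  # False: every path passes through a stochastic node
print(deps["x2"])                     # {'theta', 'x1'} (set order may vary)
print(q_hat["x1"])                    # 2.0 = f1 + f2
```

The only difference between the two traversals is that det_influences refuses to continue through non-deterministic intermediate nodes, mirroring the requirement that each a_k in the definition of ≺_D be deterministic.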