OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

(In fact, a stochastic computation graph is a type of Bayes network [Pea14], where the deterministic nodes correspond to degenerate probability distributions.) The topic of gradient estimation has drawn significant recent interest in machine learning. Gradients for networks with stochastic units were investigated by Bengio et al. [BLC13], though that work is concerned with differentiating through individual units and layers, not with how to handle arbitrarily structured models and loss functions. Kingma and Welling [KW14] consider a similar framework, although only with continuous latent variables, and point out that reparameterization can be used to convert hierarchical Bayesian models into neural networks, which can then be trained by backpropagation.

The score function method is used to perform variational inference in general models (in the context of probabilistic programming) by Wingate and Weber [WW13], and similarly by Ranganath et al. [RGB13]; both papers mostly focus on mean-field approximations without amortized inference. The method is used to train generative models built from neural networks with discrete stochastic units by Mnih and Gregor [MG14] and by Gregor et al. [Gre+13]; both amortize inference by using an inference network. Generative models with continuous-valued latent variables are trained (again using an inference network) with the reparameterization method by Rezende, Mohamed, and Wierstra [RMW14] and by Kingma and Welling [KW13]. Rezende et al. also provide a detailed discussion of reparameterization, including a comparison of the variance of the SF and PD estimators.

Bengio, Leonard, and Courville [BLC13] have recently written a paper about gradient estimation in neural networks with stochastic units or non-differentiable activation functions, including Monte Carlo estimators and heuristic approximations. The notion that policy gradients can be computed in multiple ways was pointed out in early work on policy gradients by Williams [Wil92]. However, all of this prior work deals with specific structures of the stochastic computation graph and does not address the general case.

5.7 conclusion

The reinforcement learning problem is extremely general and lies at the heart of artificial intelligence; it corresponds to the ability to make decisions and perform motor control. The core idea in deep learning is that by reducing learning to optimization, it is possible to learn function approximators that perform computation. We have developed a framework for describing a computation with stochastic and deterministic operations, called a stochastic computation graph. Given a stochastic computation graph, we can automatically obtain a gradient estimator for the expected loss.
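To make the SF/PD variance comparison mentioned above concrete, here is a minimal numerical sketch, not taken from the thesis, that estimates the gradient of E_{x~N(theta,1)}[x^2] with respect to theta using both estimators. The objective, the value theta = 1.5, and the sample size are illustrative choices for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n = 1.5, 100_000  # illustrative parameter value and sample count

# Score function (SF / REINFORCE) estimator:
#   grad = E_x[ f(x) * d/dtheta log p(x; theta) ],
#   and for p = N(theta, 1): d/dtheta log p(x; theta) = x - theta.
x = rng.normal(theta, 1.0, size=n)
sf_samples = x**2 * (x - theta)

# Pathwise derivative (PD / reparameterization) estimator:
#   write x = theta + eps with eps ~ N(0, 1), then
#   grad = E_eps[ d/dtheta (theta + eps)^2 ] = E_eps[ 2 * (theta + eps) ].
eps = rng.normal(0.0, 1.0, size=n)
pd_samples = 2.0 * (theta + eps)

print(f"true gradient: {2 * theta:.3f}")  # since E[x^2] = theta^2 + 1
print(f"SF: mean={sf_samples.mean():.3f}  var={sf_samples.var():.2f}")
print(f"PD: mean={pd_samples.mean():.3f}  var={pd_samples.var():.2f}")
```

On this toy problem both estimators are unbiased, but the SF estimator's variance is roughly an order of magnitude larger than the PD estimator's; this mirrors the kind of comparison Rezende et al. discuss, though the ordering can differ for other objectives.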
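Likewise, the observation attributed above to Kingma and Welling [KW14], that reparameterization rewrites a sample as a deterministic function of the parameters plus exogenous noise so that ordinary backpropagation applies, can be sketched as follows. This is a hypothetical illustration assuming PyTorch is available (the thesis does not prescribe a framework); it minimizes E_{x~N(mu, sigma^2)}[x^2] = mu^2 + sigma^2, so both parameters should be driven toward zero.

```python
import torch

# Hypothetical illustration, not code from the thesis.
# Learnable parameters of the sampling distribution N(mu, sigma^2);
# sigma is parameterized through its log to keep it positive.
mu = torch.tensor(2.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([mu, log_sigma], lr=0.05)

for _ in range(500):
    eps = torch.randn(256)               # exogenous noise: carries no parameters
    x = mu + torch.exp(log_sigma) * eps  # reparameterized sample, differentiable in (mu, log_sigma)
    loss = (x**2).mean()                 # Monte Carlo estimate of E[x^2]
    opt.zero_grad()
    loss.backward()                      # plain backpropagation through the sampling step
    opt.step()

print(mu.item(), torch.exp(log_sigma).item())  # both approach 0, the minimizer of E[x^2]
```

The design point is that the random draw itself (`eps`) sits outside the parameterized computation, so gradient flow never has to pass "through" a sampling operation; this is what lets hierarchical models be trained like ordinary neural networks.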
