OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

Bibliography (page 92)

[Mun06] R. Munos. “Policy gradient in continuous time.” In: The Journal of Machine Learning Research 7 (2006), pp. 771–791 (cit. on p. 67).

[NP90] K. S. Narendra and K. Parthasarathy. “Identification and control of dynamical systems using neural networks.” In: IEEE Transactions on Neural Networks 1.1 (1990), pp. 4–27 (cit. on p. 2).

[Nea90] R. M. Neal. “Learning stochastic feedforward networks.” Department of Computer Science, University of Toronto (1990) (cit. on p. 64).

[NH98] R. M. Neal and G. E. Hinton. “A view of the EM algorithm that justifies incremental, sparse, and other variants.” In: Learning in Graphical Models. Springer, 1998, pp. 355–368 (cit. on pp. 64, 79).

[NJ00] A. Y. Ng and M. Jordan. “PEGASUS: A policy search method for large MDPs and POMDPs.” In: Uncertainty in Artificial Intelligence (UAI). 2000 (cit. on p. 26).

[NHR99] A. Y. Ng, D. Harada, and S. Russell. “Policy invariance under reward transformations: Theory and application to reward shaping.” In: ICML. Vol. 99. 1999, pp. 278–287 (cit. on pp. 45, 51, 52).

[Osb+16] I. Osband, C. Blundell, A. Pritzel, and B. Van Roy. “Deep Exploration via Bootstrapped DQN.” In: arXiv preprint arXiv:1602.04621 (2016) (cit. on pp. 3, 86).

[Owe13] A. B. Owen. Monte Carlo Theory, Methods and Examples. 2013 (cit. on p. 26).

[PR98] R. Parr and S. Russell. “Reinforcement learning with hierarchies of machines.” In: Advances in Neural Information Processing Systems (1998), pp. 1043–1049 (cit. on p. 85).

[PB13] R. Pascanu and Y. Bengio. “Revisiting natural gradient for deep networks.” In: arXiv preprint arXiv:1301.3584 (2013). arXiv: 1301.3584 [cs.LG] (cit. on p. 40).

[Pea14] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 2014 (cit. on p. 76).

[PMA10] J. Peters, K. Mülling, and Y. Altün. “Relative Entropy Policy Search.” In: AAAI Conference on Artificial Intelligence. 2010 (cit. on pp. 24, 29).

[PS08] J. Peters and S. Schaal. “Natural actor-critic.” In: Neurocomputing 71.7 (2008), pp. 1180–1190 (cit. on pp. 6, 24, 27, 55, 61).

[Pir+13] M. Pirotta, M. Restelli, A. Pecorino, and D. Calandriello. “Safe policy iteration.” In: Proceedings of the 30th International Conference on Machine Learning. 2013, pp. 307–315 (cit. on p. 29).

[Pol00] D. Pollard. Asymptopia: An Exposition of Statistical Asymptotic Theory. 2000. URL: http://www.stat.yale.edu/~pollard/Books/Asymptopia (cit. on p. 22).
