Bibliography

[Mun06] R. Munos. "Policy gradient in continuous time." In: The Journal of Machine Learning Research 7 (2006), pp. 771–791 (cit. on p. 67).
[NP90] K. S. Narendra and K. Parthasarathy. "Identification and control of dynamical systems using neural networks." In: IEEE Transactions on Neural Networks 1.1 (1990), pp. 4–27 (cit. on p. 2).
[Nea90] R. M. Neal. "Learning stochastic feedforward networks." Department of Computer Science, University of Toronto (1990) (cit. on p. 64).
[NH98] R. M. Neal and G. E. Hinton. "A view of the EM algorithm that justifies incremental, sparse, and other variants." In: Learning in Graphical Models. Springer, 1998, pp. 355–368 (cit. on pp. 64, 79).
[NJ00] A. Y. Ng and M. Jordan. "PEGASUS: A policy search method for large MDPs and POMDPs." In: Uncertainty in Artificial Intelligence (UAI). 2000 (cit. on p. 26).
[NHR99] A. Y. Ng, D. Harada, and S. Russell. "Policy invariance under reward transformations: Theory and application to reward shaping." In: ICML. Vol. 99. 1999, pp. 278–287 (cit. on pp. 45, 51, 52).
[Osb+16] I. Osband, C. Blundell, A. Pritzel, and B. Van Roy. "Deep Exploration via Bootstrapped DQN." In: arXiv preprint arXiv:1602.04621 (2016) (cit. on pp. 3, 86).
[Owe13] A. B. Owen. Monte Carlo theory, methods and examples. 2013 (cit. on p. 26).
[PR98] R. Parr and S. Russell. "Reinforcement learning with hierarchies of machines." In: Advances in Neural Information Processing Systems (1998), pp. 1043–1049 (cit. on p. 85).
[PB13] R. Pascanu and Y. Bengio. "Revisiting natural gradient for deep networks." In: arXiv preprint arXiv:1301.3584 (2013). arXiv: 1301.3584 [cs.LG] (cit. on p. 40).
[Pea14] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 2014 (cit. on p. 76).
[PMA10] J. Peters, K. Mülling, and Y. Altün. "Relative Entropy Policy Search." In: AAAI Conference on Artificial Intelligence. 2010 (cit. on pp. 24, 29).
[PS08] J. Peters and S. Schaal. "Natural actor-critic." In: Neurocomputing 71.7 (2008), pp. 1180–1190 (cit. on pp. 6, 24, 27, 55, 61).
[Pir+13] M. Pirotta, M. Restelli, A. Pecorino, and D. Calandriello. "Safe policy iteration." In: Proceedings of the 30th International Conference on Machine Learning. 2013, pp. 307–315 (cit. on p. 29).
[Pol00] D. Pollard. Asymptopia: an exposition of statistical asymptotic theory. 2000. url: http://www.stat.yale.edu/~pollard/Books/Asymptopia (cit. on p. 22).