
Bibliography

[Bha+09] S. Bhatnagar, D. Precup, D. Silver, R. S. Sutton, H. R. Maei, and C. Szepesvári. "Convergent temporal-difference learning with arbitrary smooth function approximation." In: Advances in Neural Information Processing Systems. 2009, pp. 1204–1212 (cit. on p. 60).

[Dah+12] G. E. Dahl, D. Yu, L. Deng, and A. Acero. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." In: IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2012), pp. 30–42 (cit. on pp. 1, 2).

[DILM09] H. Daumé III, J. Langford, and D. Marcu. "Search-based structured prediction." In: Machine Learning 75.3 (2009), pp. 297–325 (cit. on p. 1).

[DNP13] M. Deisenroth, G. Neumann, and J. Peters. "A Survey on Policy Search for Robotics." In: Foundations and Trends in Robotics 2.1-2 (2013), pp. 1–142 (cit. on p. 18).

[DR11] M. Deisenroth and C. E. Rasmussen. "PILCO: A model-based and data-efficient approach to policy search." In: Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011, pp. 465–472 (cit. on p. 86).

[Dua+16] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control." In: arXiv preprint arXiv:1604.06778 (2016) (cit. on p. 84).

[Fu06] M. C. Fu. "Gradient estimation." In: Handbooks in Operations Research and Management Science 13 (2006), pp. 575–616 (cit. on pp. 66, 67, 74).

[GGS13] V. Gabillon, M. Ghavamzadeh, and B. Scherrer. "Approximate Dynamic Programming Finally Performs Well in the Game of Tetris." In: Advances in Neural Information Processing Systems. 2013 (cit. on p. 25).

[GPW06] T. Geng, B. Porr, and F. Wörgötter. "Fast biped walking with a reflexive controller and realtime policy searching." In: Advances in Neural Information Processing Systems (NIPS). 2006 (cit. on p. 32).

[Gla03] P. Glasserman. Monte Carlo Methods in Financial Engineering. Vol. 53. Springer Science & Business Media, 2003 (cit. on pp. 66, 67).

[Gly90] P. W. Glynn. "Likelihood ratio gradient estimation for stochastic systems." In: Communications of the ACM 33.10 (1990), pp. 75–84 (cit. on pp. 64, 67).

[GBB04] E. Greensmith, P. L. Bartlett, and J. Baxter. "Variance reduction techniques for gradient estimates in reinforcement learning." In: The Journal of Machine Learning Research 5 (2004), pp. 1471–1530 (cit. on pp. 14, 47, 74).

[Gre+13] K. Gregor, I. Danihelka, A. Mnih, C. Blundell, and D. Wierstra. "Deep autoregressive networks." In: arXiv preprint arXiv:1310.8499 (2013) (cit. on p. 76).