Bibliography

[TZS04] R. Tedrake, T. Zhang, and H. Seung. "Stochastic policy gradient reinforcement learning on a simple 3D biped." In: IEEE/RSJ International Conference on Intelligent Robots and Systems. 2004 (cit. on p. 32).

[Tes95] G. Tesauro. "Temporal difference learning and TD-Gammon." In: Communications of the ACM 38.3 (1995), pp. 58–68 (cit. on p. 2).

[Tho14] P. Thomas. "Bias in natural actor-critic algorithms." In: Proceedings of The 31st International Conference on Machine Learning. 2014, pp. 441–448 (cit. on p. 47).

[TET12] E. Todorov, T. Erez, and Y. Tassa. "MuJoCo: A physics engine for model-based control." In: Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE. 2012, pp. 5026–5033 (cit. on pp. 30, 57).

[VHGS15] H. Van Hasselt, A. Guez, and D. Silver. "Deep reinforcement learning with double Q-learning." In: CoRR abs/1509.06461 (2015) (cit. on p. 3).

[VR+97] B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis. "A neuro-dynamic programming approach to retailer inventory management." In: Decision and Control, 1997. Proceedings of the 36th IEEE Conference on. Vol. 4. IEEE. 1997, pp. 4052–4057 (cit. on p. 1).

[Vla+09] N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. "Learning model-free robot control by a Monte Carlo EM algorithm." In: Autonomous Robots 27.2 (2009), pp. 123–130 (cit. on p. 79).

[WP09] K. Wampler and Z. Popović. "Optimal gait and form for animal locomotion." In: ACM Transactions on Graphics (TOG). Vol. 28. 3. ACM. 2009, p. 60 (cit. on pp. 4, 32).

[Waw09] P. Wawrzyński. "Real-time reinforcement learning by sequential actor–critics and experience replay." In: Neural Networks 22.10 (2009), pp. 1484–1497 (cit. on pp. 45, 46).

[Wie+08] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber. "Natural evolution strategies." In: 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence). IEEE. 2008, pp. 3381–3387 (cit. on p. 4).

[Wie+10] D. Wierstra, A. Förster, J. Peters, and J. Schmidhuber. "Recurrent policy gradients." In: Logic Journal of IGPL 18.5 (2010), pp. 620–634 (cit. on pp. 64, 83).

[Wil92] R. J. Williams. "Simple statistical gradient-following algorithms for connectionist reinforcement learning." In: Machine Learning 8.3–4 (1992), pp. 229–256 (cit. on pp. 4, 16, 64, 67, 76, 81, 82).

[WW13] D. Wingate and T. Weber. "Automated variational inference in probabilistic programming." In: arXiv preprint arXiv:1301.1299 (2013) (cit. on p. 76).