Bibliography

[TZS04] R. Tedrake, T. Zhang, and H. Seung. "Stochastic policy gradient reinforcement learning on a simple 3D biped." In: IEEE/RSJ International Conference on Intelligent Robots and Systems. 2004 (cit. on p. 32).

[Tes95] G. Tesauro. "Temporal difference learning and TD-Gammon." In: Communications of the ACM 38.3 (1995), pp. 58–68 (cit. on p. 2).

[Tho14] P. Thomas. "Bias in natural actor-critic algorithms." In: Proceedings of The 31st International Conference on Machine Learning. 2014, pp. 441–448 (cit. on p. 47).

[TET12] E. Todorov, T. Erez, and Y. Tassa. "MuJoCo: A physics engine for model-based control." In: Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE. 2012, pp. 5026–5033 (cit. on pp. 30, 57).

[VHGS15] H. Van Hasselt, A. Guez, and D. Silver. "Deep reinforcement learning with double Q-learning." In: CoRR abs/1509.06461 (2015) (cit. on p. 3).

[VR+97] B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis. "A neuro-dynamic programming approach to retailer inventory management." In: Decision and Control, 1997. Proceedings of the 36th IEEE Conference on. Vol. 4. IEEE. 1997, pp. 4052–4057 (cit. on p. 1).

[Vla+09] N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. "Learning model-free robot control by a Monte Carlo EM algorithm." In: Autonomous Robots 27.2 (2009), pp. 123–130 (cit. on p. 79).

[WP09] K. Wampler and Z. Popović. "Optimal gait and form for animal locomotion." In: ACM Transactions on Graphics (TOG). Vol. 28. 3. ACM. 2009, p. 60 (cit. on pp. 4, 32).

[Waw09] P. Wawrzyński. "Real-time reinforcement learning by sequential actor–critics and experience replay." In: Neural Networks 22.10 (2009), pp. 1484–1497 (cit. on pp. 45, 46).

[Wie+08] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber. "Natural evolution strategies." In: 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence). IEEE. 2008, pp. 3381–3387 (cit. on p. 4).

[Wie+10] D. Wierstra, A. Förster, J. Peters, and J. Schmidhuber. "Recurrent policy gradients." In: Logic Journal of IGPL 18.5 (2010), pp. 620–634 (cit. on pp. 64, 83).

[Wil92] R. J. Williams. "Simple statistical gradient-following algorithms for connectionist reinforcement learning." In: Machine Learning 8.3–4 (1992), pp. 229–256 (cit. on pp. 4, 16, 64, 67, 76, 81, 82).

[WW13] D. Wingate and T. Weber. "Automated variational inference in probabilistic programming." In: arXiv preprint arXiv:1301.1299 (2013) (cit. on p. 76).