OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS

[Figure 9 appears here. Left panel "3D Biped", right panel "3D Quadruped"; both plot performance against the number of policy iterations. Biped legend: (γ, λ) = (0.96, 0.96), (0.98, 0.96), (0.99, 0.96), (0.995, 0.92), (0.995, 0.96), (0.995, 0.98), (0.995, 0.99), (0.995, 1.0), (1, 0.96), and γ = 1 with no value function. Quadruped legend: γ = 0.995 with λ = 1, λ = 0.96, and no value function.]

Figure 9: Left: learning curves for 3D bipedal locomotion, averaged across nine runs of the algorithm. Right: learning curves for 3D quadrupedal locomotion, averaged across five runs.

[...] that this algorithm could be run on a real robot, or multiple real robots learning in parallel, if there were a way to reset the state of the robot and ensure that it doesn't damage itself.

Other 3D robot tasks

The other two motor behaviors considered are quadrupedal locomotion and getting up off the ground for the 3D biped. Again, we performed 5 trials per experimental condition, with different random seeds (and initializations). The experiments took about 4 hours per trial on a 32-core machine. We performed a more limited comparison on these domains (due to the substantial computational resources required to run these experiments), fixing γ = 0.995 but varying λ ∈ {0, 0.96}, as well as an experimental condition with no value function. For quadrupedal locomotion, the best results are obtained using a value function with λ = 0.96 (Section 4.6.3); a short Python sketch at the end of this section illustrates how γ and λ enter the advantage estimator. For 3D standing, the value function always helped, but the results are roughly the same for λ = 0.96 and λ = 1.

4.7 discussion

Policy gradient methods provide a way to reduce reinforcement learning to stochastic gradient descent, by providing unbiased gradient estimates. However, so far their success at solving difficult control problems has been limited, largely due to their high sample complexity.
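To make this reduction concrete, one standard form of the unbiased score-function estimator referred to above (the notation here is a conventional rendering, not copied from this page) is

    \hat{g} \;=\; \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T-1}
        \nabla_\theta \log \pi_\theta(a^n_t \mid s^n_t) \, R^n_t,
    \qquad
    R^n_t \;=\; \sum_{t'=t}^{T-1} r^n_{t'},

where the N trajectories are sampled from the current policy π_θ. Since E[ĝ] equals the gradient of the expected total reward, ĝ can be fed directly to a stochastic gradient method; subtracting a value-function baseline and applying the (γ, λ) weighting of this chapter reduces its variance at the cost of a small, controlled bias.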
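The (γ, λ) settings swept in the experiments above enter through the generalized advantage estimator of this chapter, built from the TD residuals δ_t = r_t + γV(s_{t+1}) − V(s_t) via the backward recursion Â_t = δ_t + γλ Â_{t+1}. The following is a minimal sketch, assuming per-episode reward and value-prediction arrays; the function name and array layout are illustrative, not the thesis's actual code:

    import numpy as np

    def gae_advantages(rewards, values, gamma=0.995, lam=0.96):
        # rewards: r_0 .. r_{T-1} for one episode.
        # values:  V(s_0) .. V(s_T); the final entry is the prediction for
        #          the state after the last step (use 0.0 if the episode ended).
        T = len(rewards)
        advantages = np.empty(T)
        gae = 0.0
        for t in reversed(range(T)):
            # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
            gae = delta + gamma * lam * gae
            advantages[t] = gae
        return advantages

Setting lam = 1 recovers the Monte Carlo advantage against the value-function baseline, while lam = 0 gives the one-step TD residual; the settings compared in Figure 9 interpolate between these extremes.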
