OPTIMIZING EXPECTATIONS: FROM DEEP REINFORCEMENT LEARNING TO STOCHASTIC COMPUTATION GRAPHS


non-RL problems.

6.1 frontiers

Many open problems remain, which relate to and could build on this thesis work. Below, we describe some of the frontiers that we consider to be the most exciting, mostly in the field of deep reinforcement learning.

1. Shared representations for control and prediction. In domains with high-dimensional observations (for example, robotics using camera input, or games like Atari), two different mappings need to be learned: first, we need to map the raw input into more useful representations (for example, parse the image into a set of objects and their locations); second, we need to map these representations to actions. When using policy gradient methods, this learning is driven by the advantage function, which is a noisy one-dimensional signal, i.e., a slow source of information about the environment. It should be possible to learn representations faster by solving prediction problems involving the observations themselves; that way, we are using much more information from the environment. To speed up learning this way, we would need to use an architecture that shares parameters between a prediction part and an action-selection part (a minimal sketch of such an architecture is given after this list).

2. Hierarchy: animals and (prospectively) intelligent robots need to carry out behaviors that unfold over a range of different timescales: fractions of a second for low-level motor control; hours or days for various high-level behaviors. But traditional reinforcement learning methods have fundamental difficulties learning any behaviors that require more than 100–1000 timesteps. Learning can proceed if the MDP is augmented with high-level actions that unfold over a long period of time: some versions of this idea include hierarchical abstract machines [PR98] and options [SPS99]. The persistent difficulty is how to automatically learn these high-level actions, or what kind of optimization objective will encourage the policy to be more “hierarchical”.

3. Exploration: the principle of exploration is to actively encourage the agent to reach unfamiliar parts of state space, avoiding convergence to a suboptimal policy. Policy gradient methods are prone to converging to suboptimal policies, as we observed many times while doing the empirical work in this thesis. While a body of theoretical work answers the question of how to explore optimally in a finite MDP (e.g., [Str+06]), there is a need for exploration methods that can be applied in challenging real-world settings such as robotics (a simple count-based bonus is sketched after this list). Some preliminary work towards making
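The shared-representation idea in item 1 can be made concrete with a small sketch, assuming a PyTorch-style setup. The names below (SharedRepresentationAgent, combined_loss, aux_weight) and the choice of next-observation prediction as the auxiliary task are illustrative assumptions, not anything specified in the thesis; the point is only that the policy head and the prediction head backpropagate through the same encoder, so the dense prediction loss supplies far more learning signal to the shared representation than the scalar advantage alone.

    # Illustrative sketch: a shared encoder feeding both an action-selection head
    # and an auxiliary prediction head (here, next-observation prediction).
    import torch
    import torch.nn as nn

    class SharedRepresentationAgent(nn.Module):
        def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
            super().__init__()
            # Shared encoder: maps raw observations to a learned representation.
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Action-selection head: logits for a categorical policy.
            self.policy_head = nn.Linear(hidden, n_actions)
            # Prediction head: predicts the next observation (one of many
            # possible auxiliary prediction problems over the observations).
            self.prediction_head = nn.Linear(hidden, obs_dim)

        def forward(self, obs: torch.Tensor):
            z = self.encoder(obs)
            return self.policy_head(z), self.prediction_head(z)

    def combined_loss(agent, obs, actions, advantages, next_obs, aux_weight=0.1):
        """Policy-gradient loss plus an auxiliary prediction loss.
        Both terms backpropagate into the shared encoder."""
        logits, next_obs_pred = agent(obs)
        logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
        pg_loss = -(logp * advantages).mean()                # driven by the noisy scalar advantage
        aux_loss = ((next_obs_pred - next_obs) ** 2).mean()  # dense, per-dimension supervised signal
        return pg_loss + aux_weight * aux_loss

In practice the auxiliary target could instead be the reward, a future observation embedding, or any other prediction problem over the observations; what matters for the argument in item 1 is the parameter sharing between prediction and action selection.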
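For item 3, the principle of encouraging visits to unfamiliar states can be illustrated with a count-based bonus. CountBasedBonus, beta, and the 1/sqrt(N(s)) form below are a standard tabular heuristic used purely as an illustrative sketch, not a method from the thesis; the open problem described in the text is precisely that such raw counts do not transfer directly to high-dimensional, real-world settings.

    # Illustrative sketch: augment the environment reward with a novelty bonus
    # that decays with the visit count of the current (discretized) state.
    from collections import defaultdict
    import math

    class CountBasedBonus:
        def __init__(self, beta: float = 0.1):
            self.beta = beta
            self.counts = defaultdict(int)   # N(s): visit count per state (assumed hashable)

        def augmented_reward(self, state, extrinsic_reward: float) -> float:
            self.counts[state] += 1
            bonus = self.beta / math.sqrt(self.counts[state])
            return extrinsic_reward + bonus

An agent trained on the augmented reward is pushed toward rarely visited states rather than settling early on a suboptimal policy; the practical difficulty is defining visitation counts (or pseudo-counts) when observations are images or other high-dimensional inputs, as in robotics.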
