
On the Powerball Method for Optimization


Ye Yuan (a), Mu Li (b), Jun Liu (c), Claire Tomlin (d)

(a) School of Automation, Huazhong University of Science and Technology
(b) Department of Computer Science, Carnegie Mellon University
(c) Department of Applied Mathematics, University of Waterloo
(d) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

Abstract

We propose a new method to accelerate the convergence of optimization algorithms. This method simply adds a power coefficient γ ∈ [0, 1) to the gradient during optimization. We call this the Powerball method and analyze its convergence rate for strongly convex functions. While the Powerball method is theoretically guaranteed to have a linear convergence rate of the same order as the gradient method, we show that empirically it significantly outperforms both gradient descent and Newton's method, especially during the initial iterations. We demonstrate that the Powerball method provides a 10-fold speedup of the convergence of both gradient descent and L-BFGS on multiple real datasets.

Keywords: Optimization algorithms, Convex programming, Dynamical systems, Lyapunov function, Convergence analysis

1. Introduction

We consider minimizing a differentiable function f(x): R^n → R with iterative methods. Given a starting point x(0) ∈ R^n, these methods compute

    x(k+1) = x(k) - A_k^{-1} \nabla f(x(k)), \quad for k = 0, 1, ....    (1.1)

Previous work has focused mainly on the choice of A_k. One choice is a scalar step size A_k = α_k^{-1} with α_k > 0, which yields the gradient descent method due to Cauchy. Another widely adopted choice of A_k is the Hessian matrix ∇²f(x(k)), which is used by the notable Newton's method. In this paper, we propose the Powerball method, which applies a nonlinear element-wise transformation to the gradient:

    x(k+1) = x(k) - A_k^{-1} \sigma_\gamma(\nabla f(x(k))), \quad for k = 0, 1, ....    (1.2)

For any vector x = (x_1, ..., x_n)^T, the Powerball function σ_γ is applied to all elements of x, that is, σ_γ(x) = (σ_γ(x_1), ..., σ_γ(x_n))^T. For simplicity, we drop the subscript γ and use σ(x) to denote σ_γ(x). The Powerball function σ(·): R → R has the form σ(z) = sign(z)|z|^γ for γ ∈ (0, 1), where sign(z) returns the sign of z, or 0 if z = 0. We use a constant power coefficient γ for all iterations. We call the method with A_k = α_k^{-1} in eq. (1.2) the gradient Powerball method and the method with A_k = ∇²f(x(k)) the Newton Powerball method. We will also propose Powerball variants of other standard methods throughout the paper, for example, the L-BFGS Powerball method.
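To make the two updates concrete, the following is a minimal sketch of the gradient Powerball and Newton Powerball iterations in Python with NumPy. The function names, the fixed step size alpha, the choice γ = 0.6, and the quadratic test problem are illustrative assumptions of ours, not taken from the paper; a practical implementation would add a stopping criterion and a step-size rule.

import numpy as np

def powerball(z, gamma):
    # Element-wise Powerball function: sigma_gamma(z) = sign(z) * |z|**gamma.
    return np.sign(z) * np.abs(z) ** gamma

def gradient_powerball(grad_f, x0, alpha=0.1, gamma=0.6, num_iters=100):
    # Gradient Powerball method: eq. (1.2) with A_k^{-1} = alpha held fixed.
    # (The fixed step size is an illustrative assumption, not the paper's choice.)
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * powerball(grad_f(x), gamma)
    return x

def newton_powerball(grad_f, hess_f, x0, gamma=0.6, num_iters=20):
    # Newton Powerball method: eq. (1.2) with A_k = Hessian of f at x(k).
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - np.linalg.solve(hess_f(x), powerball(grad_f(x), gamma))
    return x

# Example: an ill-conditioned quadratic f(x) = 0.5 * x^T A x with minimizer x* = 0.
A = np.diag([100.0, 1.0])
x_gd = gradient_powerball(lambda x: A @ x, x0=[5.0, 5.0])
x_nt = newton_powerball(lambda x: A @ x, lambda x: A, x0=[5.0, 5.0])

One design note: because |z|^γ exceeds |z| when |z| < 1, a fixed step size makes the iterates settle into, and oscillate within, a small neighborhood of the minimizer rather than converge exactly; a decaying step size is the usual remedy. This is consistent with the paper's observation that the speedup is most pronounced during the initial iterations.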
This paper is organized as follows. In Section 2, we provide intuition behind the Powerball method by viewing optimization algorithms as discretizations of ordinary differential equations (ODEs). We analyze the convergence rate of the proposed Powerball method for strongly convex functions in Section 3 and discuss important variants of the Powerball method in Section 4. We then demonstrate the fast convergence of Powerball algorithms on a classification problem with benchmark datasets in Section 5. Finally, we conclude the paper with a general discussion on applying insights from control and dynamical systems to optimization algorithms.

2. Intuition from Ordinary Differential Equations

Consider the algorithms presented in eq. (1.1) and eq. (1.2). If the index, or iteration number, of these algorithms is viewed as a discrete-time index, then these algorithms can be viewed as discrete-time dynamical systems. By taking this view, the convergence of an optimization method to a minimizer can be equivalently seen as the convergence of a dynamical system to an equilibrium (Bhaya and Kaszkurewicz, 2006). The intuition for the gradient Powerball method lies in the Euler discretization of the following ODE:

    \dot{x} = -\sigma(\nabla f(x)).    (2.1)

Similarly, the intuition for the Newton Powerball method lies in the Euler discretization of the following ODE:

    \dot{x} = -\left(\nabla^2 f(x)\right)^{-1} \sigma(\nabla f(x)).

We prove a convergence result for the ODE (2.1) when γ ∈ (0, 1) under the assumption that f is strongly convex. The proof is given in Appendix A.

Definition 1. A function f is strongly convex with coefficient m > 0 if it satisfies

    f(y) \geq f(x) + \nabla f(x) \cdot (y - x) + \frac{m}{2} \|y - x\|^2

for all x, y ∈ R^n.

Proposition 1. For any strongly convex function f with coefficient m, the solutions of the ordinary differential equation (2.1) for γ ∈ (0, 1) converge to its equilibrium in finite time

    T = \frac{\big((\gamma + 1)\, V(0)\big)^{\frac{1-\gamma}{1+\gamma}}}{m(1-\gamma)}, \quad where \quad V(t) \triangleq \frac{1}{\gamma+1} \sum_{i=1}^{n} \left| \frac{\partial f(x(t))}{\partial x_i} \right|^{\gamma+1}.
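For intuition about where this finite-time bound comes from, the following Lyapunov-style calculation is our sketch of the argument (the complete proof is in Appendix A of the paper); it assumes f is twice differentiable, so that strong convexity gives ∇²f(x) ⪰ mI. Writing g = ∇f(x(t)) and differentiating V along solutions of eq. (2.1):

% Sketch only: assumes f twice differentiable; V(t) as in Proposition 1, g = \nabla f(x(t)).
\begin{aligned}
\dot{V} &= \sigma(g)^{\mathsf{T}} \dot{g}
         = -\sigma(g)^{\mathsf{T}} \nabla^2 f(x)\, \sigma(g)
         \le -m \|\sigma(g)\|^2
         = -m \sum_{i=1}^{n} |g_i|^{2\gamma} \\
        &\le -m \Big( \sum_{i=1}^{n} |g_i|^{\gamma+1} \Big)^{\frac{2\gamma}{\gamma+1}}
         = -m \big( (\gamma + 1) V \big)^{\frac{2\gamma}{\gamma+1}},
\end{aligned}

where the second inequality is the norm comparison \|g\|_{2\gamma} \ge \|g\|_{\gamma+1}, valid because 2γ ≤ γ + 1 for γ ∈ (0, 1). Separating variables and integrating this differential inequality shows that V(t), and hence ∇f(x(t)), reaches zero no later than T = ((\gamma+1)V(0))^{\frac{1-\gamma}{1+\gamma}} / (m(1-\gamma)), matching the bound stated in Proposition 1.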
