On the Powerball Method for Optimization

Proposition 2. For any twice differentiable function f, the proposed continuous Newton Powerball method converges to an equilibrium point in finite time

    T = \frac{1}{1-\gamma}\,\big((\gamma+1)\,V(0)\big)^{\frac{1-\gamma}{1+\gamma}},

in which

    V(t) = \frac{1}{\gamma+1} \sum_{i=1}^{n} \Big|\frac{\partial f(x(t))}{\partial x_i}\Big|^{\gamma+1}.

Proof. See Appendix B.
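To see where the finite time T comes from, here is a short Lyapunov-style sketch of the argument. It assumes the continuous Newton Powerball dynamics take the form ẋ = −(∇²f(x))⁻¹ σ(∇f(x)), with σ(z) = sign(z)|z|^γ applied elementwise and γ ∈ (0, 1); this is only an outline of the proof in Appendix B.

    % Along \dot{x} = -(\nabla^2 f)^{-1}\sigma(\nabla f), the gradient obeys
    % \frac{d}{dt}\nabla f = \nabla^2 f\,\dot{x} = -\sigma(\nabla f), so
    \begin{align*}
    \dot{V}(t)
      &= \sum_{i=1}^{n} \sigma\!\Big(\frac{\partial f}{\partial x_i}\Big)\,
         \frac{d}{dt}\frac{\partial f}{\partial x_i}
       = -\sum_{i=1}^{n} \Big|\frac{\partial f}{\partial x_i}\Big|^{2\gamma} \\
      &\le -\Big(\sum_{i=1}^{n} \Big|\frac{\partial f}{\partial x_i}\Big|^{\gamma+1}\Big)^{\frac{2\gamma}{\gamma+1}}
       = -\big((\gamma+1)V(t)\big)^{\frac{2\gamma}{\gamma+1}}.
    \end{align*}
    % The inequality uses \sum_i |g_i|^{p} \ge (\sum_i |g_i|^{q})^{p/q} for
    % 0 < p = 2\gamma \le q = \gamma+1. Integrating the differential inequality
    % for W = V^{(1-\gamma)/(1+\gamma)} shows W decreases at a constant rate,
    % hence V(T) = 0 with T \le \frac{1}{1-\gamma}\big((\gamma+1)V(0)\big)^{\frac{1-\gamma}{1+\gamma}}.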
By analyzing the continuous versions of optimization algorithms and viewing the convergence of continuous optimization algorithms as stability of dynamical systems, we can apply Lyapunov theory from control theory to gain insight into the underlying optimization algorithms. What remains is to derive an analogous proof for discrete-time dynamical systems, or equivalently for optimization algorithms. As pointed out by Su, Boyd and Candes (Su et al., 2015), the translation of ODE theory to optimization algorithms involves parameter tuning (for example, of the step size) and tedious calculations. In the following section, we derive the convergence rate of the Powerball methods for strongly convex functions so that we can compare it with the rates of standard methods.

3. Convergence Analysis

Given the intuition in the previous section, we propose the gradient Powerball method, whose iterative scheme reads

    x^{(k+1)} = x^{(k)} - \alpha_k\,\sigma(\nabla f(x^{(k)})), \quad k = 0, 1, \ldots,    (3.1)

where α_k is the step size to be chosen. We shall show convergence of the Powerball methods in eq. (1.2) for strongly convex functions. The proof is given in Appendix C.

Theorem 1. For any strongly convex function f (with coefficient m) with L-Lipschitz gradient, the proposed gradient Powerball method converges at least linearly to the global minimizer at a rate (1 − O(m/L))^k.

Remark 1. While the proved linear convergence rate is of the same order as that of the standard gradient descent method, the Powerball method appears to outperform standard gradient descent, as demonstrated in the examples in Section 5. We note that, in essence, the Powerball method can be regarded as steepest descent with respect to the p-norm, where p = 1 + 1/γ (cf. Section 9.4 of Boyd and Vandenberghe (2004)). Here, γ serves as an additional parameter for tuning the performance of the proposed scheme.

Remark 2. The step size α_k can be chosen in a specific way as in the proof. One can also apply a standard backtracking line search to achieve potentially better empirical performance in practice.

Remark 3. It remains an interesting open theoretical question to derive a convergence rate that a) exhibits the finite-time convergence property observed for the continuous-time version; b) explicitly depends on the parameter γ; and c) in particular, explains the empirical speedup during the initial iterations, as observed in Section 5.
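To make eq. (3.1) concrete, the following is a minimal Python sketch of the gradient Powerball iteration. The quadratic test problem, the fixed step size, and the function names are illustrative assumptions rather than the paper's setup; setting gamma = 0 recovers the one-bit method of Section 4.1 below, and gamma = 1 recovers standard gradient descent.

    import numpy as np

    def powerball(z, gamma):
        # Elementwise Powerball function sigma(z) = sign(z) * |z|**gamma.
        return np.sign(z) * np.abs(z) ** gamma

    def gradient_powerball(grad_f, x0, alpha=0.1, gamma=0.5, iters=500):
        # Eq. (3.1): x_{k+1} = x_k - alpha_k * sigma(grad f(x_k)).
        # A fixed step size is used for simplicity; with gamma < 1 it settles
        # into a small neighborhood of the minimizer, so a decaying step size
        # or a line search is needed to drive the iterates all the way there.
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = x - alpha * powerball(grad_f(x), gamma)
        return x

    # Illustrative strongly convex quadratic f(x) = 0.5 x^T A x - b^T x.
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])
    grad_f = lambda x: A @ x - b
    print(gradient_powerball(grad_f, np.zeros(2)))  # approx. np.linalg.solve(A, b)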
4. Variants of the Powerball Method

In this section, we consider the following two variants of the proposed Powerball method.

4.1. One-bit gradient descent method

First, it is natural to consider the special case γ = 0, which has a very low communication cost for optimizing strongly convex functions: it reduces the communication bandwidth required for data exchanges (Seide et al., 2014), since only the sign of every element of the computed gradient is needed. The one-bit gradient descent method has the following form (simply let γ = 0):

    x^{(k+1)} = x^{(k)} - \alpha_k\,\operatorname{sign}(\nabla f(x^{(k)})), \quad k = 0, 1, \ldots.    (4.1)

4.2. L-BFGS Powerball method

The L-BFGS method is a quasi-Newton method (Yuan, 2011) that achieves a convergence rate similar to that of Newton's method near the optimal solution and is widely used in practice. We can define its Powerball variant by simply applying the power coefficient to the gradient computation in L-BFGS. The L-BFGS Powerball method is presented in Algorithm 1.

Algorithm 1: L-BFGS Powerball method
1: g_k = ∇f(x^{(k)}), q = σ(g_k)
2: for i = k−1, k−2, ..., k−m do
3:   α_i = ρ_i s_i^T q
4:   q = q − α_i y_i
5: end for
6: H_k^0 = (y_{k−1}^T s_{k−1}) / (y_{k−1}^T y_{k−1})
7: z = H_k^0 q
8: for i = k−m, k−m+1, ..., k−1 do
9:   β_i = ρ_i y_i^T z
10:  z = z + s_i (α_i − β_i)
11: end for
12: Stop with H_k g_k = z
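The following is a minimal Python sketch of the two-loop recursion in Algorithm 1, where the only change from standard L-BFGS is that the recursion starts from σ(g_k) rather than g_k. The handling of the curvature pairs (s_i, y_i), the convention ρ_i = 1/(y_i^T s_i), and the names used here follow common L-BFGS practice and are assumptions, not details fixed by the paper.

    import numpy as np

    def powerball(z, gamma):
        # sigma(z) = sign(z) * |z|**gamma, applied elementwise
        return np.sign(z) * np.abs(z) ** gamma

    def lbfgs_powerball_direction(grad, s_list, y_list, gamma):
        # Two-loop recursion of Algorithm 1. Returns z, the product of the
        # implicit inverse-Hessian approximation with the powered gradient.
        # s_list[i] = x_{i+1} - x_i, y_list[i] = g_{i+1} - g_i, oldest first.
        q = powerball(grad, gamma)                    # step 1: q = sigma(g_k)
        rho = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
        alphas = []
        for s, y, r in zip(reversed(s_list), reversed(y_list), reversed(rho)):
            a = r * (s @ q)                           # steps 2-5: newest to oldest
            q = q - a * y
            alphas.append(a)
        alphas.reverse()                              # reorder to oldest first
        s_last, y_last = s_list[-1], y_list[-1]
        h0 = (y_last @ s_last) / (y_last @ y_last)    # step 6: initial scaling
        z = h0 * q                                    # step 7
        for s, y, r, a in zip(s_list, y_list, rho, alphas):
            b = r * (y @ z)                           # steps 8-11: oldest to newest
            z = z + s * (a - b)
        return z                                      # step 12

In a full optimizer, z would be plugged into the update x^{(k+1)} = x^{(k)} − α_k z, with α_k chosen by a (Wolfe) line search as in standard L-BFGS practice.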
5. Experiments

To evaluate the Powerball methods, we collected three datasets, which are listed in Table 1. RCV1 is a Reuters news classification dataset¹. KDD10 is sampled from the KDD Cup 2010², whose goal is to measure students' performance. CTR is a sampled ad click-through-rate dataset³. We used logistic regression with ℓ2-regularization as the objective function. Given a list of example pairs {y_i, x_i}, i = 1, ..., n, the goal is to solve the following minimization problem:

    \min_w \; \sum_{i=1}^{n} \log\big(1 + \exp(-y_i \langle x_i, w \rangle)\big) + \frac{\lambda}{2}\,\|w\|_2^2.    (5.1)

1 http://about.reuters.com/researchandstandards/corpus/
2 https://pslcdatashop.web.cmu.edu/KDDCup/
3 http://data.dmlc.ml
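For reference, here is a small Python sketch of the objective in eq. (5.1) and its gradient, which is the quantity the Powerball variants above transform. The synthetic data, the value of λ, and the helper names are illustrative assumptions; labels are taken in {−1, +1}, as eq. (5.1) implies.

    import numpy as np

    def logreg_objective(w, X, y, lam):
        # Eq. (5.1): sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2) * ||w||_2^2
        margins = y * (X @ w)
        # np.logaddexp(0, -m) computes log(1 + exp(-m)) in a stable way
        return np.sum(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)

    def logreg_gradient(w, X, y, lam):
        # Gradient of eq. (5.1): sum_i -y_i * x_i / (1 + exp(y_i <x_i, w>)) + lam * w
        margins = y * (X @ w)
        coef = -y / (1.0 + np.exp(margins))   # may warn for very large margins
        return X.T @ coef + lam * w

    # Illustrative synthetic data with labels in {-1, +1}.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = np.sign(X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200))
    w = np.zeros(5)
    print(logreg_objective(w, X, y, lam=0.1),
          np.linalg.norm(logreg_gradient(w, X, y, lam=0.1)))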
