On the Powerball Method for Optimization


Table 1: Standard benchmark datasets for classification. We used λ = 1 for KDD10 and CTR, and λ = 0 for RCV1.

name     # examples    # features    # nonzero entries
RCV1     2.0 × 10^4    4.7 × 10^4    1.5 × 10^6
KDD10    2.0 × 10^5    6.4 × 10^5    7.4 × 10^6
CTR      2.2 × 10^5    6.2 × 10^5    1.3 × 10^7

Both gradient descent and L-BFGS (Liu and Nocedal, 1989) are compared with the gradient Powerball method and the L-BFGS Powerball method from the same randomly chosen initial conditions. The step size in all methods is chosen by standard backtracking line search. The weight w is initialized according to a normal distribution N(0, 0.01). We repeat each experiment 10 times and report the averaged results. The code is available from http://yy311.github.io/software.html so that readers can reproduce the experimental results.

We first study the effect of varying γ. We choose four values of γ from the set {1, 0.7, 0.4, 0.1}, where γ = 1 recovers standard gradient descent. The convergence of the different algorithms for each γ is shown in Fig. 1. As can be seen, applying a power γ < 1 to the gradient in every step significantly accelerates convergence compared with standard gradient descent. In particular, on both the KDD10 and CTR datasets, fewer than 10 iterations with γ = 0.1 reach an objective value lower than the one gradient descent reaches after 100 iterations. The results for the corresponding L-BFGS Powerball comparison (m = 5) are shown in Fig. 2 and are similar to the observations for the gradient Powerball method.
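For concreteness, the gradient Powerball method replaces the raw gradient in each step with the elementwise transform sign(∇f(x))·|∇f(x)|^γ. The sketch below is a minimal illustration in Python/NumPy, not the released code: it uses a fixed step size instead of the backtracking line search used in the experiments, and the function names and toy objective are illustrative assumptions.

```python
import numpy as np

def powerball_gd(grad_f, x0, gamma=0.4, step=0.1, iters=100):
    """Gradient descent with the elementwise Powerball transform sign(g) * |g|**gamma.

    gamma = 1 recovers plain gradient descent; gamma in (0, 1) gives the Powerball step.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = grad_f(x)
        x = x - step * np.sign(g) * np.abs(g) ** gamma  # Powerball update
    return x

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x (illustrative only).
x_final = powerball_gd(lambda x: x, x0=np.random.randn(5), gamma=0.4)
print(x_final)
```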
6. Discussion

It is generally known that dynamical systems (Sastry, 1999) can offer new insight into optimization methods (Lessard et al., 2016; Su et al., 2016; Krichene et al., 2015) by viewing optimization algorithms as the evolution of dynamical systems. Using intuition from the finite-time stability of ordinary differential equations (Bhat and Bernstein, 1997), we generalize this idea to discrete optimization schemes and demonstrate empirically that the proposed methods can accelerate the initial iterations. For large-scale optimization problems, the initial iterations are crucial given computation constraints.

7. Acknowledgments

We would like to thank Dr. Vassilis Vassiliadis, Dr. Benjamin Recht, and Dr. Stephen Boyd for insightful discussions.

References

S. P. Bhat and D. S. Bernstein, "Finite-time stability of homogeneous systems," American Control Conference, pp. 2513–2514, 1997.
A. Bhaya and E. Kaszkurewicz, Control Perspectives on Numerical Algorithms and Matrix Problems, SIAM, 2006.
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
W. Krichene, A. Bayen, and P. Bartlett, "Accelerated mirror descent in continuous and discrete time," 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015.
L. Lessard, B. Recht, and A. Packard, "Analysis and design of optimization algorithms via integral quadratic constraints," SIAM Journal on Optimization, vol. 26, pp. 57–95, 2016.
D. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Mathematical Programming, vol. 45, pp. 503–528, 1989.
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Boston, MA, 2004.
J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 2006.
S. Sastry, Nonlinear Systems: Analysis, Stability, and Control, Springer-Verlag, 1999.
F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, "1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs," INTERSPEECH, 2014.
S. Strogatz, Nonlinear Dynamics and Chaos, Westview Press, 2000.
W. Su, S. Boyd, and E. J. Candes, "A differential equation for modeling Nesterov's accelerated gradient method: theory and insights," Journal of Machine Learning Research, vol. 17, pp. 1–43, 2016.
Y. Yuan, "Quasi-Newton Methods," Wiley Encyclopedia of Operations Research and Management Science, 2011.

Appendix A. Proof of Proposition 1

Lemma 1 ((Bhat and Bernstein, 1997), Theorem 1). Suppose that a function $V(t) : [0, \infty) \to [0, \infty)$ is differentiable (the derivative of $V(t)$ at $0$ is defined as its Dini upper derivative) and that $\frac{dV(t)}{dt} + K V^{\gamma}(t)$ is negative for all $t$, for some constants $K > 0$ and $0 < \gamma < 1$. Then $V(t)$ reaches zero at a finite time $t^* \le \frac{V^{1-\gamma}(0)}{K(1-\gamma)}$, and $V(t) = 0$ for all $t \ge t^*$.

Proof. Let $f(t)$ satisfy the ODE
$$\frac{df(t)}{dt} = -K f^{\gamma}(t).$$
Given any initial value $f(0) = V(0) > 0$, its unique solution is
$$f(t) = \begin{cases} \left( -K(1-\gamma)\,t + V^{1-\gamma}(0) \right)^{\frac{1}{1-\gamma}}, & t < \dfrac{V^{1-\gamma}(0)}{K(1-\gamma)}, \\[1.5ex] 0, & t \ge \dfrac{V^{1-\gamma}(0)}{K(1-\gamma)}. \end{cases}$$
Since $V(0) = f(0)$, by the comparison principle for differential equations in (Bhat and Bernstein, 1997) we have $V(t) \le f(t)$ for all $t \ge 0$. Hence $V(t)$ reaches zero within time $\frac{V^{1-\gamma}(0)}{K(1-\gamma)}$. Since $V(t) \ge 0$ and $\frac{dV(t)}{dt} \le 0$, $V(t)$ remains $0$ once it has reached zero.

Next, we construct a Lyapunov function for eq. (2.1) that has a property similar to the one in Lemma 1. Let $y_i = \frac{\partial f(x)}{\partial x_i}$ and consider the nonnegative function $V(t) = \frac{1}{\gamma+1} \sum_{i=1}^{n} |y_i|^{\gamma+1}$. Taking the derivative of $V(t)$ with respect to $t$, we have
$$\begin{aligned}
\frac{\partial V(t)}{\partial t} &= \sum_{i=1}^{n} \frac{\partial V(t)}{\partial y_i}\,\frac{\partial y_i}{\partial t} \overset{(a)}{=} \sum_{i=1}^{n} \operatorname{sign}(y_i)\,|y_i|^{\gamma} \sum_{j=1}^{n} \frac{\partial y_i}{\partial x_j}\,\frac{\partial x_j}{\partial t} \\
&= -\begin{bmatrix} \operatorname{sign}(y_1)|y_1|^{\gamma} & \cdots & \operatorname{sign}(y_n)|y_n|^{\gamma} \end{bmatrix} H \begin{bmatrix} \operatorname{sign}(y_1)|y_1|^{\gamma} \\ \vdots \\ \operatorname{sign}(y_n)|y_n|^{\gamma} \end{bmatrix} \\
&\overset{(b)}{\le} -m \sum_{i=1}^{n} |y_i|^{2\gamma} \overset{(c)}{\le} -m\,(\gamma+1)^{\frac{2\gamma}{\gamma+1}}\, V^{\frac{2\gamma}{1+\gamma}}(t). \qquad \text{(A.1)}
\end{aligned}$$
Equality (a) holds because, for all $i$, $\frac{\partial |y_i|^{\gamma+1}}{\partial y_i} = (\gamma+1)\operatorname{sign}(y_i)|y_i|^{\gamma}$. Inequality (b) holds because the Hessian $H \triangleq \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right] \succeq m I$ for any strongly convex function $f$. Inequality (c) holds using the fact that
$$\sum_{i=1}^{n} |y_i|^{2\gamma} \ge \left( \sum_{i=1}^{n} |y_i|^{\gamma+1} \right)^{\frac{2\gamma}{\gamma+1}}, \qquad \forall \gamma \in (0, 1).$$
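As an illustrative sanity check of Lemma 1 (not part of the paper; the constants K, γ, V(0), and the step size below are arbitrary assumptions), the comparison ODE dV/dt = −K V^γ can be integrated numerically and the observed hitting time compared with the predicted bound V^{1−γ}(0)/(K(1−γ)):

```python
# Forward-Euler integration of the comparison ODE dV/dt = -K * V**gamma from Lemma 1.
# Lemma 1 predicts V reaches zero no later than t* = V(0)**(1 - gamma) / (K * (1 - gamma)).
K, gamma, V0, dt = 2.0, 0.5, 1.0, 1e-4
t_star = V0 ** (1 - gamma) / (K * (1 - gamma))  # predicted finite hitting time (= 1.0 here)

V, t = V0, 0.0
while V > 1e-12:
    V = max(V - dt * K * V ** gamma, 0.0)  # clamp at zero once the trajectory crosses it
    t += dt

print(f"predicted t* = {t_star:.4f}, observed hitting time ~ {t:.4f}")
```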
