On the Powerball Method for Optimization

Text from PDF Page: 004

[Figures 1 and 2 appear here as three-panel plots of objective value versus iteration, each comparing γ = 1, γ = 0.7, γ = 0.4, and γ = 0.1; only the captions are reproduced below.]

Figure 1: We apply the gradient Powerball method (γ < 1) and the gradient descent method (γ = 1) to minimize eq. (5.1) on three datasets. Left: RCV1; middle: KDD10; right: CTR. We observe that the gradient Powerball method with γ less than 1 can significantly accelerate convergence. In particular, on both the KDD10 and CTR datasets, the objective value of eq. (5.1) that the gradient Powerball method achieves in 10 iterations (with γ = 0.1) would require 100 iterations of the standard gradient descent method.

Figure 2: We apply the L-BFGS Powerball method (γ < 1) and L-BFGS (γ = 1) to minimize eq. (5.1) on three datasets. We observe results similar to those of the comparison between the gradient Powerball method and the gradient descent method. Left: RCV1; middle: News20; right: CTR.

Using Lemma 1, eq. (A.1) implies that there exists

    T = \frac{\big((\gamma+1)\,V(0)\big)^{\frac{1-\gamma}{1+\gamma}}}{1-\gamma}, \quad \forall \gamma \in (0,1),

such that V(t) = 0 when t ≥ T. This implies that the system's state is at its equilibrium.

Appendix B. Proof of Proposition 2

Consider the nonnegative function

    V(t) = \frac{1}{\gamma+1} \sum_{i=1}^{n} |\nabla f_i(x)|^{\gamma+1}.

Similar to the proof of Proposition 1, if we take the derivative of V(t) with respect to t, then we have, for all t ≥ 0 and γ ∈ (0, 1),

    \frac{\partial V(t)}{\partial t} = -\left\| \big( |\nabla f_1(x)|^{\gamma}, \ldots, |\nabla f_n(x)|^{\gamma} \big) \right\|_2^2 \le -(\gamma+1)^{\frac{2\gamma}{\gamma+1}} \, V^{\frac{2\gamma}{1+\gamma}}(t).

Applying Lemma 1 leads to the result.

Appendix C. Proof of Theorem 1

Proof. Denote by x⋆ the minimizer and f⋆ = f(x⋆). For brevity, write x_k = x(k) for all k ≥ 0. Then, by the L-Lipschitz continuity of ∇f (Nesterov, 2014) and (3.1),

    f(x_{k+1}) \le f(x_k) + \nabla f(x_k) \cdot (x_{k+1} - x_k) + \frac{L}{2} |x_{k+1} - x_k|^2
              = f(x_k) - \alpha_k \, \nabla f(x_k) \cdot \sigma(\nabla f(x_k)) + \frac{L}{2} \alpha_k^2 \, |\sigma(\nabla f(x_k))|^2.

Let \alpha_k = \frac{\nabla f(x_k) \cdot \sigma(\nabla f(x_k))}{L \, |\sigma(\nabla f(x_k))|^2} > 0 (this holds if x_k ≠ x⋆). Then

    f(x_{k+1}) \le f(x_k) - \frac{\big(\nabla f(x_k) \cdot \sigma(\nabla f(x_k))\big)^2}{2L \, |\sigma(\nabla f(x_k))|^2}.   (C.1)

Note that

    \frac{\big(\nabla f(x_k) \cdot \sigma(\nabla f(x_k))\big)^2}{|\sigma(\nabla f(x_k))|^2} = \frac{\big( \sum_{i=1}^{n} |(\nabla f)_i(x_k)|^{\gamma+1} \big)^2}{\sum_{i=1}^{n} |(\nabla f)_i(x_k)|^{2\gamma}} \ge \frac{\sum_{i=1}^{n} |(\nabla f)_i(x_k)|^2}{n} = \frac{|\nabla f(x_k)|^2}{n}.   (C.2)

The previous inequality follows from

    n \Big( \sum_{i=1}^{n} |y_i|^{\gamma+1} \Big)^2 \ge n \sum_{i=1}^{n} |y_i|^{2\gamma} |y_i|^2 \ge \Big( \sum_{i=1}^{n} |y_i|^{2\gamma} \Big) \Big( \sum_{i=1}^{n} |y_i|^2 \Big),

for 0 ≤ γ ≤ 1 and a vector y ∈ R^n, where the last inequality is Chebyshev's sum inequality. By strong convexity of f (Nesterov, 2014),

    |\nabla f(x)|^2 \ge 2m \big( f(x) - f^\star \big), \quad \forall x \in R^n.   (C.3)

Combining (C.1), (C.2), and (C.3) gives

    f(x_{k+1}) - f^\star \le \Big( 1 - \frac{1}{n} \cdot \frac{m}{L} \Big) \big( f(x_k) - f^\star \big).   (C.4)

This shows that the Powerball scheme converges linearly at a rate (1 - \frac{1}{n} \cdot \frac{m}{L})^k, by the choice of time-varying steps above.
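The step size α_k used in the proof of Theorem 1 can be tried directly. Below is a minimal NumPy sketch of the gradient Powerball iteration x_{k+1} = x_k − α_k σ(∇f(x_k)) on a toy strongly convex quadratic; the matrix A, the value of γ, and the iteration count are illustrative choices, not taken from the paper's experiments.

```python
import numpy as np

def powerball_gd(x0, grad, L, gamma, iters):
    """Gradient Powerball: x_{k+1} = x_k - alpha_k * sigma(grad f(x_k)),
    where sigma(z) = sign(z) * |z|^gamma elementwise, with the step size
    alpha_k = (g . sigma(g)) / (L * |sigma(g)|^2) from the proof of
    Theorem 1.  gamma = 1 recovers gradient descent with step 1/L."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = grad(x)
        s = np.sign(g) * np.abs(g) ** gamma
        denom = L * (s @ s)
        if denom == 0.0:          # gradient is zero: at the minimizer
            break
        x = x - ((g @ s) / denom) * s
    return x

# Toy problem: f(x) = 0.5 x^T A x, strongly convex with m = 1, L = 100.
A = np.diag([100.0, 1.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
m, L, n, iters = 1.0, 100.0, 2, 100

x0 = np.array([1.0, 1.0])
x_pb = powerball_gd(x0, grad, L, gamma=0.4, iters=iters)

# (C.4) predicts f(x_k) - f* <= (1 - m/(nL))^k (f(x_0) - f*), with f* = 0 here.
assert f(x_pb) <= (1 - m / (n * L)) ** iters * f(x0)
print(f(x0), "->", f(x_pb))
```

Because each step minimizes the quadratic upper bound over α, the objective decreases monotonically, and the final value lands well inside the (C.4) bound on this ill-conditioned example.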

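The elementwise inequality chain used to justify (C.2) is easy to spot-check numerically. A small sketch follows; the random test vectors, seed, dimensions, and tolerance are arbitrary choices made for illustration.

```python
import numpy as np

# Spot-check of the chain behind (C.2), for 0 <= gamma <= 1 and y in R^n:
#   n * (sum_i |y_i|^(gamma+1))^2  >=  n * sum_i |y_i|^(2*gamma) * |y_i|^2
#                                  >=  (sum_i |y_i|^(2*gamma)) * (sum_i |y_i|^2)
# The first step is (sum a_i)^2 >= sum a_i^2 for a_i >= 0; the second is
# Chebyshev's sum inequality for the similarly ordered |y|^(2*gamma), |y|^2.
rng = np.random.default_rng(0)
for gamma in (0.0, 0.1, 0.4, 0.7, 1.0):
    for _ in range(1000):
        a = np.abs(rng.standard_normal(int(rng.integers(1, 20))))
        n = a.size
        lhs = n * (a ** (gamma + 1)).sum() ** 2
        mid = n * ((a ** (2 * gamma)) * (a ** 2)).sum()
        rhs = (a ** (2 * gamma)).sum() * (a ** 2).sum()
        assert lhs >= mid * (1 - 1e-12) and mid >= rhs * (1 - 1e-12)
print("inequality chain holds on all samples")
```

A random search of course proves nothing, but it is a quick sanity check that the exponents in the reconstructed (C.2) are the right ones.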
Original File Name Searched:

1603-07421.pdf
