Zero SR1 quasi-Newton method

Fork me on GitHub

Background

The zeroSR1 package is based on a proximal quasi-Newton algorithm to solve

 min_{x}; f(x) + h(x)

where f is a smooth convex function and h is a (possibly non-smooth, and possibly infinite) convex function such that the

Exploiting the nature of h, we show in 'A quasi-Newton proximal splitting method’ (Becker, Fadili; NIPS 2012) that one can also compute the proximity operator of h in a scaled norm:

  text{prox}_h^V(y) = text{argmin}_x ; h(x) + frac{1}{2}|x-y|_V^2,; V = D + sigma uu^T

where D is a diagonal matrix, u is a vector so that uu^T is a rank-1 matrix, and sigma is pm 1.

Because we can efficiently solve for the scaled prox, it opens up the possibility of a quasi-Newton method. The SR1 update is a rank-1 update, and by using a 0-memory version, the updates to the inverse Hessian are in exactly the form of D + sigma uu^T.

This means that for the same cost as a proximal gradient method (or an accelerated one, like FISTA), we can incorporate second order information, and the method converges very quickly.

Types of non-smooth terms we can handle

The non-smooth term h can be infinite valued; for example, it may be an indicator function of a set. The indicator function of a set C is denoted

 iota_C(x) = begin{cases} 0 & x in C  +infty & x notin C end{cases}

Equivalently, we are enforcing the constraint x in C in the optimization problem.

We can solve the scaled prox of the following h in mathcal{O}(n log n) time (compared to mathcal{O}(n) time for the regular prox) for inputs of dimension n:

Function mathematical representation
l1 norm h(x) = lVert xrVert_1 = sum_{i=1}^n lvert x_i rvert
non-negativity constraints h(x) = iota_{C_+}, C_+ = { x mid x ge 0 }
l1 norm and non-negativity h(x) =  lVert xrVert_1 + iota_{C_+}
box constraints h(x) = iota_{C_{text{box}}}, C_text{box} = { x mid ell le x le u }
ell_infty norm ball h(x) = iota_{C_infty}, C_infty = { x mid lVert x rVert_infty le 1 }
hinge loss h(x) = sum_{i=1}^n max( 0, 1 - x_i )

Code

We have put a Matlab/Octave implementation on github, under the BSD 3-clause license. If you are interested in contributing a version in python or R, we will be glad to assist.

UPDATE March 2015: Andres Asensio Ramos has a python version on github.