Newton Methods

From NEOS

Revision as of 01:04, 10 April 2007 by Leyffer (Talk | contribs)
Jump to: navigation, search

IMSL, LANCELOT, MATLAB, NAG (NAG Fortran, or NAG C) OPTIMA, PORT 3, TN/TNBC , and VE08 implement quasi-Newton, truncated Newton, and Newton algorithms for bound-constrained optimization. The NEOS SERVER's bound-constrained minimization facility also implements a quasi-Newton algorithm. All these codes implement line-search and trust-region versions of unconstrained minimization algorithms, so our discussion here is brief, emphasizing the differences between the unconstrained and bound-constrained cases.

A line-search method for bound-constrained problems generates a sequence of iterates by setting

xk + 1 = xk + αkdk

where xk is a feasible approximation to the solution, dkis a search direction, and αk > 0 is the step. The direction dk is obtained as an approximate minimizer of the subproblem

\min_d \left\{ \nabla f(x_k)^T d + \frac{1}{2} d^T B_k d \; : \; d_i = 0 \; \forall i \in W_k\right\}

where Wkis the working set and Bk is an approximation to the Hessian matrix of f(x) at xk. All variables in the working set Wk are fixed during this iteration, while all other variables are in the free set Fk. We can express this subproblem in terms of the free variables by noting that it is equivalent to the unconstrained problem

\min_w \left\{ g_k^T w + \frac{1}{2} w^T A_k w, \; w \in R^{m_k} \right\}

where mk is the number of free variables, Ak is the matrix obtained from Bk by taking those rows and columns whose indices correspond to the free variables, and gk is obtained from \nabla f(x_k) by taking the components whose indices correspond to the free variables, Fk.

The main requirement on Wk is that dk be a feasible direction, that is, xk + αdk satisfies the constraints for all α > 0 sufficiently small. This is certainly the case if Wk = A(xk), where

A(x_k) = \{ i : x_i = l_i\} \cup \{ i : x_i = u_i \|

is the set of active constraints at x. A s long as progress is being made with the current Wk, the next working set Wk + 1 is obtained by merging A(xk + 1) with Wk. This updating process is continued until the function cannot be reduced much further with the current working set. At this point, the classical strategy is to drop a constraint in Wk for which \partial_i f(x_k) has the wrong sign, that is, i \in W_k but i \not \in B(x_k), where the binding set

B(x_k) = \{ i : x_i = l_i, \partial_i f(x_k) \geq 0 \} \cup \{ i : x_i = u_i, \partial_i f(x_k) \leq 0 \}

is defined as before. In general it is advantageous to drop more than one constraint, in the hope that the algorithm will make more rapid progress toward s the optimal binding set. However, all dropping strategies are constrained by the requir ement that the solution dk of the subproblem be a feasible direction.

An implementation of a line-search method based on subproblem (l#bcsub 1.1) must cater to the situation in which the reduced Hessian matrix Ak is indefinite, because in this case the subproblem does not have a solution. This situation may arise, for example, if Bk is the Hessian matrix or an approximation obtained by differences of the gradient. Here, it is necessary to specify dk by other means. For example, we can use the modified Cholesky factorization.

Quasi-Newton methods for bound-constrained problems update an approximation to t he reduced Hessian matrix since, as already noted, only the reduced Hessian matrix is likely to be positive definite. The updating process is not entirely satisfactory because there are situations in which a positive definite update that satisfies the quasi-Newton condition does not exist. Moreover, complications arise because the dimension of th e reduced matrix changes when the working set Wk changes. Quasi-Newt on methods are usually beneficial when the working set remains fixed during consecutive iterations.

The choice of line-search parameter Wk is quite similar to the unconstrained case. If subproblem (1.1) has a solution dk and xk + dkviolates one of the constraints, then we compute the largest \mu_k \in (0,1) such that

xk + μkdk

is feasible. A standard strategy for choosing αk is to seek an \alpha_k \in (0,\mu_k] that satisfies the sufficient decrease and curvature conditions. We are guaranteed the existence of such an αk unless μk satisfies the sufficient decrease condition and

\nabla f(x_k + \mu_k d_k)^T d_k < 0

This situation is likely to happen if, for example, f(x) is strictly decreasing on the line segment [xk,xk + μkdk]. In this case it is safe to set αk = μk.