Unconstrained Optimization
From NEOS
The unconstrained optimization problem is central to the development of optimization software. Constrained optimization algorithms are often extensions of unconstrained algorithms, while nonlinear least squares and nonlinear equation algorithms tend to be specializations. In the unconstrained optimization problem, we seek a local minimizer of a real-valued function, f(x), where x is a vector of real variables. In other words, we seek a vector, x*, such that
for all x close to x*.
Global optimization algorithms try to find an x* that minimizes f over all possible vectors x. This is a much harder problem to solve. We do not discuss it here because, at present, no efficient algorithm is known for performing this task. For many applications, local minima are good enough, particularly when the user can draw on his/her own experience and provide a good starting point for the algorithm.
Newton's method gives rise to a wide and important class of algorithms that require computation of the gradient vector
and the Hessian matrix,
Although the computation or approximation of the Hessian can be a time-consuming operation, there are many problems for which this computation is justified. We describe algorithms in which the user supplies the Hessian explicitly before moving on to a discussion of algorithms that don't require the Hessian.
Newton's method forms a quadratic model of the objective function around the current iterate The model function is defined by
In the basic Newton method, the next iterate is obtained from the minimizer of qk(s). When the Hessian matrix,
, is positive definite, the quadratic model has a unique minimizer that can be obtained by solving the symmetric
linear system:
The next iterate is then
- xk + 1 = xk + sk
Convergence is guaranteed if the starting point is sufficiently close to a l ocal minimizer x* at which the Hessian is positive definite. Moreover, the rate of convergence is quadratic, that is,
for some positive constant β.
In most circumstances, however, the basic Newton method has to be modified to achieve convergence.
Versions of Newton's method are implemented in the following software packages:
The NEOS Server also has an unconstrained minimization facility to solve these problems remotely over the Internet.
These codes obtain convergence when the starting point is not close to a min imizer by using either a line-search or a trust-region approach.
The line-search variant modifies the search direction to obtain another a downhill, or descent direction for f . It then tries different step lengths along this direction until it finds a step that not only decreases f, but also achieves at least a small fraction of this dir ection's potential.
- The trust-region variant uses the original quadratic model function, but they constrain the new iterate to stay in a local neighborhood of the current iterate. To find the step, then, we have to minim ize the quadratic subject to staying in this neighborhood, which is generally ellipsoidal in shape.
- Line-search and trust-region techniques are suitable if the number of variables n is not too large, because the cost per iteration is of order n3. Codes for problems with a large number of variables tend to use truncated Newton methods, which usually settle for an approximate minimizer of the quadratic model.
So far, we have assumed that the Hessian matrix is available, but the algorithms are unchanged if the Hessian matrix is replaced by a reasonable approximation. Two kinds of methods use approximate Hessians in place of the real thing:
- The first possibility is to use difference approximations to the exact Hessian. We exploit the fact that each column of the Hessian can be approximated by taking the difference between two instances of the gradient vector evaluated at two nearby points. For sparse Hessians, we can often appr oximate many columns of the Hessian with a single gradient evaluation by choosing the eval uation points judiciously.
- Quasi-Newton Methods build up an approximation to the Hessian by keeping track of the gradient differences along each step taken by the algorithm. Various conditions are imposed on the approximate Hessian. For exa mple, its behavior along the step just taken is forced to mimic the behavior of the exact Hessian, and it is usually kept positive definite.
Finally, we mention two other approaches for unconstrained problems that are not so closely related to Newton's method:
- Nonlinear conjugate gradient methods are motivated by the success of the linear conjugate gradient method in minimizin g quadratic functions with positive definite Hessians. They use search directions that co mbine the negative gradient direction with another direction, chosen so that the search will take place along a direction not previously explored by the algorithm. At least, t his property holds for the quadratic case, for which the minimizer is found exactly within just n iterations. For nonlinear problems, performace is problematic, but these meth ods do have the advantage that they require only gradient evaluations and do not use much storage.
- The nonlinear Simplex method (not to be confused with the simplex method for linear programming) requires neither gradient nor Hessian evaluations. Instead, it performs a pattern search based only on func tion values. Because it makes little use of information about f, it typically req uires a great many iterations to find a solution that is even in the ballpark. It can be us eful when f is nonsmooth or when derivatives are impossible to find, but it is unfortunately often used when one of the algorithms above would be more appropriate.
