Nonlinear Least Squares

From NEOS

Jump to: navigation, search

The nonlinear least squares problem has the general form

\min \{ r(x) : x \in R^n \}

where r is the function defined by r(x) = \frac{1}{2}\| f(x) \|_2^2 for some vector-valued function f that maps Rn to Rm.

Least squares problems often arise in data-fitting applications. Suppose that some physical or economic process is modeled by a nonlinear function φ that depends on a parameter vector x and time t. If bi is the actual output of the system at time ti, then the residual

φ(x,ti) − bi

measures the discrepancy between the predicted and observed outputs of the system at time ti. A reasonable estimate for the parameter x may be obtained by defining the ith component of f by

fi(x) = φ(x,ti) − bi

and solving the least squares problem with this definition of f.

From an algorithmic point of view, the feature that distinguishes least squares problems from the general unconstrained optimization problem is the structure of the Hessian matrix of r. The Jacobian matrix of f,

f'(x) = \left( \partial_1 f(x), \ldots, \partial_n f(x) \right)

can be used to express the gradient of r since \nabla r(x) = f'(x)^T f(x). Similarly, f'(x)is part of the Hessian matrix

\nabla^2 r(x) = f'(x)^T f'(x) + \sum_{i=1}^m f_i(x) \nabla^2 f_i(x)

To calculate the gradient of r, we need to calculate the Jacobian matrix f'(x). Having done so, we know the first term in the Hessian matrix, namely f'(x)Tf'(x) without doing any further evaluations. Nonlinear least squares algorithms exploit this structure.

In many practical circumstances, the first term, f'(x)Tf'(x) in the Hessian is more important than the second term, most notably when the residuals fi(x) are small at the solution. Specifically, we say that a problem has small residuals if, for all x near a solution, the quantities

|f_i(x)| \| \nabla^2 f_i(x) \|, \quad i=1,2,\ldots,n

are small relative to the smallest eigenvalue of f'(x)Tf'(x)