.. _line_search:

*******************
Line Search Methods
*******************

In line search methods, each iteration is given by
:math:`x_{k+1} = x_k + \alpha_k p_k`, where :math:`p_k` is the search
direction and :math:`\alpha_k` is the step length.

The search direction is often of the form
:math:`p_k = -B_k^{-1} \nabla f_k`, where :math:`B_k` is a symmetric and
non-singular matrix. The form of :math:`p_k` depends on the choice of
algorithm.

The ideal step length would be
:math:`\min_{\alpha>0} f(x_k + \alpha p_k)`, but this is generally too
expensive to calculate. Instead, an inexact line search condition such
as the Wolfe conditions can be used:

.. math::
   f(x_k + \alpha p_k) \leq f(x_k) + c_1 \alpha \nabla f_k^T p_k \\
   \nabla f(x_k + \alpha_k p_k)^T p_k \geq c_2 \nabla f_k^T p_k

with :math:`0 < c_1 < c_2 < 1`. The first condition ensures that
:math:`\alpha_k` gives a sufficient decrease in :math:`f`, while the
second (curvature) condition rules out unacceptably short steps.
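In practice, the sufficient decrease condition is often enforced by
simple backtracking: start from a trial step and shrink it until the
condition holds. The following Python sketch is illustrative only; the
name ``backtracking_line_search``, the value of :math:`c_1`, and the
shrink factor are assumptions, and only the first Wolfe condition is
checked (practical routines also enforce the curvature condition):

.. code-block:: python

   import numpy as np

   def backtracking_line_search(f, grad_f, x, p, c1=1e-4, shrink=0.5, alpha=1.0):
       """Backtrack until the sufficient decrease (Armijo) condition
       f(x + alpha*p) <= f(x) + c1*alpha*grad_f(x)^T p holds.
       Assumes p is a descent direction, so the loop terminates."""
       fx = f(x)
       slope = grad_f(x) @ p  # directional derivative; negative for descent
       while f(x + alpha * p) > fx + c1 * alpha * slope:
           alpha *= shrink    # shrink the trial step
       return alpha

   # Example: one steepest descent step (B_k = I) on f(x) = x1^2 + 10*x2^2
   f = lambda x: x[0]**2 + 10 * x[1]**2
   grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
   x = np.array([1.0, 1.0])
   p = -grad_f(x)
   alpha = backtracking_line_search(f, grad_f, x, p)
   print(alpha, x + alpha * p)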
.. _bfgs:

BFGS (``bfgs``)
***************

A quasi-Newton line search method that maintains an approximation
:math:`H_k` to the inverse Hessian rather than computing second
derivatives directly.

Given a starting point :math:`x_0`, a convergence tolerance
:math:`\epsilon`, and an initial inverse Hessian approximation
:math:`H_0`, while :math:`\|\nabla f_k\| > \epsilon`:

Compute the search direction :math:`p_k = -H_k \nabla f_k`.

Then find the next iterate :math:`x_{k+1}` by performing a line search.

Next, define :math:`s_k = x_{k+1}-x_k` and
:math:`y_k = \nabla f_{k+1} - \nabla f_k`, then compute

.. math::
   H_{k+1} = (I - \rho_k s_k y_k^T)H_k(I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T

with :math:`\rho_k = \frac{1}{y_k^T s_k}`.
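The update is a rank-two correction to :math:`H_k` and translates
directly into code. The sketch below follows the notation above;
``bfgs_update`` is a name chosen for illustration, not an established
API:

.. code-block:: python

   import numpy as np

   def bfgs_update(H, s, y):
       """BFGS inverse Hessian update:
       H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T,
       with rho = 1 / (y^T s)."""
       rho = 1.0 / (y @ s)
       I = np.eye(len(s))
       V = I - rho * np.outer(s, y)
       return V @ H @ V.T + rho * np.outer(s, s)

   # One iteration, given f, grad_f, x, H and a line search routine
   # such as the illustrative backtracking_line_search above:
   #   p = -H @ grad_f(x)
   #   alpha = backtracking_line_search(f, grad_f, x, p)
   #   s = alpha * p
   #   y = grad_f(x + s) - grad_f(x)
   #   H = bfgs_update(H, s, y)
   #   x = x + s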
**Advantages**:

- Superlinear rate of convergence.
- Has self-correcting properties: if there is a bad estimate for
  :math:`H_k`, it will tend to correct itself within a few iterations.
- No need to compute the Jacobian or Hessian.

**Disadvantages**:

- Newton's method has quadratic convergence, but this is lost with BFGS.

[Nocedal]_
.. _gauss_newton:

Gauss-Newton (``gauss_newton``)
*******************************

A modified Newton's method with line search, for least-squares
objectives :math:`f(x) = \frac{1}{2}\|r(x)\|^2`. Instead of solving the
standard Newton equations

.. math::
   \nabla^2 f(x_k)p = -\nabla f(x_k),

solve the system

.. math::
   J_k^T J_k p_k^{GN} = - J_k^T r_k

(where :math:`J_k` is the Jacobian of the residual vector :math:`r_k`)
to obtain the search direction :math:`p_k^{GN}`. The next iterate is
then set as :math:`x_{k+1} = x_k + p_k^{GN}`.

Here, the approximation of the Hessian
:math:`\nabla^2 f_k \approx J_k^T J_k` has been made, which saves
computation time as second derivatives are not calculated.
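As a concrete illustration, the following sketch applies the iteration
to a small exponential model. The helper name, model, and data are
assumptions for demonstration; note that it solves
:math:`\min_p \|J_k p + r_k\|` with ``numpy.linalg.lstsq``, which is
mathematically equivalent to the normal equations above but avoids
forming the possibly ill-conditioned product :math:`J_k^T J_k`:

.. code-block:: python

   import numpy as np

   def gauss_newton_step(x, r, J):
       """One Gauss-Newton step for f(x) = 0.5 * ||r(x)||^2:
       solve J^T J p = -J^T r for p, then set x <- x + p."""
       p, *_ = np.linalg.lstsq(J(x), -r(x), rcond=None)
       return x + p

   # Toy problem: fit a, b in the model y = a * exp(b * t).
   t = np.array([0.0, 1.0, 2.0, 3.0])
   y = 2.0 * np.exp(-0.5 * t)                # synthetic data: a=2, b=-0.5
   r = lambda x: x[0] * np.exp(x[1] * t) - y
   J = lambda x: np.column_stack((np.exp(x[1] * t),              # dr/da
                                  x[0] * t * np.exp(x[1] * t)))  # dr/db

   x = np.array([1.5, -0.4])                 # reasonable initial guess
   for _ in range(5):
       x = gauss_newton_step(x, r, J)
   print(x)                                  # approaches [2, -0.5]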
**Advantages**:

- Calculation of second derivatives is not required.
- If the residuals or their second-order partial derivatives are small,
  then :math:`J_k^T J_k` is a close approximation to
  :math:`\nabla^2 f_k` and convergence of Gauss-Newton is fast.
- The search direction :math:`p_k^{GN}` is always a descent direction
  as long as :math:`J_k` has full rank and the gradient
  :math:`\nabla f_k` is nonzero.

**Disadvantages**:

- Without a good initial guess, or if the matrix :math:`J_k^T J_k` is
  ill-conditioned, Gauss-Newton can be very slow to converge to a
  solution.
- If the residuals are large, the neglected second-order term of the
  Hessian is significant and the approximation
  :math:`\nabla^2 f_k \approx J_k^T J_k` is poor.
- :math:`J_k` must have full rank.

[Nocedal]_ [Floater]_
.. [Nocedal] Jorge Nocedal, Stephen J. Wright (2006), Numerical Optimization.
.. [Poczos] Barnabas Poczos, Ryan Tibshirani (2012), Lecture 10: Optimization, School of Computer Science, Carnegie Mellon University.
.. [Floater] Michael S. Floater (2018), Lecture 13: Non-linear least squares and the Gauss-Newton method, University of Oslo.