Nonlinear programming problems have either a nonlinear f(x) or a solution set X defined by nonlinear equations and inequalities. Nonlinear programming is a broad category of optimization problems and includes the following subcategories:
When you select a search method, consider whether the method uses derivatives, which can help you determine the suitability of the method for a particular optimization problem. For example, the downhill simplex method, also known as the Nelder-Mead method, uses only evaluations of f(x) to find the optimal solution. Because it uses only evaluations of f(x), the downhill simplex method is a good choice for problems with pronounced nonlinearity or with problems containing a significant number of discontinuities.
The search methods that use derivatives, such as the gradient search methods, work best with problems in which the objective function is continuous in its first derivative.
The process of iteratively searching along a vector for the minimum value on the vector is line minimization or line searching. Line minimization can help establish a search direction or verify that the chosen search direction is likely to produce an optimal solution.
Nonlinear programming search algorithms use line minimization to solve the subproblems leading to an optimal value for f(x). The search algorithm searches along a vector until it reaches the minimum value on the vector. After the search algorithm reaches the minimum on one vector, the search continues along another vector, usually orthogonal to the first vector. The line search continues along the new vector until reaching its minimum value. The line minimization process continues until the search algorithm finds the optimal solution.
The goal of any optimization problem is to find a global optimal solution. However, nonlinear programming problems are continuous optimization problems so the solution set X for a nonlinear programming problem might be infinitely large. Therefore, you might not be able to find a global optimum for f(x). In practice, you solve most nonlinear programming problems by finding a local optimum for f(x).
In terms of solution set X, x* is a global minimum of f over X if it satisfies the following relationship.
![]() |
(A) |
A local minimum is a minimum of the function over a subset of the domain. In terms of solution set X, x* is a local minimum of f over X if
, and an ε > 0 exists so that the following relationship is true.
![]() |
(B) |
where
The following figure illustrates a function of x where the domain is any value between 32 and 65;
.

In the previous figure, A is a local minimum because you can find ε > 0, such that f(x*) ≤ f(x). ε = 1 would suffice. Similarly, C is a local minimum. B is the global minimum because f(x*) ≤ x for 
The downhill simplex method developed by Nelder and Mead uses a simplex and performs function evaluations without derivatives.
![]() |
Note Although the downhill simplex method and the linear programming simplex method use the concept of a simplex, the methods have nothing else in common. |
Most practical applications involve solution sets that are non-degenerate simplexes. A non-degenerate simplex encloses a finite volume of N dimensions. If you take any point of the non-degenerate simplex as the origin of the simplex, the remaining N points of the simplex define vector directions spanning the N-dimensional space.
The downhill simplex method requires that you define an initial simplex by specifying N + 1 starting points. No effective means of determining the initial starting point exists. You must judge the best location from which to start. After deciding upon an initial starting point P0, you can use Equation C to determine the other points needed to define the initial simplex.
| Pi = P0 + λei | (C) |
where ei is a unit vector and λ is an estimate of the characteristic length scale of the problem.
Starting with the initial simplex defined by the points from Equation C, the downhill simplex method performs a series of reflections. A reflection moves from a point on the simplex through the opposite face of the simplex to a point where the function f is smaller. The configuration of the reflections conserves the volume of the simplex, which maintains the non-degeneracy of the simplex. The method continues to perform reflections until the function value reaches a predetermined tolerance.
Because of the multidimensional nature of the downhill simplex method, the value it finds for f(x) might not be the optimal solution. You can verify that the value for f(x) is the optimal solution by repeating the process. When you repeat the process, use the optimal solution from when you first ran the method as P0. Reinitialize the method to N + 1 starting points using Equation C.
The golden section search method finds a local minimum of a 1D function by bracketing the minimum. Bracketing a minimum requires a triplet of points, as shown in the following relationship.
| a < b < c such that f(b) < f(a) and f(b) < f(c) | (D) |
Because the relationship in Equation D is true, the minimum of the function is within the interval (a, c). The search method starts by choosing a new point x between either a and b or between b and c. For example, choose a point x between b and c and evaluate f(x). If f(b) < f(x), the new bracketing triplet is a < b < x. If f(b) > f(x), the new bracketing triplet is b < x < c. In each instance, the middle point, b or x, is the current optimal minimum found during the current iteration of the search.
Given that a < b < c, point b is a fractional distance W between a and c, as shown in the following equations.
![]() |
(E) |
A new point x is an additional fractional distance Z beyond b, as shown in Equation F.
![]() |
(F) |
Given Equation F, the next bracketing triplet can have either a length of W + Z relative to the current bracketing triplet or a length of 1 – W. To minimize the possible worst case, choose Z such that the following equations are true.
| W + Z = 1 – W | (G) |
| Z = 1 – 2W | (H) |
Given Equation H, the new x is the point in the interval symmetric to b. Therefore, Equation I is true.
| |b – a| = |x – c| | (I) |
You can imply from Equation I that x is within the larger segment because Z is positive only if W < 1/2.
If Z is the current optimal value of f(x), W is the previous optimal value of f(x). Therefore, the fractional distance of x from b to c equals the fractional distance of b from a to c, as shown in Equation J.
![]() |
(J) |
Equations H and J yield the following quadratic equation.
W2 – 3W + 1 = 0
|
(K) |
Therefore, the middle point b of the optimal bracketing interval a < b < c is the fractional distance of 0.38197 from one of the end points and the fractional distance of 0.61803 from the other end point. 0.38197 and 0.61803 comprise the golden mean, or golden section, of the Pythagoreans.
The golden section search method uses a bracketing triplet and measures from point b to find a new point x a fractional distance of 0.38197 into the larger interval, either (a, b) or (b, c), on each iteration of the search method. Even when starting with an initial bracketing triplet whose segments are not within the golden section, the process of successively choosing a new point x at the golden mean quickly causes the method to converge linearly to the correct, self-replicating golden section. After the search method converges to the self-replicating golden section, each new function evaluation brackets the minimum to an interval only 0.61803 times the size of the preceding interval.
Gradient search methods determine a search direction by using information about the slope of f(x). The search direction points toward the most probable location of the minimum. After the gradient search method establishes the search direction, it uses iterative descent to move toward the minimum.
The iterative descent process starts at a point x0, which is an estimate of the best starting point, and successively produces vectors x1, x2, …, so f decreases with each iteration, as shown in the following relationship.
| f(xk + 1) < f(xk) | (L) |
for k = 0, 1, …
where k is the iteration number, f(xk + 1) is the objective function value at iteration k + 1, and f(xk) is the objective function value at iteration k.
Successively decreasing f improves the current estimate of the solution. The iterative descent process attempts to decrease f to its minimum.
The following equations and relationships provide a general definition of the gradient search method of solving nonlinear programming problems.
| xk + 1 = xk + αkdk | (M) |
for k = 0, 1, …
where dk is the search direction and αk is the step size.
In Equation M, if the gradient of the objective function
the gradient search method needs a positive value for αk and a value for dk that fulfills the following relationship.
![]() |
(N) |
Iterations of gradient search methods continue until xk + 1 = xk.
A global minimum is a value for f(x) that satisfies the relationship described by Equation A.
Ideally, iteratively decreasing f converges to a global minimum for f(x). In practice, convergence rarely proceeds to a global minimum for f(x) because of the presence of local minima that are not global. Local minima attract gradient search methods because the form of f near the current iterate and not the global structure of f determines the downhill course the method takes.
When a gradient search method begins on or encounters a stationary point, the method stops at the stationary point. Therefore, a gradient search method converges to a stationary point. If f is convex, the stationary point is a global minimum. However, if f is not convex, the stationary point might not be a global minimum. Therefore, if you have little information about the locations of local minima, you might have to start the gradient search method from several starting points.
Because a gradient search method does not produce convergence at a global minimum, you must decide upon an error tolerance ε that assures that the point at which the gradient search method stops is at least close to a local minimum. Unfortunately, no explicit rules exist for determining an absolutely accurate ε. The selection of a value for ε is somewhat arbitrary and based on an estimate about the value of the optimal solution.
Use the accuracy input of the Optimization VIs to specify a value for ε.
![]() |
Note You can use the Optimization VIs only in the LabVIEW Full and Professional Development Systems. |
The nonlinear programming optimization VIs iteratively compare the difference between the highest and lowest input values to the value of accuracy until two consecutive approximations do not differ by more than the value of accuracy. When two consecutive approximations do not differ by more than the value of accuracy, the VI stops.
Conjugate direction methods attempt to find f(x) by defining a direction set of vectors such that minimizing along one vector does not interfere with minimizing along another vector, which prevents indefinite cycling through the direction set.
When you minimize a function f along direction u, the gradient of f is perpendicular to u at the line minimum. If P is the origin of a coordinate system with coordinates x, you can approximate f by the Taylor series of f, as shown in Equation O.
![]() ![]() |
(O) |
where
, 
and

The components of matrix A are the second partial derivative matrix of f. Matrix A is the Hessian matrix of f at P.
The following equation gives the gradient of f.
![]() |
(P) |
The following equation shows how the gradient
changes with movement along A.
![]() |
(Q) |
After the search method reaches the minimum by moving in direction u, it moves in a new direction v. To fulfill the condition that minimization along one vector does not interfere with minimization along another vector, the gradient of f must remain perpendicular to u, as shown in Equation R.
![]() |
(R) |
When Equation R is true for two vectors u and v, u and v are conjugate vectors. When Equation R is true pairwise for all members of a set of vectors, the set of vectors is a conjugate set. Performing successive line minimizations of a function along a conjugate set of vectors prevents the search method from having to repeat the minimization along any member of the conjugate set.
If a conjugate set of vectors contains N linearly independent vectors, performing N line minimizations arrives at the minimum for functions having the quadratic form shown in Equation O. If a function does not have exactly the form of Equation O, repeated cycles of N line minimizations eventually converge quadratically to the minimum.
At an N-dimensional point P, the conjugate gradient search methods calculate the function f(P) and the gradient
.
is the vector of first partial derivatives. The conjugate gradient search method attempts to find f(x) by searching a gradient conjugate to the previous gradient and conjugating to all previous gradients, as much as possible.
The Fletcher-Reeves method and the Polak-Ribiere method are the two most common conjugate gradient search methods. The following theorems serve as the basis for each method.
Theorem A has the following conditions:
| gi + 1 = gi – λiAhi | (S) |
| hi + 1 = gi + 1 + γihi | (T) |
where the chosen values for λi and γi make gi + 1gi = 0 and hi + 1Ahi = 0, as shown in the following equations.
![]() |
(U) |
![]() |
(V) |
If the denominators equal zero, take λi = 0, γi = 0.
| gigj = 0 | (W) |
| hiAhj = 0 | (X) |
The elements in the sequence that Equation S produces are mutually orthogonal. The elements in the sequence that Equation T produces are mutually conjugate.
Because Equation W is true, you can rewrite Equations U and V as the following equations.
![]() |
(Y) |
![]() |
(Z) |
The following theorem defines a method for constructing the vector from Equation S when the Hessian matrix A is unknown:
| f(x) ≈ c – b ˇ x + (1/2)x ˇ A ˇ x | (AA) |
for some point Pi.![]() |
(AB) |
The vector gi + 1 that Equation AB yields is the same as the vector that Equation S yields when the Hessian matrix A is known. Therefore, you can optimize f without having knowledge of Hessian matrix A and without the computational resources to calculate and store the Hessian matrix A. You construct the direction sequence hi with line minimization of the gradient vector and the latest vector in the g sequence.
Both the Fletcher-Reeves method and the Polak-Ribiere method use Theorem A and Theorem B. However, the Fletcher-Reeves method uses the first term from Equation Y for γi, as shown in Equation AC.
![]() |
(AC) |
The Polak-Ribiere method uses the second term from Equation Y for γi, as shown in Equation AD.
![]() |
(AD) |
Equation AC equals Equation AD for functions with exact quadratic forms. However, most functions in practical applications do not have exact quadratic forms. Therefore, after you find the minimum for the quadratic form, you might need another set of iterations to find the actual minimum.
When the Polak-Ribiere method reaches the minimum for the quadratic form, it resets the direction h along the local gradient, essentially starting the conjugate-gradient process again. Therefore, the Polak-Ribiere method can make the transition to additional iterations more efficiently than the Fletcher-Reeves method.