2.4 Optimization
Terminology
The most useful thing about derivatives is that they enable us to find the maximum and minimum values of a function. As noted earlier, this arises constantly in statistics. First, some terminology (below, $f$ is a function with domain $D$):
- absolute maximum: The point $c$ is an absolute maximum of $f$ if $f(c) \geq f(x)$ for all $x$ in $D$.
- maximum value: The maximum value of $f$ is $f(c)$, where $c$ is an absolute maximum of $f$.
- local maximum: The point $c$ is a local maximum (or relative maximum) of $f$ if there is an open interval $I$ containing $c$ such that $f(c) \geq f(x)$ for all $x$ in $I$.
Absolute minimum, minimum value, and local minimum are defined similarly. Finally, a point $c$ is an extreme value if $c$ is either an absolute maximum or an absolute minimum, while $c$ is a local extremum if $c$ is a local maximum or local minimum.
Derivatives and extreme values
What does this have to do with derivatives? The following result is so important, you should memorize it word for word and never forget it.
If $f$ has a local extremum at $c$, and if $f'(c)$ exists, then $f'(c) = 0$.
A point $c$ satisfying $f'(c) = 0$ is called a critical point of $f$. Practically speaking, this means that if we want to maximize or minimize a function, we just need to find its critical points. However, we do need to be aware of a few caveats:
- The derivative has to exist. For example, we cannot minimize $f(x) = |x|$ with derivatives, because the minimum occurs at $0$ and $|x|$ is not differentiable at $0$.
- The converse of the above statement is not true. It is true that if $c$ is a local extremum (and $f$ is differentiable at $c$), then $c$ is a critical point. However, $c$ can be a critical point without being an extremum. For example, $0$ is a critical point of $f(x) = x^3$, but it is not a local minimum or maximum.
- If we find a critical point $c$, even if it is an extremum, we don't know whether it minimizes or maximizes $f$.
- The function might not have any critical points.
More information about caveats 3 and 4 is given below.
Don’t let these caveats obscure the main result, though – this is arguably the most useful thing in all of calculus.
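As a concrete sketch of the main result, the code below locates a critical point numerically by finding where the derivative crosses zero. The quadratic, the bracketing interval, and the tolerances are illustrative choices, not taken from the text.

```python
# Sketch: find a critical point by locating a zero of f'.
# The function and tolerances here are illustrative choices.

def f(x):
    return x**2 - 4*x + 1  # a simple quadratic; its minimum is at x = 2

def fprime(x, h=1e-6):
    # central finite-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def find_critical_point(lo, hi, tol=1e-10):
    # bisection on f': f'(lo) < 0 < f'(hi) brackets a zero of the derivative
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fprime(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = find_critical_point(0.0, 5.0)
print(round(c, 6))  # 2.0, where f'(x) = 2x - 4 = 0
```

For this quadratic we could of course solve $f'(x) = 2x - 4 = 0$ by hand; the numerical version is just to show the "set the derivative to zero" idea in action.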
Monotonicity and convexity
Monotone functions: If $f$ is differentiable, why wouldn’t it have any critical points (#4 above)? The most likely answer is that it is monotone. A function $f$ is called increasing if $f(x_1) < f(x_2)$ for all $x_1 < x_2$, and decreasing if $f(x_1) > f(x_2)$ for all $x_1 < x_2$. A function that is either increasing or decreasing is called monotone.
For a differentiable function, whether it is monotone or not is related to its derivative:
- If $f'(x) > 0$ for all $x$, then $f$ is increasing.
- If $f'(x) < 0$ for all $x$, then $f$ is decreasing.
So there you have it. If $f$ is differentiable, there are three possibilities: it is always going up ($f' > 0$), always going down ($f' < 0$), or sometimes going up and sometimes going down, in which case $f'$ will cross zero and $f$ will have a critical point (due to the intermediate value theorem).
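A quick numerical illustration of the third possibility, using $f(x) = x^3 - 3x$ (an illustrative choice): its derivative changes sign, so it must cross zero somewhere in between.

```python
# Sketch: when f' changes sign, the intermediate value theorem guarantees
# a critical point in between. The function is an illustrative choice.

def fprime(x):
    return 3*x**2 - 3  # derivative of f(x) = x**3 - 3*x

# f' is negative at 0 and positive at 2, so f' must cross zero in (0, 2):
print(fprime(0) < 0 < fprime(2))  # True; the crossing is at x = 1
```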
Tests for min/max: Often, it is obvious whether a critical point is a minimum or maximum. However, if you’re not sure, you can do one of two things:
- Plug a number less than $c$ and a number greater than $c$ into $f'$. If $f'$ changes from negative to positive, $c$ is a local minimum. If it changes from positive to negative, $c$ is a local maximum. If it does not change sign, $c$ is not a local extremum. This is known as the “first derivative test”.
- Take the second derivative at $c$ (assuming it exists). If $f''(c) > 0$, then $c$ is a local minimum. If $f''(c) < 0$, then $c$ is a local maximum. This is known as the “second derivative test”. Note that if $f''(c) = 0$, the test is inconclusive – $c$ could be a local max, a local min, or neither.
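Both tests can be sketched numerically. Here we apply them to $f(x) = x^3 - 3x$, whose critical points are $x = -1$ and $x = 1$ (since $f'(x) = 3x^2 - 3$); the function and step sizes are illustrative choices.

```python
# Sketch of the first and second derivative tests, applied numerically
# to f(x) = x**3 - 3*x. Step sizes h are illustrative choices.

def f(x):
    return x**3 - 3*x

def fprime(x, h=1e-5):
    # central finite-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def fsecond(x, h=1e-4):
    # central finite-difference approximation to f''(x)
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

def first_derivative_test(c, h=0.1):
    # check the sign of f' just below and just above the critical point c
    left, right = fprime(c - h), fprime(c + h)
    if left < 0 < right:
        return "local min"
    if left > 0 > right:
        return "local max"
    return "not an extremum"

print(first_derivative_test(1))         # local min
print(first_derivative_test(-1))        # local max
print(fsecond(1) > 0, fsecond(-1) < 0)  # True True: the second derivative test agrees
```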
Convexity and concavity: If a function is always curving upwards or downwards, then no tests are needed and no distinctions between local and global extrema are necessary. To define this formally, imagine drawing a tangent line to $f$ at every point in its domain. If $f$ always lies above the tangent line, it is said to be convex (curving upwards). If $f$ always lies below the tangent line, it is concave (curving downwards). With respect to optimization,
- If $f$ is convex, then any critical point is a global minimum.
- If $f$ is concave, then any critical point is a global maximum.
Some textbooks / math classes refer to these as “concave up” and “concave down”, but you should learn concave/convex since it is far more common in the statistics, mathematics, and optimization literature.
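A minimal sketch of the convex case, using $f(x) = x^2$ as an illustrative example: it is convex ($f''(x) = 2 > 0$ everywhere), its only critical point is $x = 0$, and no sampled point does better.

```python
# Sketch: for a convex function, a critical point is a global minimum.
# f(x) = x**2 is convex; its critical point solves f'(x) = 2x = 0.
# The sampling grid is an illustrative choice.

def f(x):
    return x**2

critical = 0.0  # solves f'(x) = 2x = 0
samples = [critical + 0.01*k for k in range(-300, 301)]
print(all(f(x) >= f(critical) for x in samples))  # True: no sample beats the critical point
```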
Optimization is an enormous subject with giant textbooks devoted to it, so obviously this isn’t the whole story. However, taking the derivative and setting it equal to zero truly is the main idea, and solves a huge range of optimization problems.