2.8 Integration techniques
Every time we compute a derivative, we get a formula for integration. Specifically, if $\frac{d}{dx}F(x) = f(x)$, then
$$\int f(x)\,dx = F(x) + C.$$
This is great news if we are ever faced with the problem of calculating $\int f(x)\,dx$, but if we need to calculate $\int h(x)\,dx$, how can we reverse engineer a function $H(x)$ so that its derivative is $h(x)$?
Unfortunately, this task is sometimes easy, sometimes hard, and sometimes impossible (and you have no way of knowing in advance which situation you are in). One could create a huge table of integral formulas by taking derivatives of various things. Providing such a table is beyond the scope of this review, but such tables exist online and are useful resources to be aware of.
Even with such a table, however, there are a few integration techniques worth knowing. Among other things, (a) it may be faster to use one of these techniques than to look up an integral, (b) you might not have access to such a table at the moment, and (c) the form that appears in the table might be slightly different from what you need, in which case you may have to combine one of these techniques with the table to compute the integral.
Substitution
By far the most important technique to be aware of is substitution. For example, suppose we know that $\int f(x)\,dx = F(x) + C$, but we have to find $\int f(2x)\,dx$. Is it $F(2x) + C$? The answer (and this is extremely important to understand, because it comes up in statistics all the time) is no, it isn’t. We can check this easily using the chain rule: the derivative of $F(2x)$ is $2f(2x)$, so $\int f(2x)\,dx \neq F(2x) + C$.
In this case, it’s also fairly clear what we need to do to fix the problem: $\int f(2x)\,dx$ must be $\frac{1}{2}F(2x) + C$. There must be a 1/2 present to cancel the 2 that comes from the chain rule.
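A quick numerical check can make the missing factor concrete. The sketch below assumes, purely for illustration, that $f = \cos$ and $F = \sin$ (any pair with $F' = f$ would do), and approximates derivatives with a central difference; the helper name `derivative` is hypothetical.

```python
import math

# Illustrative assumption: f = cos, with antiderivative F = sin.
f = math.cos
F = math.sin


def derivative(g, x, h=1e-5):
    """Approximate g'(x) with a central difference (hypothetical helper)."""
    return (g(x + h) - g(x - h)) / (2 * h)


x = 0.7  # arbitrary test point

# Ratio of d/dx F(2x) to f(2x): the chain rule predicts 2.
naive_ratio = derivative(lambda t: F(2 * t), x) / f(2 * x)

# Ratio of d/dx [F(2x) / 2] to f(2x): the 1/2 cancels the 2, giving 1.
fixed_ratio = derivative(lambda t: 0.5 * F(2 * t), x) / f(2 * x)

print(f"naive: {naive_ratio:.6f}, corrected: {fixed_ratio:.6f}")
```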
Conceptually, letting $u = 2x$, we can visualize what’s going on here as follows. Each unit of $x$ covers twice as much ground as a unit of $u$: a step of 1 in $x$ is a step of 2 in $u$. If we don’t correct for this, we’re going to artificially inflate the area under the curve (i.e., the integral). This is what’s going on in the red region below, which clearly has greater area than the blue region (the integral we’re trying to calculate).
However, if we compensate for this (we’re stretching the horizontal axis out by a factor of 2, so we need to shrink the height of the function by a factor of 2 to preserve the correct area), we get the green region, which has the same area as the original blue region.
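The red/blue/green comparison can also be reproduced with numbers. This sketch assumes $f(u) = e^{-u}$ with $x$ running from 0 to 1 (an illustrative choice, not necessarily the function in the figure), and uses a hand-written composite Simpson’s rule, `simpson`, as a stand-in for exact integration.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# Illustrative assumption: f(u) = exp(-u), with u = 2x and x running from 0 to 1.
def f(u):
    return math.exp(-u)


blue = simpson(lambda x: f(2 * x), 0.0, 1.0)     # the integral we actually want
red = simpson(f, 0.0, 2.0)                       # u-range covered, no correction
green = simpson(lambda u: 0.5 * f(u), 0.0, 2.0)  # same range, height halved

print(blue, red, green)  # red is twice blue; green matches blue
```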
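To formalize this thinking into a procedure, if $u = g(x)$, then $du = g'(x)\,dx$ (this works for any differentiable function $g$). To compute $\int f(g(x))\,g'(x)\,dx$:
- Substitute $u$ for $g(x)$ and $du$ for $g'(x)\,dx$
- Take the integral
- Substitute $g(x)$ back in for $u$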
If we are calculating a definite integral, then instead of the last step, we can transform the limits of integration $a$ and $b$ into $g(a)$ and $g(b)$; this is usually preferable:
$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$
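The definite-integral version of the procedure can be checked numerically. As an assumed example (not one from the text), take $f(u) = \cos u$ and $u = g(x) = x^2$, so $du = 2x\,dx$; with $a = 0$ and $b = 1$, both the original integral over $x$ and the transformed integral over $u$ should agree with the exact value $\sin 1$. The `simpson` helper is a hand-written stand-in for exact integration.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# Illustrative assumption: f(u) = cos(u) and u = g(x) = x**2, so du = 2x dx.
f = math.cos


def g(x):
    return x ** 2


def g_prime(x):
    return 2 * x


a, b = 0.0, 1.0

lhs = simpson(lambda x: f(g(x)) * g_prime(x), a, b)  # integral over x from a to b
rhs = simpson(f, g(a), g(b))                          # integral over u from g(a) to g(b)
exact = math.sin(g(b)) - math.sin(g(a))               # antiderivative of cos is sin

print(lhs, rhs, exact)
```

Transforming the limits means we never have to substitute $x^2$ back in for $u$; the answer comes straight from the $u$-integral.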
As practice, use this procedure to calculate
You should get .
Integration by parts
Just as the chain rule gave us substitution, the product rule gives us a formula called integration by parts, which is usually written in the form
$$\int u\,dv = uv - \int v\,du.$$
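Here is a small numerical check of the integration-by-parts formula. The example is an assumption chosen for illustration: $\int_0^1 x e^x\,dx$ with $u = x$ and $dv = e^x\,dx$, so $du = dx$ and $v = e^x$. By parts, the exact answer is $[x e^x]_0^1 - \int_0^1 e^x\,dx = e - (e - 1) = 1$.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# Illustrative assumption: u = x and dv = e^x dx on [0, 1].
def u(x):
    return x


def u_prime(x):
    return 1.0  # du/dx


v = math.exp        # v = e^x
v_prime = math.exp  # dv/dx = e^x
a, b = 0.0, 1.0

direct = simpson(lambda x: u(x) * v_prime(x), a, b)                # integral of u dv
boundary = u(b) * v(b) - u(a) * v(a)                               # [uv] from a to b
by_parts = boundary - simpson(lambda x: v(x) * u_prime(x), a, b)   # uv - integral of v du

print(direct, by_parts)  # both close to the exact value 1
```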
As an example of integration by parts in action, suppose we want to integrate . We can write this as
Thus,
As practice, use this procedure to calculate
You should get .
Kernel trick
The above techniques are useful, but in statistics it is often the case that you can avoid them entirely and calculate the answer much faster using something I will call the “kernel trick” (I am not aware of this idea having an official name).
For example, suppose we need to calculate
Sure, we can use substitution, but most statisticians will find it easier to recognize that this is very similar to the exponential distribution, which (like all distributions) integrates to 1:
$$\int_0^\infty \lambda e^{-\lambda x}\,dx = 1.$$
Applying this shortcut:
The kernel of a distribution is the part of its density that involves the variable we’re integrating over. This is the only part that needs to match for the trick to work; we can always manipulate the constants as we did above.
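To see the kernel trick at work numerically, consider the exponential kernel $e^{-\lambda x}$ on $(0, \infty)$. Since $\lambda e^{-\lambda x}$ is a density, the kernel alone must integrate to $1/\lambda$. The rate $\lambda = 3$ below is an arbitrary assumed value, and the infinite upper limit is truncated at 20, where the remaining tail ($e^{-60}/3$) is negligible.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


lam = 3.0  # assumed rate parameter


def kernel(x):
    """Kernel of the Exp(lam) density: the part that depends on x."""
    return math.exp(-lam * x)


# Kernel trick: lam * kernel(x) integrates to 1, so kernel(x) integrates to 1 / lam.
via_kernel_trick = 1 / lam

# Brute-force check, truncating the infinite upper limit at 20.
numeric = simpson(kernel, 0.0, 20.0, n=2000)

print(via_kernel_trick, numeric)
```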
As another example, suppose we need to find
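This is actually impossible to solve using any of the integration techniques above, because its antiderivative has no elementary closed form. However, it has the kernel of a normal distribution:
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx = 1.$$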
Letting and , we get
This may seem complicated at first, but I cannot emphasize enough how important it is to learn this. As a statistician you will become very familiar with these distributions and this will get easier and easier. Every fall, in a ritual as constant as the turning of the leaves, first-year graduate students labor away, trying to solve integrals using elaborate integration by parts techniques, and a professor or older graduate student will look at what they are doing and solve it in seconds using this trick.
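The same idea handles Gaussian-type integrals. As an illustrative example not taken from the text, take $\int_{-\infty}^{\infty} e^{-a x^2 + b x}\,dx$ with $a > 0$. Completing the square gives $-a x^2 + b x = -\frac{(x-\mu)^2}{2\sigma^2} + \frac{b^2}{4a}$ with $\mu = b/(2a)$ and $\sigma^2 = 1/(2a)$, so the normal kernel integrates to $\sqrt{2\pi\sigma^2}$ and the whole integral is $e^{b^2/(4a)}\sqrt{2\pi\sigma^2}$. The sketch below, with assumed values $a = 2$ and $b = 1$, compares this with brute-force Simpson integration over a wide finite interval centered at $\mu$.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# Illustrative assumption: a = 2, b = 1 in exp(-a*x**2 + b*x).
a, b = 2.0, 1.0
mu = b / (2 * a)      # mean after completing the square
sigma2 = 1 / (2 * a)  # variance after completing the square

# Kernel trick: the normal kernel integrates to sqrt(2*pi*sigma2);
# the leftover constant exp(b**2 / (4a)) comes outside the integral.
via_kernel_trick = math.exp(b ** 2 / (4 * a)) * math.sqrt(2 * math.pi * sigma2)

# Brute-force check on [mu - 10, mu + 10], which holds essentially all of the mass.
numeric = simpson(lambda x: math.exp(-a * x ** 2 + b * x), mu - 10.0, mu + 10.0, n=4000)

print(via_kernel_trick, numeric)
```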
As practice, use this procedure to calculate
$$\int_0^\infty x^2 e^{-x}\,dx$$
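by using the kernel trick with respect to the gamma distribution, which has density function
$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0.$$
You should get $\Gamma(3)$, which is 2: $\Gamma(n) = (n-1)!$ if $n$ is a positive integer.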
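A numerical check of the gamma version of the trick: since $\frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ integrates to 1 over $(0, \infty)$, the kernel $x^{\alpha-1} e^{-\beta x}$ must integrate to $\Gamma(\alpha)/\beta^\alpha$. The values $\alpha = 3$ and $\beta = 1$ are an assumed choice (they give $\Gamma(3) = 2! = 2$), Python’s `math.gamma` supplies $\Gamma$, and the infinite upper limit is truncated at 60.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


alpha, beta = 3.0, 1.0  # assumed shape and rate


def kernel(x):
    """Kernel of the Gamma(alpha, beta) density (rate parameterization)."""
    return x ** (alpha - 1) * math.exp(-beta * x)


# Kernel trick: the density integrates to 1, so the kernel integrates to Gamma(alpha) / beta**alpha.
via_kernel_trick = math.gamma(alpha) / beta ** alpha

# Brute-force check, truncating the infinite upper limit at 60.
numeric = simpson(kernel, 0.0, 60.0, n=6000)

print(via_kernel_trick, numeric)
```

Changing `alpha` and `beta` lets you check other members of the same family; for non-integer `alpha` the factorial shortcut no longer applies, but `math.gamma` still does.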