4.1 “There exists” and “for all”

Many proofs in theoretical statistics involve convergence. A loose definition was provided earlier: as we collect more and more data, we can be increasingly sure that certain things will happen. We’ll discuss probabilistic convergence in the course, but before that, it will help to be familiar with the rigorous definition of (deterministic) convergence. Here’s the definition; we’ll come back to it later.

Definition. A sequence $x_1, x_2, \ldots$ of real numbers is said to converge if there exists a real number $x$ such that for all $\epsilon > 0$, there is an integer $N$ such that $\lvert x_n - x\rvert < \epsilon$ for all $n > N$. In this case, we say that $x_n$ converges to $x$, or that $x$ is the limit of $x_n$, and write $x_n \to x$ or $\lim x_n = x$ (these mean the same thing). If no such number exists, the sequence is said to diverge.

To make this definition concrete, let’s consider a specific sequence: $e^{-1}, e^{-2}, \ldots$. Intuitively, these numbers are getting closer to zero, so it seems right to say the sequence converges to zero. But does it satisfy the definition? Notice that there are two specific clauses that come up in the above definition, and they occur constantly in mathematical proofs and theorems:

  • “there exists”: There is at least one number satisfying this condition. Denoted $\exists$.
  • “for all”: This has to be true in every possible case, no exceptions. Denoted $\forall$.
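These quantifiers have direct analogues in code: “there exists” corresponds to asking whether any candidate satisfies a condition, and “for all” to whether every one does. Here is a rough R sketch (the function names and the finite ranges of $n$ and $N$ checked are our own illustrative choices; a computer can only examine finitely many values, so this is a sanity check rather than a proof):

```r
# "For all n > N": check |e^{-n} - x| < eps over a finite range of n
# (a stand-in for the infinitely many n > N in the definition).
holds_beyond <- function(x, eps, N, n_max = 1000) {
  n <- (N + 1):n_max
  all(abs(exp(-n) - x) < eps)  # "for all": every n must satisfy it
}

# "There exists an N": at least one candidate N must work.
exists_N <- function(x, eps, N_candidates = 1:100) {
  any(sapply(N_candidates, function(N) holds_beyond(x, eps, N)))
}

exists_N(x = 0, eps = 0.002)       # TRUE: some N works for x = 0
exists_N(x = 0.001, eps = 0.0005)  # FALSE: no N works for x = 0.001
```

Note how `all()` plays the role of $\forall$ and `any()` the role of $\exists$.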

So, when we say that “there exists a real number $x$”, we’re only requiring these conditions to be satisfied for the limit point; in the case of our example $x_n = e^{-n}$, we only need to show that it happens at $x = 0$. Maybe this condition is met at other points, maybe it isn’t.

On the other hand, when we say “for all $\epsilon > 0$”, the rest of the condition has to be true for every single positive real number. Maybe we can find some $\epsilon$ values for which the condition is met, but if it doesn’t hold for all of them, that isn’t good enough.

For example, suppose we consider the point $x = 0.001$: does $x_n$ converge to $0.001$? Well, if $\epsilon = 0.002$, then yes, we can find a number $N$ such that $\lvert x_n - x\rvert < \epsilon$ for all $n > N$. For example, $N = 5$ works (so the condition must hold for $n = 6, 7, \ldots$):

$$\begin{aligned} \lvert e^{-6} - 0.001\rvert &= 0.0015 < 0.002 \\ \lvert e^{-7} - 0.001\rvert &= 0.000088 < 0.002 \\ &\;\;\vdots \end{aligned}$$
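These values are easy to verify numerically, e.g. in R:

```r
abs(exp(-6) - 0.001)  # roughly 0.0015
abs(exp(-7) - 0.001)  # roughly 0.000088
```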

However, this isn’t enough – we need to be able to find a suitable $N$ for all positive $\epsilon$. If $\epsilon = 0.0005$, for example, we would never be able to find an $N$ that works: since $e^{-n} \to 0$, the distance $\lvert e^{-n} - 0.001\rvert$ approaches $0.001$, which is larger than $0.0005$. Therefore, $x_n$ does not converge to $0.001$.
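We can see this failure numerically as well: the distances $\lvert e^{-n} - 0.001\rvert$ settle near $0.001$ rather than shrinking to zero, so the condition fails for all large $n$ no matter which $N$ we pick. A quick R check (the range of $n$ and the cutoff $n > 10$ are arbitrary illustrative choices):

```r
n <- 1:50
d <- abs(exp(-n) - 0.001)
tail(d, 3)               # distances approach 0.001, not 0
all(d[n > 10] > 0.0005)  # TRUE: beyond n = 10, every distance exceeds 0.0005
```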

One final important point to note here is that the order of these clauses matters. For every $\epsilon$, there exists a suitable $N$. We choose $\epsilon$ first, and can then find an $N$ that works for that $\epsilon$ – we don’t need to find an $N$ that works for all $\epsilon$ (and in fact, there typically isn’t such an $N$). If $\epsilon = 0.001$, then $N = 7$ works. If $\epsilon = 0.00000000000000000000000001$, we are going to have to choose a much larger $N$ (but we can always find such an $N$).

Finding $N$ is fairly simple in this case; starting from $e^{-n} < \epsilon$, we can simply take the log of both sides:

$$\begin{aligned} e^{-n} &< \epsilon \\ -n &< \log \epsilon \\ n &> -\log \epsilon. \end{aligned}$$

So, $N = \lceil -\log \epsilon \rceil$, where $\lceil \cdot \rceil$ denotes the ceiling function (i.e., round up the number inside to the next integer). To make things even more concrete, one could think of $N(\epsilon)$ as a function:

n <- function(eps) {ceiling(-log(eps))}
n(0.001)
## [1] 7
n(0.0000001)
## [1] 17
n(0.00000000000000001)
## [1] 40