What’s neat is we can still split up the computation like we did before, if we believe it will make the error smaller and the confidence interval narrower. Let’s use the following integral as an example.
\[\int_0^\infty \frac{\sin{x}}{x} \mathrm{d}x\]
This oscillates up and down quite a bit for small \(x\), and then decays while still contributing significantly for larger \(x\). A naive evaluation (truncating the infinite upper limit at 100) would have a confidence interval of
Ic(10_000, 0, 100, x => Math.sin(x)/x)
Object {
  p05: 1.461
  p95: 1.884
}
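In case you don't have the Ic helper from earlier in the article at hand, here is a rough sketch of what such a function might look like – uniform sampling over the interval, with a normal approximation for the 90 % confidence interval. The actual implementation may well differ in its details.

```javascript
// Rough sketch of an Ic-style helper: estimate the integral of f
// over [lo, hi] from n uniform random samples, and return a 90 %
// confidence interval based on a normal approximation.
const Ic = (n, lo, hi, f) => {
  let sum = 0, sumSq = 0;
  for (let i = 0; i < n; i++) {
    const y = f(lo + Math.random() * (hi - lo));
    sum += y;
    sumSq += y * y;
  }
  const mean = sum / n;
  const variance = sumSq / n - mean * mean;
  const width = hi - lo;
  const area = width * mean;                  // Monte Carlo estimate of the integral
  const se = width * Math.sqrt(variance / n); // standard error of that estimate
  return { p05: area - 1.645 * se, p95: area + 1.645 * se };
};
```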
While this interval certainly covers the correct answer (the actual value of the integral is \(\pi/2\), or approximately 1.571), we can do better. We'll estimate the region 0–6 separately from 6–100, using half the samples for each. Why put the break point at 6? The period of sine is a full turn, which is roughly 6 radians, so this ensures we get roughly symmetric contributions from both integrals. That's not necessary for the technique to work, but it makes the illustration a little cleaner:
Ic(5_000, 0, 6, x => Math.sin(x)/x)
Object {
  p05: 1.346
  p95: 1.468
}
This contains the bulk of the value of the integral, it seems. Let’s see what remains in the rest of it.
Ic(5_000, 6, 100, x => Math.sin(x)/x)
Object {
  p05: 0.080
  p95: 0.198
}
We can work backwards to what the standard errors must have been to produce these confidence intervals: the midpoint is the point estimate for each region, and the standard error is the distance from the 5 % point to the midpoint, divided by 1.645. (A code sketch of this back-calculation follows the table.)
| Region | Value | Standard error |
|---|---|---|
| 0–6 | 1.4067 | 0.0372 |
| 6–100 | 0.1390 | 0.0359 |
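In code, the back-calculation might look like the following, where unpack is a hypothetical helper name:

```javascript
// Recover the point estimate and standard error behind a 90 %
// confidence interval: the midpoint is the estimate, and the 5 %
// point lies 1.645 standard errors below it.
const unpack = ({ p05, p95 }) => {
  const value = (p05 + p95) / 2;
  return { value, se: (value - p05) / 1.645 };
};

unpack({ p05: 0.080, p95: 0.198 })
// => { value: 0.139, se: 0.0359 } – the 6–100 row of the table
```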
The estimate of the total area is the two values summed, i.e. 1.5457. The standard error of the sum we get through Pythagorean addition – the square root of the sum of the squared standard errors – which comes out to approximately 0.05143. Converting that back to a confidence interval, we can compare with the case where we did not break the computation up into multiple components.
| Method | 5 % | 95 % | Range |
|---|---|---|---|
| Single operation (10,000 samples) | 1.461 | 1.884 | 0.423 |
| Two operations (5,000 samples × 2) | 1.461 | 1.630 | 0.169 |
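The combination step can also be spelled out in code – values add, standard errors add in quadrature – reusing the output shape of the hypothetical unpack helper above:

```javascript
// Combine two independent region estimates into one confidence
// interval: the values add, and the standard errors add in
// quadrature (Pythagorean addition).
const combine = (a, b) => {
  const value = a.value + b.value;
  const se = Math.sqrt(a.se * a.se + b.se * b.se);
  return { p05: value - 1.645 * se, p95: value + 1.645 * se };
};

combine({ value: 1.4067, se: 0.0372 }, { value: 0.1390, se: 0.0359 })
// => roughly { p05: 1.461, p95: 1.631 } – the table's numbers, up to rounding
```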
Although in this case the two methods happen to share a lower bound, the upper bound has been dramatically reduced. The total range of the confidence interval is more than halved! This came from allocating the samples more cleverly – concentrating them in the early parts of the function – rather than from increasing the number of samples.
That said, we’re at a computer, so we could try increasing the sample count. Or maybe both?
| Method | 5 % | 95 % | Range |
|---|---|---|---|
| Single operation (10,000 samples) | 1.461 | 1.884 | 0.423 |
| Two operations (5,000 samples × 2) | 1.461 | 1.630 | 0.169 |
| Single operation (100,000 samples) | 1.549 | 1.680 | 0.131 |
| Two operations (50,000 samples × 2) | 1.524 | 1.578 | 0.054 |
It seems like sampling more cleverly has almost the same effect as taking ten times as many samples.
We could play around with where to put the breakpoint, and how many samples to allocate to each side of it, and see which combination yields the lowest error. Then we can run that combination with a lot of samples to get the most accurate final result. That would take maybe 15 minutes of tooting about and exploring sensible-seeming alternatives, so it’s probably still quicker than integrating by hand.
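A sketch of that exploration, reusing the hypothetical Ic, unpack, and combine helpers from above, might look something like this:

```javascript
// Try a few breakpoints and sample allocations, keeping whichever
// combination produces the narrowest combined confidence interval.
const f = x => Math.sin(x) / x;
const total = 10_000;
let best = { range: Infinity };
for (const breakpoint of [3, 6, 12, 25, 50]) {
  for (const share of [0.3, 0.5, 0.7, 0.9]) {
    const nLow = Math.round(total * share);
    const low = unpack(Ic(nLow, 0, breakpoint, f));
    const high = unpack(Ic(total - nLow, breakpoint, 100, f));
    const { p05, p95 } = combine(low, high);
    if (p95 - p05 < best.range) {
      best = { breakpoint, share, range: p95 - p05 };
    }
  }
}
console.log(best); // the most promising combination found
```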