Feynman vs. Computer

Original link: https://entropicthoughts.com/feynman-vs-computer

This example shows how strategically splitting a numerical integration problem can substantially improve precision and narrow the confidence interval without necessarily increasing the total number of samples. The integral of sin(x)/x is evaluated both as a single integration (10,000 samples) and split into two regions (0–6 and 6–100) with 5,000 samples each. The split exploits the periodicity of the sine function, choosing x = 6 as the break point, and shrinks the width of the confidence interval from 0.423 to 0.169. Increasing the sample count amplifies the effect: with 50,000 samples per region the width is only 0.054, comparable to using 100,000 samples in a single integration. This shows that how samples are allocated matters as much as how many are taken; concentrating resources on the regions that contribute most to the integral saves computation time, and tuning the break point and the sample allocation is the route to high-precision results.

Hacker News discussion (10 points, posted by cgdl):

JKCalhoun: As a hobbyist I have been playing with analog computer circuits. A simple analog integrator (an op-amp with a capacitor in the feedback path) will also give you the area under a curve (as a voltage, of course), if you can match your curve with a similar voltage curve. Analog circuits (and op-amps in general) are very cool. A bit off topic, I know, but integration has been on my mind lately. You say "4 lines of Python", I say "1 op-amp".

eig: What are the advantages of this Monte Carlo approach compared to a typical numerical integration method such as Runge-Kutta?

Original text

What’s neat is we can still split up the computation like we did before, if we believe it will make the error smaller and the confidence interval narrower. Let’s use the following integral as an example.

\[\int_0^\infty \frac{\sin{x}}{x} \mathrm{d}x\]

This oscillates up and down quite a bit for small \(x\), and then decays but still provides significant contributions for larger \(x\). A naive evaluation would have a confidence interval of

Ic(10_000, 0, 100, x => Math.sin(x)/x)
Object {
  p05: 1.461
  p95: 1.884
}
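
The Ic helper is defined earlier in the article. As a minimal sketch only (an assumption, not the article's actual definition), such a helper could sample the integrand uniformly over the interval and report a 90 % normal-approximation confidence interval for the integral:

// Sketch of a Monte Carlo integrator with a 90 % confidence interval (illustrative, not the article's code).
const Ic = (n, a, b, f) => {
  let sum = 0, sumSq = 0;
  for (let i = 0; i < n; i++) {
    // Sample uniformly on [a, b]; (b - a) * f(x) is an unbiased estimate of the integral.
    const y = (b - a) * f(a + (b - a) * Math.random());
    sum += y;
    sumSq += y * y;
  }
  const mean = sum / n;
  const se = Math.sqrt(Math.max(sumSq / n - mean * mean, 0) / n);  // standard error of the mean
  return { p05: mean - 1.645 * se, p95: mean + 1.645 * se };       // 90 % normal-approximation interval
};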

and while this is certainly correct (the actual value of the integral is half \(\pi\), or approximately 1.571), we can do better. We'll estimate the region 0–6 separately from 6–100, using half the samples for each. (Why put the break point at 6? The period of sin is a full turn, which is roughly 6 radians. This ensures we get roughly symmetric contributions from both integrals. That's not necessary for the technique to work, but it makes the illustration a little cleaner.)

Ic(5_000, 0, 6, x => Math.sin(x)/x)
Object {
  p05: 1.346
  p95: 1.468
}

This contains the bulk of the value of the integral, it seems. Let’s see what remains in the rest of it.

Ic(5_000, 6, 100, x => Math.sin(x)/x)
Object {
  p05: 0.080
  p95: 0.198
}

We can work backwards to what the standard errors must have been to produce these confidence intervals. (The midpoint is the point estimate for each region, and the standard error is 1/1.645 times the distance between the 5 % point and the midpoint.)

Region   Value    Standard error
0–6      1.4067   0.0372
6–100    0.1390   0.0359
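
As a quick illustration of that back-calculation (the helper name below is made up for this sketch), the 6–100 interval above works out like so:

// Back out the point estimate and standard error from a 90 % confidence interval.
const fromInterval = (p05, p95) => ({
  value: (p05 + p95) / 2,          // the midpoint is the point estimate
  se: (p95 - p05) / 2 / 1.645      // distance from the 5 % point to the midpoint, divided by 1.645
});
fromInterval(0.080, 0.198)         // ≈ { value: 0.139, se: 0.036 }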

The estimate of the total area is the two values summed, i.e. 1.5457. The standard error of that sum we get through Pythagorean addition (the square root of the sum of the squared standard errors), and it is approximately 0.05143. We convert it back to a confidence interval and compare with when we did not break the computation up into multiple components.
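
A worked check of that arithmetic, using the numbers from the table above:

// Combine the two regions: sum the point estimates, add the standard errors in quadrature.
const total = 1.4067 + 0.1390;            // 1.5457
const se = Math.hypot(0.0372, 0.0359);    // Pythagorean addition, ≈ 0.052
const p05 = total - 1.645 * se;           // ≈ 1.46
const p95 = total + 1.645 * se;           // ≈ 1.63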

Method                                 5 %     95 %    Range
Single operation (10,000 samples)      1.461   1.884   0.423
Two operations (5,000 samples × 2)     1.461   1.630   0.169

Although in this case the two methods happen to share a lower bound, the upper bound has been dramatically reduced. The total range of the confidence interval is more than halved! This was because we allocated the samples more cleverly – concentrated them in the early parts of the function – rather than increased the number of samples.

That said, we’re at a computer, so we could try increasing the sample count. Or maybe both?

Method                                 5 %     95 %    Range
Single operation (10,000 samples)      1.461   1.884   0.423
Two operations (5,000 samples × 2)     1.461   1.630   0.169
Single operation (100,000 samples)     1.549   1.680   0.131
Two operations (50,000 samples × 2)    1.524   1.578   0.054

It seems like sampling more cleverly has almost the same effect as taking ten times as many samples.

We could play around with where to put the breakpoint, and how many samples to allocate to each side of it, and see which combination yields the lowest error. Then we can run that combination with a lot of samples to get the most accurate final result. That would take maybe 15 minutes of tooting about and exploring sensible-seeming alternatives, so it’s probably still quicker than integrating by hand.
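
A rough sketch of that exploration, assuming an Ic helper like the one sketched earlier (the break points and sample splits tried below are arbitrary illustrative choices, not the article's):

// Try a few break points and left/right sample allocations, keep the one with the smallest combined error.
const halfWidth = ci => (ci.p95 - ci.p05) / 2;
let best = null;
for (const breakpoint of [3, 6, 12, 25]) {
  for (const leftShare of [0.25, 0.5, 0.75]) {   // fraction of samples spent on the left region
    const n = 10_000;
    const left  = Ic(Math.round(n * leftShare),       0, breakpoint, x => Math.sin(x) / x);
    const right = Ic(Math.round(n * (1 - leftShare)), breakpoint, 100, x => Math.sin(x) / x);
    // Combined standard error via Pythagorean addition of the two regions' standard errors.
    const se = Math.hypot(halfWidth(left) / 1.645, halfWidth(right) / 1.645);
    if (best === null || se < best.se) best = { breakpoint, leftShare, se };
  }
}
// best now holds the most promising combination; re-run it with many more samples for the final answer.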
