欧拉猜想与CDC 6600
Euler Conjecture and CDC 6600

原始链接: https://fortran-lang.discourse.group/t/euler-conjecture-and-cdc-6600/10501

讨论的重点在于预计算数据与运行时计算的性能权衡。虽然将整数的五次方直接编译到可执行文件中对于小数组来说似乎很有效率,但它会显著增加加载时间,因为数据必须在启动时从磁盘读取。运行时计算这些值,虽然最初看起来较慢,但速度可以快 10³ 到 10⁵ 倍,尤其是在只需要数据子集的情况下(例如,本例中仅需 10,000 个元素中的 144 个)。 最佳方法取决于计算环境。对于专用机器,最小化*经过时间*(墙上时间)是关键。然而,在共享系统中,最小化*用户时间*更为重要,因为系统时间(例如,磁盘访问)不会直接计费给用户。使用 gfortran 在 Apple M2 上提供的计时结果表明,运行时计算时系统时间很短,突出了其效率。最后,计时本身具有可变性,需要多次运行才能获得可靠的结果。

Hacker News 新闻 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 欧拉猜想和 CDC 6600 (fortran-lang.discourse.group) 6 分,作者 zaikunzhang 44 分钟前 | 隐藏 | 过去 | 收藏 | 1 条评论 zaikunzhang 44 分钟前 [–] 参见 https://community.intel.com/t5/image/serverpage/image-id/116... 回复 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:
相关文章

原文

I don’t think this warning applies here because the array is small, but the danger of this approach in general is that the executable file must contain that compile-time computed data, which means that it takes time to load that memory from disk (SSD, etc.) into the process memory on startup. In contrast, if that memory is allocated at run time directly by the executable, and then filled by cpu instructions, then it can be some 10^3 to 10^5 times faster. You can also see the difference by looking at the size of the executable image (ls -l a.out). With most modern compilers, you do not see the size of the declared array reflected in the executable size unless they are parameter arrays, arrays initialized at compile time, or arrays in common blocks.

Also, if done at compile time, the whole array would need to be computed. That is 10^4 elements in this case (and integer overflows would be generated in doing so). The above run time code only computes 144 elements of the array before finding a solution and stopping.

A further question is where does that extra effort get charged? This extra effort is appropriate if the user is paying for that time (e.g. with money, or with elapsed time, or with total throughput through the machine), but not if that overhead is somehow not charged as user time (e.g. in a timeshare or batch environment with many other users). This is why the programmer sometimes writes code to minimize wall time and sometimes to minimize cpu time. Those two goals are not always exactly the same.

In the original code, the posix time command was used for the timings. That command returns three different time values for exactly this reason. If you alone own the machine you are running on, and you want to maximize throughput through that machine every 24 hour period, then it is the elapsed time that is critical. If you are one of many users sharing the machine, then it is the user time that you want to minimize, the system time is not charged to you because while your job is stalled waiting for the disk to respond, someone else’s job is swapped in and is being charged to execute its user time.

Here is the timing result of that last version of the code using gfortran -O3 with an Apple M2 cpu
on MacOS. It computes the i**5 values at run time, so the system time is minimal.

i^5 =  61917364224
133 110 84 27 144

real    0m0.141s
user    0m0.139s
sys     0m0.002s

One other comment about timings is that they are almost never really consistent from run to run. For small segments of code like this, one can do

$ time a.out; time a.out; time a.out

The first results are usually longer than the others, which are then usually more consistent if not identical at the millisecond level. But if timings are measured at the microsecond level, they would show variations too.

联系我们 contact @ memedata.com