Comparing Integers and Doubles

原始链接: http://databasearchitects.blogspot.com/2025/11/comparing-integers-and-doubles.html

Automated testing uncovered a surprising problem: in systems such as Postgres, DuckDB, and SQL Server, comparisons that mix integers and floating-point values are not transitive (a=b and a=c do *not* always imply b=c). The cause is how these systems handle type promotion: they convert the integer to a double before comparing. Doubles have limited precision, so large integers lose information during the conversion, making the comparison inaccurate. This inaccuracy confuses query optimization and, crucially, affects join operations that rely on type promotion. To fix it, the author implemented a custom comparison function (`cmpDoubleInt64`) that explicitly checks for precision loss by converting to double *and* back to integer, accounting for possible rounding. Notably, Python and SQLite handle these comparisons correctly; most other databases and programming languages tested exhibit the same precision problem, highlighting a gap in standard-library functionality. The code provided offers a solution for developers facing this issue.


Original Article

During automated testing we stumbled upon a problem that boiled down to transitive comparisons: if a=b and a=c, we assumed that b=c. Unfortunately, that is not always the case, at least not in all systems. Consider the following SQL query:

select a=b, a=c, b=c
from (values(
   1234567890123456789.0::double precision,
   1234567890123456788::bigint,
   1234567890123456789::bigint)) s(a,b,c)

If you execute that in Postgres (or DuckDB, or SQL Server, or ...) the answer is (true, true, false). That is, the comparison is not transitive! Why does that happen? When these systems compare a bigint and a double, they promote the bigint to double and then compare. But a double has only 52 bits of mantissa, which means it will lose precision when promoting large integers to double, producing false positives in the comparison.

This behavior is highly undesirable, first because it confuses the optimizer, and second because (at least in our system) joins work very differently: hash joins promote to the most restrictive type and discard all values that cannot be represented in it, since those can never find a join partner. For double/bigint joins, this leads to observable differences between joins and plain comparisons, which is very bad.

How should we compare correctly? Conceptually the situation is clear: an IEEE 754 floating-point number with sign s, mantissa m, and exponent e represents the value (-1)^s*m*2^e, and we just have to compare the integer with that value. But there is no easy way to do so: if we write an int/double comparison in, e.g., C++, the compiler performs the same promotion to double, messing up the comparison.

We can get the logic right by doing two conversions: first we convert the int to double and compare. If the values are not equal, the order is clear and we can use it. Otherwise, we convert the double back to an integer, check whether the conversion rounded up or down, and handle the result accordingly. Add some extra checks to avoid undefined behavior (the conversion INT64_MAX -> double -> int64 is not defined) and to handle non-finite values, and we get:

int cmpDoubleInt64(double a, int64_t b) {
   // NaN, and any double at or above 2^63, sorts after every int64
   // (0x1.fffffffffffffp+62 is the largest double below 2^63)
   if (!(a<=0x1.fffffffffffffp+62)) return 1;

   // fast path: if the values differ after promotion, that order is correct
   double bd = b;
   if (a!=bd) return (a<bd)?-1:1;

   // equal after promotion: b may have lost precision; convert a back
   // to int64 (now known to be in range) and compare exactly
   int64_t ai = a;
   if (ai!=b) return (ai<b)?-1:1;
   return 0;
}

Which is the logic that we now use. Who else does it correctly? Perhaps somewhat surprisingly, Python and SQLite. All other database systems (and programming languages) that we tested lost precision during the comparison, leading to tons of problems. IMHO a proper int/double comparison should be available in every programming language, at least as a library function. But in most languages (and DBMSes) it isn't. Feel free to use the code above if you ever run into this problem.
