What's the Entropy of a Random Integer?

Original link: https://quomodocumque.wordpress.com/2026/02/03/whats-the-entropy-of-a-random-integer/

This post explores the entropy of the probability distribution describing the prime factorization of a random integer *n* (between *N* and 2*N*). The author connects it to the cycle structure of random permutations, in particular the Poisson-Dirichlet process, noting the correspondence between the sizes of prime factors and cycle lengths. The core idea is to estimate the entropy via the expected number of cycles of each length *i* in a random permutation. Using approximations (Poisson-distributed cycle counts, Stirling's formula for the log-factorial), the author arrives at an estimated mean entropy of 1. He is not certain, however, citing preliminary computations suggesting the mean may be less than 1, and asking whether the entropy converges to a limiting distribution or merely has a mean. The post frames this as a thought experiment about the information content of prime factorizations, drawing an analogy with the Erdős-Kac theorem on the number of prime factors, and asks whether this is already known or a new line of inquiry.


Original post

Surely people have thought about this but I couldn’t find anything on a short search.

The question: let n be a random integer, say in [N,2N]. It factors as the product of p_i^a_i. So you can think of the proportion of n that is “made of” the prime p_i to be (a_i log p_i / log n). OK, now this gives you a probability distribution. What’s its entropy?
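Here is a minimal sketch of that distribution and its entropy, assuming sympy's factorint is available for the factoring:

```python
# A minimal sketch of the distribution in question: n factors as the product of
# p_i^a_i, and prime p_i gets mass a_i * log(p_i) / log(n); the masses sum to 1.
from math import log
from sympy import factorint  # assumes sympy is installed

def factorization_entropy(n: int) -> float:
    """Shannon entropy (in nats) of the 'made of prime p_i' distribution of n."""
    masses = [a * log(p) / log(n) for p, a in factorint(n).items()]
    return -sum(m * log(m) for m in masses)

print(factorization_entropy(2 * 3 * 5 * 7))  # a squarefree example
print(factorization_entropy(2**10))          # a prime power: single mass 1, entropy 0
```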

You can make this a little simpler by restricting to squarefree integers. (Not sure whether this should change the answer.) The sizes of the prime factors of a random squarefree integer are supposed to match the cycle lengths of a random large permutation (say on N letters), so now we can ask that question: break a random permutation up into cycles, which gives another probability distribution; namely, that which assigns each cycle the probability cycle length / N. (This is called the Poisson-Dirichlet process with parameters (0,1). Here’s a Terry T. post about it, and why it governs prime factorizations of random integers.) What’s the entropy of this one?
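Here is a minimal simulation sketch for the permutation version, assuming the standard fact that the cycle containing a fixed letter has length uniform on {1, ..., m} when m letters remain (which reproduces the joint law of the cycle lengths):

```python
# A minimal simulation sketch: entropy of the cycle-length distribution of a
# uniform random permutation of N letters, sampling cycle lengths one at a time.
import random
from math import log

def permutation_cycle_entropy(N: int) -> float:
    remaining = N
    entropy = 0.0
    while remaining > 0:
        # length of the cycle containing the smallest remaining letter
        length = random.randint(1, remaining)
        remaining -= length
        p = length / N            # this cycle's mass in the distribution
        entropy -= p * log(p)
    return entropy

# Average over many permutations and compare with the heuristic mean of 1 below.
samples = [permutation_cycle_entropy(10**6) for _ in range(2000)]
print(sum(samples) / len(samples))
```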

Well, we can at least guess the average. Let X_i be the number of i-cycles in a random permutation on N letters. Then the X_i are roughly independent Poisson variables of mean 1/i. So there are X_i i-cycles, each appearing with probability i/N, and thus each contributing (-i/N) log (i/N) or (i/N)(log N – log i) to the entropy of the distribution. Now all we have to do is sum over i! You are summing

(i/N) X_i log N – (i/N) X_i log i

over all i. The first sum is easy; the sum of iX_i has to be N (the cycle lengths partition the N letters), so this is just log N.

What about the subtrahend? This is \frac{1}{N} \sum (i \log i) X_i. Since X_i has expected value 1/i, the expected value of this part should be \frac{1}{N} \sum \log i.

I didn't tell you how long the sum was! But i certainly can't be greater than N so let's stop the sum there. Then the sum of log i, by Stirling's formula, is very close to N log N – N. So the second part of the sum is about log N – 1. Almost exact cancellation! Our estimate for the mean entropy is

log N – (log N – 1) = 1.
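Written out in one line (a consolidation of the steps above, using E[X_i] = 1/i and Stirling's formula for the log-factorial):

\mathbb{E}[\mathrm{entropy}] \approx \sum_{i=1}^{N} \mathbb{E}\left[\frac{i}{N} X_i \log N - \frac{i}{N} X_i \log i\right] = \log N - \frac{1}{N}\sum_{i=1}^{N} \log i = \log N - \frac{\log N!}{N} \approx \log N - (\log N - 1) = 1.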

Is that true? Is it also true for integers? (I did actually run some computations on this and the mean looked less than one, but numbers with only five or six digits are just lousy with primes and near-primes, which could well mess this up.) Does the entropy actually converge to a distribution or does it just have a mean? What about the exponential of the entropy, the so-called perplexity? Does it have a mean? I ask because I tend to think of the perplexity of a probability distribution as roughly “the actual number of possible outcomes if you count them with information in mind.” So you could think of the distribution of perplexity, if there is one, as the information theorist’s version of the Erdos-Kac theorem. “You thought the number of prime factors grew like log log n, Pal and Mark, but come on, some of those primes are so small they barely matter — actually the expected number of prime factors is constant!”
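A minimal sketch of that kind of numerical check (not the code actually used for the computations above), reusing factorization_entropy from the earlier sketch and again assuming sympy:

```python
# Sample random integers in [N, 2N], compute the entropy of the factorization
# distribution and its exponential (the perplexity), and look at the means.
import random
from math import exp
from statistics import mean

def sample_entropies(N: int, trials: int):
    return [factorization_entropy(random.randint(N, 2 * N)) for _ in range(trials)]

entropies = sample_entropies(N=10**12, trials=500)
print("mean entropy   :", mean(entropies))
print("mean perplexity:", mean([exp(h) for h in entropies]))
```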

I’m sure this is all well-understood by someone, but as I’ve mentioned before I think a blog is a good place to model the casual way mathematicians think about things in real life, but not always in public.
