Bayesian statistics for confused data scientists

原始链接: https://nchagnet.pages.dev/blog/bayesian-statistics-for-confused-data-scientists/

## Bayesian vs. frequentist statistics: summary

This article explores the differences between Bayesian and frequentist statistics, two approaches to interpreting data and uncertainty. Frequentist statistics, the traditional approach, treats parameters as fixed but unknown quantities, while Bayesian statistics treats parameters as random variables with their own probability distributions.

The core difference lies in how probability is used. Bayesian statistics applies Bayes' theorem, P(θ|X) = P(X|θ)P(θ)/P(X), to update beliefs about a parameter (θ) in light of observed data (X), incorporating existing knowledge through a "prior" distribution. This contrasts with the frequentist approach, which typically bolts uncertainty on *after* the parameter estimate. The Bayesian approach yields a full distribution over plausible parameter values, and its "credible intervals" (how likely the parameter is to fall within the interval) give a more intuitive picture of uncertainty.

Although computationally heavier, tools such as PyMC and Markov chain Monte Carlo (MCMC) methods make Bayesian analysis practical. The author illustrates this with a die-rolling example and a retail sales data scenario, showing how the Bayesian approach exploits domain knowledge and handles sparse data more robustly than frequentist methods. Finally, regularization techniques common in data science, such as Lasso and Ridge regression, can be understood as applications of Bayesian priors. Bayesian statistics excels at modelling inherent uncertainty and incorporating prior knowledge, making it a powerful framework for complex data analysis.
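The summary's closing point, that Ridge regression can be read as a Bayesian prior, can be made concrete with a short NumPy sketch (my own illustration, not from the article, using synthetic data): minimizing the Ridge-penalised least-squares loss lands on the same estimate as the posterior mean of a linear model with a Gaussian prior β ~ N(0, I/λ) and unit noise variance.

```python
import numpy as np

# Synthetic data (illustrative only): 50 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=50)  # unit noise variance
lam = 2.0

# Bayesian view: with likelihood y ~ N(X b, I) and prior b ~ N(0, I/lam),
# the posterior is Gaussian with mean (X'X + lam I)^{-1} X'y (closed form).
beta_post = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Frequentist view: minimise ||y - X b||^2 + lam * ||b||^2 by gradient descent.
beta = np.zeros(3)
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ beta) + 2 * lam * beta
    beta -= 1e-3 * grad

# Same point estimate, two different stories about where it comes from.
assert np.allclose(beta, beta_post, atol=1e-6)
```

The penalty weight λ plays the role of the prior's precision: a stronger prior belief that coefficients are small is exactly a larger Ridge penalty.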

## Bayesian statistics: a pragmatic discussion

A recent article on Bayesian statistics sparked discussion on Hacker News, revealing nuanced views on its use in data science. While Bayesian methods have historically been positioned as an alternative to frequentist ones, many practitioners now favour a hybrid approach, drawing techniques from both schools of thought.

Some commenters noted that they have applied frequentist methods successfully throughout their careers, even in complex settings such as multilevel modelling, and expressed frustration with the computational challenges (slow convergence, long runtimes) they often hit when trying Bayesian methods on particularly hard problems.

Others, however, highlighted situations where Bayesian methods are essential, especially for stabilizing inference with limited data and for addressing issues such as Lindley's paradox, where frequentist methods can overstate significance. Modern probabilistic programming languages such as Stan, Turing, and Pyro provide powerful tools for Bayesian inference and can mitigate convergence problems.

The conversation underscores that the "best" approach depends heavily on the specific problem, and experiences vary. Bayesian principles also underpin many advances in generative AI, including Stable Diffusion and large language models.

Original article

It’s the third time I’ve fallen into the Bayesian rabbit hole. It always goes like this: I find some cool article about it, it feels like magic, whoever is writing about it is probably a little smug about how much cooler than frequentism it is (and I don’t blame them), and yet I still leave confused about what exactly is happening. This post is a cathartic attempt to force myself into making sense out of everything I’ve read so far, and hopefully it will also be useful to the legions out there who surely feel the same way as I do.

Bayesian vs. frequentist statistics: the story of a feud

The frequentist approach is so dominant that when you learn statistics, it’s not named as such, it just is statistics. The Bayesian approach, on the other hand, is this weird niche that only a few people seem reeeeally into. It’s the Haskell of statistics. And just like its programming counterpart, this little tribe of Bayesians is actually right to love it so much.

At its heart, the difference between Bayesian and frequentist statistics is about the philosophical role that probability plays within the framework. In both frameworks, you have parameters (usually some unknown quantities which determine how things behave) and you have data (or observations), which are things you’ve measured.

A simple example would be if you roll a die a bunch of times. The parameter here is the number of faces $n$ (intuitively, we all know the more faces, the less likely a given face will appear), while the data is just the collected faces you see as you roll the die. Let me tell you right now that for my example to make any sense whatsoever, you have to make the scenario a bit more convoluted. So let’s say you’re playing DnD or some dice-based game, but your game master is rolling the die behind a curtain. So you don’t know how many faces the die has (maybe the game master is lying to you, maybe not), all you know is it’s a die, and the values that are rolled. A frequentist in this situation would tell you the parameter $n$ is fixed (although unknown), and the data is just randomly drawn from the uniform distribution $X \sim \mathcal{U}(n)$. A Bayesian, on the other hand, would treat $n$ itself as a random variable, described by a probability distribution $P$.

*(Image: Michael Scott saying "What?")*

I’m going to pause here for you to take a breath and yell at your screen that it makes no sense. Of course, the number of faces is fixed, it’s a die! What Bayesian statistics quantifies with the distribution $P$ is not how random the number of faces is, but how uncertain you are about it. This is the crucial difference and the whole reason why Bayesian statistics is so powerful. In frequentist approaches, uncertainty is often an afterthought, something you just tack on using some sample-to-population formula after the fact. Maybe if you feel fancy you use some bootstrapping method. And whatever interval you get from this is a confidence interval, it doesn’t tell you how likely the parameter is to be within, but how often the intervals constructed this way will contain the parameter. This is often a confusing point which makes confidence intervals a very misunderstood concept. In Bayesian statistics, on the other hand, the parameter is not a point but a distribution. The spread of that distribution already accounts for the uncertainty you have about the parameter, and the credible interval you get from it actually tells you how likely the parameter is to be within it.
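To make the hidden-die scenario concrete, here is a minimal sketch (my own, not from the post) of the Bayesian update: a flat prior over a handful of candidate face counts, a uniform likelihood for each roll, and Bayes' theorem to turn observed rolls into a posterior. The candidate dice and the rolls are made-up numbers.

```python
# Posterior over the hidden die's number of faces, given observed rolls.
candidates = [4, 6, 8, 10, 12, 20]       # standard DnD dice (assumed hypothesis space)
rolls = [3, 6, 2, 5, 1, 4, 6, 2]         # hypothetical observations

prior = {n: 1 / len(candidates) for n in candidates}  # flat prior: no idea which die

def likelihood(rolls, n):
    # Each roll is uniform on 1..n; a roll above n rules that die out entirely.
    if max(rolls) > n:
        return 0.0
    return (1 / n) ** len(rolls)

unnorm = {n: likelihood(rolls, n) * prior[n] for n in candidates}
evidence = sum(unnorm.values())          # P(X), the normalising constant
posterior = {n: p / evidence for n, p in unnorm.items()}

for n, p in posterior.items():
    print(f"P(n = {n:>2} | rolls) = {p:.3f}")
```

With these rolls, the d4 is ruled out (a 6 was observed), and the posterior puts most of its mass on the six-sided die: larger dice are penalised because they make the observed low rolls less likely. That spread over candidates is exactly the uncertainty the paragraph above describes.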

On a more mathematical note, the difference between the two approaches lies within Bayes’ famous theorem which tells you how conditional probabilities relate to each other:

$$P(A|B)\,P(B) = P(B|A)\,P(A)\,.$$
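The symmetric form of the theorem is easy to sanity-check numerically: both sides are just two ways of writing the joint probability P(A, B). A tiny example with a made-up joint distribution over two binary events (numbers are purely illustrative):

```python
# Joint distribution P(A, B) over two binary events; values are illustrative.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

p_a = sum(p for (a, b), p in joint.items() if a == 1)   # P(A) = 0.7
p_b = sum(p for (a, b), p in joint.items() if b == 1)   # P(B) = 0.6
p_a_given_b = joint[(1, 1)] / p_b                       # P(A|B)
p_b_given_a = joint[(1, 1)] / p_a                       # P(B|A)

# Both sides equal the joint probability P(A, B) = 0.4.
assert abs(p_a_given_b * p_b - p_b_given_a * p_a) < 1e-12
```

Dividing both sides by P(B) gives the more familiar form P(A|B) = P(B|A)P(A)/P(B), the update rule the summary stated with θ and X in place of A and B.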