Bayesian statistics for confused data scientists

原始链接: https://nchagnet.pages.dev/blog/bayesian-statistics-for-confused-data-scientists/

## Bayesian vs. frequentist statistics: summary

This article explores the differences between Bayesian and frequentist statistics, two approaches to interpreting data and uncertainty. Frequentist statistics, the traditional approach, treats parameters as fixed but unknown quantities, while Bayesian statistics treats parameters as random variables with their own probability distributions.

The core difference is how probability is used. Bayesian statistics applies Bayes' theorem, P(θ|X) = P(X|θ)P(θ) / P(X), to update beliefs about a parameter (θ) based on observed data (X), incorporating existing knowledge through a "prior" distribution. This contrasts with the frequentist approach, which typically bolts uncertainty on *after* parameter estimation. The Bayesian approach yields a full distribution over a parameter's possible values, giving a more intuitive handle on uncertainty through "credible intervals" (how likely the parameter is to fall within the interval).

Although computationally more demanding, tools like PyMC and Markov chain Monte Carlo (MCMC) methods make Bayesian analysis practical. The author illustrates this with a die-rolling example and a retail sales scenario, showing how Bayesian methods can exploit domain knowledge and handle sparse data more robustly than frequentist methods. Finally, regularization techniques common in data science (such as Lasso and Ridge regression) can be understood as applications of Bayesian priors.

Bayesian statistics excels at modeling inherent uncertainty and integrating prior knowledge, making it a powerful framework for complex data analysis.

## Bayesian statistics and data science: summary

This Hacker News discussion centers on the practical application of Bayesian statistics in data science and its perceived usefulness. While acknowledging the theoretical advantages of Bayesian methods, many commenters expressed frustration with their computational demands on complex real-world problems, such as slow convergence and long runtimes. Several experienced statisticians noted that they had used frequentist methods successfully for decades without resorting to Bayesian approaches.

Others, however, defended Bayesian techniques, especially in settings like multilevel modeling, where frequentist methods can be unstable. They highlighted the benefit of shrinking effect sizes toward robust estimates, and pointed out that tools like Stan and PyMC are making the computation increasingly feasible.

A key point is that the frequentist/Bayesian divide often comes down to computational practicality and epistemic preference. Many argued that modern ML, and generative AI in particular, implicitly relies on Bayesian principles (e.g., priors in regularization), and that a shift toward probabilistic thinking is valuable. Ultimately, the consensus leans toward treating both approaches as tools, choosing the most appropriate one for the problem at hand and the resources available.

## Original article

It’s the third time I’ve fallen into the Bayesian rabbit hole. It always goes like this: I find some cool article about it, it feels like magic, whoever is writing about it is probably a little smug about how much cooler than frequentism it is (and I don’t blame them), and yet I still leave confused about what exactly is happening. This post is a cathartic attempt to force myself into making sense out of everything I’ve read so far, and hopefully it will also be useful to the legions out there who surely feel the same way as I do.1

Bayesian vs. frequentist statistics: the story of a feud

The frequentist approach is so dominant that when you learn statistics, it’s not named as such, it just is statistics. The Bayesian approach, on the other hand, is this weird niche that only a few people seem reeeeally into. It’s the Haskell of statistics. And just like its programming counterpart, this little tribe of Bayesians is actually right to love it so much.

At its heart, the difference between Bayesian and frequentist statistics is about the philosophical role that probability plays into the framework. In both frameworks, you have parameters (usually some unknown quantities which determine how things behave) and you have data (or observations), which are things you’ve measured.

A simple example would be if you roll a die a bunch of times. The parameter here is the number of faces $n$ (intuitively, we all know the more faces, the less likely a given face will appear), while the data is just the collected faces you see as you roll the die. Let me tell you right now that for my example to make any sense whatsoever, you have to make the scenario a bit more convoluted. So let's say you're playing DnD or some dice-based game, but your game master is rolling the die behind a curtain. So you don't know how many faces the die has (maybe the game master is lying to you, maybe not), all you know is it's a die, and the values that are rolled. A frequentist in this situation would tell you the parameter $n$ is fixed (although unknown), and the data is just randomly drawn from the uniform distribution $X \sim \mathcal{U}(n)$. A Bayesian, on the other hand, would treat the parameter $n$ itself as a random variable with its own distribution $P(n)$.

*(Image: Michael Scott saying "What?")*

I’m going to pause here for you to take a breath and yell at your screen that it makes no sense. Of course, the number of faces is fixed, it’s a die! What Bayesian statistics quantifies with the distribution $P$ is not how random the number of faces is, but how uncertain you are about it. This is the crucial difference and the whole reason why Bayesian statistics is so powerful. In frequentist approaches, uncertainty is often an afterthought, something you just tack on using some sample-to-population formula after the fact. Maybe if you feel fancy you use some bootstrapping method. And whatever interval you get from this is a confidence interval: it doesn’t tell you how likely the parameter is to be within it, but how often intervals constructed this way will contain the parameter. This is a confusing point which makes confidence intervals a very misunderstood concept. In Bayesian statistics, on the other hand, the parameter is not a point but a distribution. The spread of that distribution already accounts for the uncertainty you have about the parameter, and the credible interval you get from it actually tells you how likely the parameter is to be within it.
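To make the die example concrete, here is a minimal sketch (my own, not from the article) of the Bayesian update as a discrete computation: a flat prior over candidate face counts, the likelihood of some hypothetical rolls under each candidate, and the resulting posterior distribution over $n$.

```python
# Discrete Bayesian update for the hidden-die example.
# The rolls and the candidate range are hypothetical choices for illustration.
rolls = [3, 1, 4, 1, 5]          # observed rolls (made up)
candidates = range(1, 21)        # candidate numbers of faces n: 1..20

def likelihood(n, rolls):
    # Each roll is uniform on 1..n, and impossible if any roll exceeds n.
    if max(rolls) > n:
        return 0.0
    return (1.0 / n) ** len(rolls)

# Flat prior times likelihood, for each candidate n.
unnormalized = {n: likelihood(n, rolls) * (1.0 / len(candidates))
                for n in candidates}
evidence = sum(unnormalized.values())          # P(X), the normalizing constant
posterior = {n: p / evidence for n, p in unnormalized.items()}

# The posterior is zero below n = 5 (we saw a 5) and peaks at n = 5:
# a whole distribution over n, not a single point estimate.
best = max(posterior, key=posterior.get)
```

The spread of `posterior` is exactly the uncertainty discussed above: summing its probabilities over any range of `n` gives a credible interval directly.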

On a more mathematical note, the difference between the two approaches lies in Bayes’ famous theorem, which tells you how conditional probabilities relate to each other:

$$P(A|B)\,P(B) = P(B|A)\,P(A)\,.$$
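Dividing both sides by $P(B)$ (assuming it is nonzero) gives the form in which the theorem is usually quoted, with the quantity of interest isolated on the left:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}\,.$$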