In particle physics, it was quite fashionable (and may still be) to iterate on blinded data (data deliberately altered by a secret random number, and/or relying entirely on Monte Carlo simulation).

Interesting, I wasn’t aware of that. Another thing I’ve only briefly read about is registering studies in advance, which quite literally prevents iteration.

Iteration is necessary for any analysis. To safeguard yourself from overfitting, be sure to have a hold-out dataset that isn’t touched until the very end.

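For instance, a minimal sketch of that discipline, assuming scikit-learn (the dataset and model here are placeholders, not anything from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data; in practice this is whatever you're analyzing.
X, y = make_classification(n_samples=1000, random_state=0)

# Carve off the hold-out set before any modeling, and don't look at it while iterating.
X_work, X_holdout, y_work, y_holdout = train_test_split(X, y, test_size=0.2, random_state=0)

# Iterate freely on the working split (cross-validation, feature tweaks, model choices, ...).
model = LogisticRegression(max_iter=1000).fit(X_work, y_work)

# Score the hold-out set exactly once, at the very end, for an honest estimate.
print("hold-out accuracy:", model.score(X_holdout, y_holdout))
```
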
I think each category of Bayesian described in the article generally falls under Breiman's [0] "data modeling" culture, while ML practitioners, even when using Bayesian methods, almost invariably fall under the "algorithmic modeling" culture. In particular, the article's definition of pragmatic Bayes says that "the model should be consistent with knowledge about the underlying scientific problem and the data collection process," which I don't consider the norm in ML at all.

I do think ML practitioners in general align with the "iteration" category in my characterization, though you could joke that that miscategorizes people who just use (boosted trees|transformers) for everything.

[0] https://projecteuclid.org/journals/statistical-science/volum...

> the model should be consistent with knowledge about the problem [...] which I don't consider the norm in ML at all.

I don't think that is so niche. Murphy's vol. II, a mainstream book, starts with this quote: "Intelligence is not just about pattern recognition and function approximation. It’s about modeling the world." — Josh Tenenbaum, NeurIPS 2021. Goodman & Tenenbaum have written e.g. https://probmods.org, which is very much about modeling data-generating processes. The same can be said about large parts of Murphy's book, Lee & Wagenmakers, or Lunn et al. (the BUGS book).

Funny enough, I also recently heard about fiducial statistics as a third camp, in an intriguing podcast (episode 581 of Super Data Science) with the EiC of Harvard Business Review.

The fact that you are designing an experiment and not trusting it is bonkers. The experiment concludes that the house is haunted, and you had already agreed before running it that it would be so.

I've used both in some papers and reported both results (why not?). The golden rule in my mind is to fully describe your process and assumptions, then let the reader decide.

I also thought it was silly, but maybe they mean that frequentist methods still have analytical solutions in some settings where Bayesian methods must resort to Monte Carlo methods?

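If that is the intended reading, a toy contrast might look like the sketch below (my own made-up example with a known noise scale, so both answers exist in closed form; only the Bayesian one is computed by simulation here):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                                     # assume the noise scale is known
x = rng.normal(loc=5.0, scale=sigma, size=50)   # made-up observations

# Frequentist: closed-form 95% z-interval for the mean.
se = sigma / np.sqrt(len(x))
ci = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)

# Bayesian: with a flat prior, the posterior for the mean is N(xbar, se^2);
# here it is summarized by Monte Carlo draws instead of the formula.
draws = rng.normal(loc=x.mean(), scale=se, size=100_000)
cred = np.percentile(draws, [2.5, 97.5])

print(ci, cred)  # numerically similar, but one comes from a formula, the other from simulation
```
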
Nit: "Bear in mind". "Bare" means "to make bare" (i.e., to uncover); "bear" means "to carry": "As you evaluate this discussion, carry in your mind that..."

+1 for “in all likelihood”, but it should be stated that the book explains a third approach, which doesn’t lean on either subjective or objective probability.

> So my primary beef with Bayesian statistics (...) is that it can very easily be misused by non-statisticians and beginners

Unlike frequentist statistics? :-)

Frequentist results are also hard to interpret correctly.

-- "Misinterpretations of P-values and statistical tests persists among researchers and professionals working with statistics and epidemiology": "Correct inferences to both questions, which is that a statistically significant finding cannot be inferred as either proof or a measure of a hypothesis’ probability, were given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists." https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383044/

-- "Robust misinterpretation of confidence intervals": "Only 8 first-year students (2%), no master students, and 3 postmasters researchers (3%) correctly indicated that all statements were wrong." https://link.springer.com/article/10.3758/s13423-013-0572-3

-- "P-Value, Confidence Intervals, and Statistical Inference: A New Dataset of Misinterpretation": "The data indicates that 99% subjects have at least 1 wrong answer of P-value understanding (Figure 1A) and 93% subjects have at least 1 wrong answer of CI understanding (Figure 1B)." https://www.frontiersin.org/journals/psychology/articles/10....

This is the truly enlightened answer. Pick some reasonably defined concept of it if forced. Mainly, though, you notice it works and apply the conventions.

So here's a sort of hard-nosed answer: probability is just as well-defined as any other mathematics.

> Consider the statement p(X) = 0.5 (probability of event X is 0.5). What does this actually mean?

It means X is a random variable from some sample space to a measurable space and P is a probability function.

> If so, is it falsifiable? And how?

Yes, by calculating P(X) in the given sample space. For example, if X is the event "you get 100 heads in a row when flipping a fair coin," then it is false that P(X) = 0.5.

It's a bit like asking whether 2^2 = 4 is falsifiable. There are definitely meaningful questions to ask about whether you've modeled the problem correctly, just as it's meaningful to ask what "2" and "4" mean. But those are separate questions from whether the statements of probability are falsifiable. If you can show that the probability axioms hold for your problem, then you can use probability theory on it.

There's a Wikipedia article on interpretations of probability here: https://en.wikipedia.org/wiki/Probability_interpretations. But it is pretty short and doesn't seem quite so complete.

> I think you haven't thought about this deeply enough yet.

On the contrary, I've thought about it quite deeply. Or at least deeply enough to talk about it in this context.

> You take it as self evident that P(X) = 0.5 is false for that event, but how do you prove that?

By definition, a fair coin is one for which P(H) = P(T) = 1/2. See e.g. https://en.wikipedia.org/wiki/Fair_coin. Fair coin flips are also by definition independent, so you have a series of independent Bernoulli trials. So P(H^k) = P(H)^k = 1/2^k, and P(H^k) != 1/2 unless k = 1.

> Assuming you flip a coin and you indeed get 100 heads in a row, does that invalidate the calculated probability?

Why would that invalidate the calculated probability?

> If not, then what would?

P(X) = 0.5 is a statement about measures on sample spaces, so any proof that P(X) != 0.5 falsifies it. I think what you're really trying to ask is something more like "is there really any such thing as a fair coin?" If you probe that question far enough, you eventually get down to quantum computation. But there is some good research on coin flipping. You may like Persi Diaconis's work, for example his Numberphile appearance on coin flipping: https://www.youtube.com/watch?v=AYnJv68T3MM

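In code, that derivation is a one-liner (a trivial check, not anything from the thread):

```python
# Under independent fair-coin flips, P(k heads in a row) = (1/2)**k.
k = 100
print(0.5 ** k)  # ~7.9e-31, nowhere near 0.5, so P(X) = 0.5 is false for this event
```
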
> By definition a fair coin is one for which P(H) = P(T) = 1/2. See e.g. https://en.wikipedia.org/wiki/Fair_coin.

But that's circular, isn't it? You say a fair coin is one where the probabilities of heads and tails are equal. So let's assume the universe of coins is divided into those which are fair and those which are not. Now, given a coin, how do we determine whether it is fair? If we toss it 100 times and get all heads, do we conclude it is fair or not? I await your answer.

The appropriateness of each approach is very much a function of what is being modeled and the corresponding consequences of error.

An implicit shared belief of all of the practitioners the author mentions is that they attempt to construct models that correspond to some underlying "data generating process". Machine learning practitioners may use similar models or even the same models as Bayesian statisticians, but they tend to evaluate their models primarily or entirely based on their predictive performance, not on intuitions about why the data is taking on the values that it is.

See Breiman's classic "Two Cultures" paper that this post's title is referencing: https://projecteuclid.org/journals/statistical-science/volum...

At a high level, Bayesian statistics and DL share the same objective of fitting parameters to models.

In particular, variational inference is a family of techniques that makes these kinds of problems computationally tractable. It shows up everywhere from variational autoencoders, to time-series state-space modeling, to reinforcement learning. If you want to learn more, I recommend reading Murphy's textbooks on ML: https://probml.github.io/pml-book/book2.html

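To make that concrete, here is a minimal variational-inference sketch, assuming PyTorch; it fits a Gaussian q(theta) to the posterior over the mean of a deliberately simple conjugate model (prior N(0, 1), likelihood N(theta, 1)) by gradient ascent on a Monte Carlo ELBO, so the result can be checked against the exact posterior. All names and numbers are illustrative:

```python
import torch

torch.manual_seed(0)
data = torch.randn(100) + 2.0                    # synthetic observations centered near 2

mu = torch.zeros(1, requires_grad=True)          # variational mean
log_sigma = torch.zeros(1, requires_grad=True)   # variational log standard deviation
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

prior = torch.distributions.Normal(0.0, 1.0)     # prior on the unknown mean theta

for step in range(2000):
    opt.zero_grad()
    q = torch.distributions.Normal(mu, log_sigma.exp())
    theta = q.rsample()                          # reparameterized sample from q
    log_lik = torch.distributions.Normal(theta, 1.0).log_prob(data).sum()
    elbo = log_lik + prior.log_prob(theta).sum() - q.log_prob(theta).sum()
    (-elbo).backward()                           # maximize the ELBO
    opt.step()

# Should be close to the exact posterior N(sum(x)/(n+1), 1/(n+1)).
print(mu.item(), log_sigma.exp().item())
```
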
Not sure why this is being downvoted, as it’s mentioned peripherally in the article. I think it’s primarily used as an extreme example of a model where the inner mechanism is entirely inscrutable.

What? Maybe in a very specific context where you are modeling joint distributions of people and traits, but that’s barely a critique of the method itself.

- Iterating on the functional form of the model (and therefore the assumed underlying data generating process) is generally considered obviously good and necessary, in my experience.
- Priors are usually uninformative or weakly informative, partly because data is often big enough to overwhelm the prior (a minimal sketch of what that can look like follows at the end of this comment).
The need for iteration feels so obvious to me that the entire "no iteration" column feels like a straw man. But the author, who knows far more academic statisticians than I do, explicitly says that he had the same belief and "was shocked to learn that statisticians didn’t think this way."
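For what the second bullet can look like in practice, here is a minimal sketch, assuming PyMC and a made-up regression; with 500 observations the weakly informative priors barely move the estimates:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.5 * x + rng.normal(scale=0.5, size=500)     # synthetic data with true slope 1.5

with pm.Model():
    beta = pm.Normal("beta", mu=0, sigma=5)       # weakly informative prior on the slope
    noise = pm.HalfNormal("noise", sigma=5)       # weakly informative prior on the noise scale
    pm.Normal("obs", mu=beta * x, sigma=noise, observed=y)
    idata = pm.sample(1000, tune=1000, progressbar=False)

# The posterior for beta concentrates near 1.5 regardless of the (weak) prior.
```
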