In particle physics, it was quite fashionable (and may still be) to iterate on blinded data (data deliberately altered by a secret random number, and/or relying entirely on Monte Carlo simulation).

Interesting, I wasn’t aware of that. Another thing I’ve only briefly read about is registering studies in advance, which quite literally prevents iteration.

Iteration is necessary for any analysis. To safeguard yourself from overfitting, be sure to have a hold-out dataset that isn’t touched until the very end.

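For instance, a minimal sketch of that discipline, assuming scikit-learn (the dataset and model here are placeholders, not anything from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data; in practice this is whatever you're analyzing.
X, y = make_classification(n_samples=1000, random_state=0)

# Carve off the hold-out set before any modeling, and don't look at it while iterating.
X_work, X_holdout, y_work, y_holdout = train_test_split(X, y, test_size=0.2, random_state=0)

# Iterate freely on the working split (cross-validation, feature tweaks, model choices, ...).
model = LogisticRegression(max_iter=1000).fit(X_work, y_work)

# Score the hold-out set exactly once, at the very end, for an honest estimate.
print("hold-out accuracy:", model.score(X_holdout, y_holdout))
```
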
I think each category of Bayesian described in the article generally falls under Breiman's [0] "data modeling" culture, while ML practitioners, even when using Bayesian methods, almost invariably fall under the "algorithmic modeling" culture. In particular, the article's definition of pragmatic Bayes says that "the model should be consistent with knowledge about the underlying scientific problem and the data collection process," which I don't consider the norm in ML at all.

I do think ML practitioners in general align with the "iteration" category in my characterization, though you could joke that that miscategorizes people who just use (boosted trees|transformers) for everything.

[0] https://projecteuclid.org/journals/statistical-science/volum...

> the model should be consistent with knowledge about the problem [...] which I don't consider the norm in ML at all.

I don't think that is so niche. Murphy's vol. II, a mainstream book, starts with this quote: "Intelligence is not just about pattern recognition and function approximation. It’s about modeling the world." — Josh Tenenbaum, NeurIPS 2021. Goodman & Tenenbaum have written e.g. https://probmods.org, which is very much about modeling data-generating processes. The same can be said about large parts of Murphy's book, Lee & Wagenmakers, or Lunn et al. (the BUGS book).

Funny enough, I also recently heard about fiducial statistics as a third camp, in an intriguing podcast (episode 581 of Super Data Science) with the EiC of Harvard Business Review.

The fact that you are designing an experiment and not trusting it is bonkers. The experiment concludes that the house is haunted, and you had already agreed before running it that it would be so.

I've used both in some papers and reported both results (why not?). The golden rule in my mind is to fully describe your process and assumptions, then let the reader decide.

I also thought it was silly, but maybe they mean that frequentist methods still have analytical solutions in some settings where Bayesian methods must resort to Monte Carlo methods?

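If that is the intended reading, a toy contrast might look like the sketch below (my own made-up example with a known noise scale, so both answers exist in closed form; only the Bayesian one is computed by simulation here):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                                     # assume the noise scale is known
x = rng.normal(loc=5.0, scale=sigma, size=50)   # made-up observations

# Frequentist: closed-form 95% z-interval for the mean.
se = sigma / np.sqrt(len(x))
ci = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)

# Bayesian: with a flat prior, the posterior for the mean is N(xbar, se^2);
# here it is summarized by Monte Carlo draws instead of the formula.
draws = rng.normal(loc=x.mean(), scale=se, size=100_000)
cred = np.percentile(draws, [2.5, 97.5])

print(ci, cred)  # numerically similar, but one comes from a formula, the other from simulation
```
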
Nit: "Bear in mind". "Bare" means "to make bare" (i.e., to uncover); "bear" means "to carry": "As you evaluate this discussion, carry in your mind that..."

+1 for “in all likelihood”, but it should be stated that the book explains a third approach, which doesn’t lean on either subjective or objective probability.

> So my primary beef with Bayesian statistics (...) is that it can very easily be misused by non-statisticians and beginners

Unlike frequentist statistics? :-)

Frequentist results are also hard to interpret correctly.

-- "Misinterpretations of P-values and statistical tests persists among researchers and professionals working with statistics and epidemiology": "Correct inferences to both questions, which is that a statistically significant finding cannot be inferred as either proof or a measure of a hypothesis’ probability, were given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists." https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383044/

-- "Robust misinterpretation of confidence intervals": "Only 8 first-year students (2%), no master students, and 3 postmasters researchers (3%) correctly indicated that all statements were wrong." https://link.springer.com/article/10.3758/s13423-013-0572-3

-- "P-Value, Confidence Intervals, and Statistical Inference: A New Dataset of Misinterpretation": "The data indicates that 99% subjects have at least 1 wrong answer of P-value understanding (Figure 1A) and 93% subjects have at least 1 wrong answer of CI understanding (Figure 1B)." https://www.frontiersin.org/journals/psychology/articles/10....

This is the truly enlightened answer. Pick some reasonably defined concept of it if forced. Mainly, though, you notice it works and apply the conventions.

So here's a sort of hard-nosed answer: probability is just as well-defined as any other mathematics.

> Consider the statement p(X) = 0.5 (probability of event X is 0.5). What does this actually mean?

It means X is a random variable from some sample space to a measurable space and P is a probability function.

> If so, is it falsifiable? And how?

Yes, by calculating P(X) in the given sample space. For example, if X is the event "you get 100 heads in a row when flipping a fair coin," then it is false that P(X) = 0.5.

It's a bit like asking whether 2^2 = 4 is falsifiable. There are definitely meaningful questions to ask about whether you've modeled the problem correctly, just as it's meaningful to ask what "2" and "4" mean. But those are separate questions from whether the statements of probability are falsifiable. If you can show that the probability axioms hold for your problem, then you can use probability theory on it.

There's a Wikipedia article on interpretations of probability here: https://en.wikipedia.org/wiki/Probability_interpretations. But it is pretty short and doesn't seem quite so complete.

> I think you haven't thought about this deeply enough yet.

On the contrary, I've thought about it quite deeply. Or at least deeply enough to talk about it in this context.

> You take it as self evident that P(X) = 0.5 is false for that event, but how do you prove that?

By definition, a fair coin is one for which P(H) = P(T) = 1/2. See e.g. https://en.wikipedia.org/wiki/Fair_coin. Fair coin flips are also by definition independent, so you have a series of independent Bernoulli trials. So P(H^k) = P(H)^k = 1/2^k, and P(H^k) != 1/2 unless k = 1.

> Assuming you flip a coin and you indeed get 100 heads in a row, does that invalidate the calculated probability?

Why would that invalidate the calculated probability?

> If not, then what would?

P(X) = 0.5 is a statement about measures on sample spaces, so any proof that P(X) != 0.5 falsifies it. I think what you're really trying to ask is something more like "is there really any such thing as a fair coin?" If you probe that question far enough, you eventually get down to quantum computation. But there is some good research on coin flipping. You may like Persi Diaconis's work, for example his Numberphile appearance on coin flipping: https://www.youtube.com/watch?v=AYnJv68T3MM

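In code, that derivation is a one-liner (a trivial check, not anything from the thread):

```python
# Under independent fair-coin flips, P(k heads in a row) = (1/2)**k.
k = 100
print(0.5 ** k)  # ~7.9e-31, nowhere near 0.5, so P(X) = 0.5 is false for this event
```
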
> By definition a fair coin is one for which P(H) = P(T) = 1/2. See e.g. https://en.wikipedia.org/wiki/Fair_coin.

But that's circular, isn't it? You say a fair coin is one where the probabilities of heads and tails are equal. So let's assume the universe of coins is divided into those which are fair and those which are not. Now, given a coin, how do we determine whether it is fair? If we toss it 100 times and get all heads, do we conclude it is fair or not? I await your answer.

The appropriateness of each approach is very much a function of what is being modeled and the corresponding consequences of error.

An implicit shared belief of all of the practitioners the author mentions is that they attempt to construct models that correspond to some underlying "data generating process". Machine learning practitioners may use similar models or even the same models as Bayesian statisticians, but they tend to evaluate their models primarily or entirely based on their predictive performance, not on intuitions about why the data is taking on the values that it is.

See Breiman's classic "Two Cultures" paper that this post's title is referencing: https://projecteuclid.org/journals/statistical-science/volum...

At a high level, Bayesian statistics and DL share the same objective of fitting parameters to models.

In particular, variational inference is a family of techniques that makes these kinds of problems computationally tractable. It shows up everywhere from variational autoencoders, to time-series state-space modeling, to reinforcement learning. If you want to learn more, I recommend reading Murphy's textbooks on ML: https://probml.github.io/pml-book/book2.html

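To make that concrete, here is a minimal variational-inference sketch, assuming PyTorch; it fits a Gaussian q(theta) to the posterior over the mean of a deliberately simple conjugate model (prior N(0, 1), likelihood N(theta, 1)) by gradient ascent on a Monte Carlo ELBO, so the result can be checked against the exact posterior. All names and numbers are illustrative:

```python
import torch

torch.manual_seed(0)
data = torch.randn(100) + 2.0                    # synthetic observations centered near 2

mu = torch.zeros(1, requires_grad=True)          # variational mean
log_sigma = torch.zeros(1, requires_grad=True)   # variational log standard deviation
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

prior = torch.distributions.Normal(0.0, 1.0)     # prior on the unknown mean theta

for step in range(2000):
    opt.zero_grad()
    q = torch.distributions.Normal(mu, log_sigma.exp())
    theta = q.rsample()                          # reparameterized sample from q
    log_lik = torch.distributions.Normal(theta, 1.0).log_prob(data).sum()
    elbo = log_lik + prior.log_prob(theta).sum() - q.log_prob(theta).sum()
    (-elbo).backward()                           # maximize the ELBO
    opt.step()

# Should be close to the exact posterior N(sum(x)/(n+1), 1/(n+1)).
print(mu.item(), log_sigma.exp().item())
```
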
Not sure why this is being downvoted, as it’s mentioned peripherally in the article. I think it’s primarily used as an extreme example of a model where the inner mechanism is entirely inscrutable.

What? Maybe in a very specific context where you are modeling joint distributions of people and traits, but that’s barely a critique of the method itself.

- Iterating on the functional form of the model (and therefore the assumed underlying data generating process) is generally considered obviously good and necessary, in my experience.
- Priors are usually uninformative or weakly informative, partly because data is often big enough to overwhelm the prior (a minimal sketch of what that can look like follows at the end of this comment).
The need for iteration feels so obvious to me that the entire "no iteration" column feels like a straw man. But the author, who knows far more academic statisticians than I do, explicitly says that he had the same belief and "was shocked to learn that statisticians didn’t think this way."
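For what the second bullet can look like in practice, here is a minimal sketch, assuming PyMC and a made-up regression; with 500 observations the weakly informative priors barely move the estimates:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.5 * x + rng.normal(scale=0.5, size=500)     # synthetic data with true slope 1.5

with pm.Model():
    beta = pm.Normal("beta", mu=0, sigma=5)       # weakly informative prior on the slope
    noise = pm.HalfNormal("noise", sigma=5)       # weakly informative prior on the noise scale
    pm.Normal("obs", mu=beta * x, sigma=noise, observed=y)
    idata = pm.sample(1000, tune=1000, progressbar=False)

# The posterior for beta concentrates near 1.5 regardless of the (weak) prior.
```
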