The dangers of using pernicious fictions in statistics
In much of science and medicine, the assumptions behind standard teaching, terminology, and interpretations of statistics are usually false, and hence the answers they provide to real-world questions are misleading …
In light of this harsh reality, we should ask what meaning (if any) can we assign to the P-values, “statistical significance” declarations, “confidence” intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data … My colleagues and I have long argued that inferences and decisions that pivot on formal statistics are misleading when the application context leaves us uncertain about those assumptions [Greenland, 2005, 2017; Greenland & Lash 2008; Rafi & Greenland, 2020; Amrhein & Greenland, 2022; Greenland et al., 2023]. In those contexts, careful uncertainty analysis will reveal that the stated “significance” and “confidence” levels reflect only naivety about the assumptions needed for those levels to hold [Greenland, 2005; Greenland & Lash, 2008], so that placing 95% confidence on a nominally 95% “confidence” interval is overconfidence [Amrhein et al., 2019a, 2019b; Greenland, 2019a, 2019b].
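To see what that overconfidence can look like in practice, here is a minimal simulation sketch of my own (the cluster structure, effect size and variable names are invented for illustration, not taken from Greenland and colleagues): a textbook 95% confidence interval for a regression slope, computed under the usual independent-errors assumption, is applied to data whose errors are in fact correlated within clusters. Its actual coverage falls far short of 95%.

```python
# Illustrative sketch (my own toy setup, not from the quoted authors):
# a nominal 95% CI whose real coverage collapses when the independence
# assumption behind its standard error does not hold.
import numpy as np

rng = np.random.default_rng(0)

def covers_true_slope(n_clusters=20, per_cluster=25, beta=0.5):
    """OLS of y on x with the textbook 95% CI; errors share a strong
    within-cluster component, so the i.i.d.-error assumption fails."""
    cluster = np.repeat(np.arange(n_clusters), per_cluster)
    x = rng.normal(size=n_clusters)[cluster]        # x constant within each cluster
    e = rng.normal(size=n_clusters)[cluster] + 0.3 * rng.normal(size=cluster.size)
    y = beta * x + e
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)           # variance estimate assuming independent errors
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    lo, hi = coef[1] - 1.96 * se, coef[1] + 1.96 * se
    return lo <= beta <= hi

coverage = np.mean([covers_true_slope() for _ in range(2000)])
print(f"actual coverage of the nominal 95% interval: {coverage:.2f}")  # far below 0.95
```

Nothing in any single printed interval signals that its nominal 95% no longer holds; only scrutiny of the background assumptions can reveal that.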
All science entails human judgment, and using statistical models does not relieve us of that necessity. When we work with misspecified models, the scientific value of ‘significance testing’ is actually zero, even though the statistical ‘inferences’ we draw within them are formally valid. Statistical models and concomitant significance tests are no substitute for doing real science.
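A toy example of my own (variables and numbers invented for the sketch) makes the point concrete: let x have no causal effect on y at all, with both driven by an unmeasured common cause u, and then run the usual significance test on the slope from regressing y on x. The test is formally impeccable within the fitted model, and it delivers a vanishingly small p-value for a causally inert variable.

```python
# Hedged illustration (my own example): a 'valid' significance test on a
# misspecified model. x does nothing to y; both are driven by an unmeasured
# common cause u, yet the slope on x comes out wildly 'significant'.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1000
u = rng.normal(size=n)          # unmeasured common cause
x = u + rng.normal(size=n)      # x is driven by u ...
y = u + rng.normal(size=n)      # ... and so is y; x itself has no effect on y

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
se = np.sqrt((resid @ resid / (n - 2)) * np.linalg.inv(X.T @ X)[1, 1])
t_stat = coef[1] / se
p = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"slope on x: {coef[1]:.2f}, p-value: {p:.1e}")  # tiny p-value, zero causal content
```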
In its standard form, a significance test is not the kind of ‘severe test’ we need if we want to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being the strong tendency to accept the ‘null hypothesis’ whenever it cannot be rejected at the standard 5% significance level. And as shown over and over again in applied work, people tend to read ‘not disconfirmed’ as ‘probably confirmed.’
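A quick sketch (with illustrative numbers of my own choosing) shows how weak ‘not disconfirmed’ really is: when a real but modest effect is studied with a small sample, the standard 5% test fails to reject the null in most replications, so a non-rejection discriminates only feebly between a true and a false null.

```python
# Illustrative power calculation (numbers invented for the sketch): a real
# effect of 0.3 studied with n = 20 is usually 'not significant' at 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_effect, n, sims = 0.3, 20, 5000    # a real but modest effect, a small study

def rejects_null():
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    return stats.ttest_1samp(sample, popmean=0.0).pvalue < 0.05

power = np.mean([rejects_null() for _ in range(sims)])
print(f"power: {power:.2f}; the false null survives about {1 - power:.0%} of replications")
```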
The excessive reliance on significance testing in science is disturbing and should be fought. But it is also important to put significance-testing abuse in perspective. The real problem in today’s social sciences is not significance testing per se. No, the real problem is the often unqualified and mechanistic application of statistical methods to real-world phenomena without even the slightest idea of how the assumptions behind the statistical models condition and severely limit the value of the inferences made.
As social scientists, and as economists in particular, we have to confront the all-important question of how to handle uncertainty and randomness. Should we equate randomness with probability? If we do, we have to accept that speaking of randomness presupposes the existence of nomological probability machines, since probabilities cannot meaningfully be spoken of, and strictly speaking do not exist at all, without specifying such system contexts.
Accepting the domain of probability theory and a sample space of ‘infinite populations’, as is routinely done in modern econometrics, also implies making judgments on the basis of observations that are never actually made. Infinitely repeated trials or samplings never take place in the real world. That cannot be a sound inductive basis for a science that aspires to explain real-world socio-economic processes, structures or events. It is simply not tenable.
In his marvellous book Statistical Models and Causal Inference: A Dialogue with the Social Sciences, David Freedman touched on this fundamental problem, which arises when you try to apply statistical models outside overly simple nomological machines such as coin tossing and roulette wheels:
Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals. However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions. Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious …
In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science.