“Doctor, it hurts when I p”
A low-powered study is only going to be able to see a pretty big effect. But sometimes you know that the effect, if it exists, is small. In other words, a study that accurately measures the effect … is likely to be rejected as statistically insignificant, while any result that passes the p < .05 test is either a false positive or a true positive that massively overstates the … effect.
…
A conventional boundary, obeyed long enough, can be easily mistaken for an actual thing in the world. Imagine if we talked about the state of the economy this way! Economists have a formal definition of a 'recession,' which depends on arbitrary thresholds just as 'statistical significance' does. One doesn't say, 'I don't care about the unemployment rate, or housing starts, or the aggregate burden of student loans, or the federal deficit; if it's not a recession, we're not going to talk about it.' One would be nuts to say so. The critics — and there are more of them, and they are louder, each year — say that a great deal of scientific practice is nuts in just this way.
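Ellenberg’s point is easy to check by simulation. The sketch below is my own illustration, not from the post: it assumes a small true effect of 0.1 standard deviations, 30 observations per group (far too few to detect such an effect reliably), and the NumPy and SciPy libraries. It runs many underpowered studies and then looks only at those that clear p < .05.

```python
# Simulate many underpowered two-sample studies of a small true effect and
# keep only the effect estimates that reach p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.1          # assumed true mean difference, in SD units
n_per_group = 30           # assumed sample size; power is very low for this effect
n_studies = 20_000

significant_estimates = []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    if p < 0.05:
        significant_estimates.append(treatment.mean() - control.mean())

print(f"true effect:                  {true_effect:.2f}")
print(f"share of studies with p<.05:  {len(significant_estimates) / n_studies:.1%}")
print(f"mean estimate among them:     {np.mean(significant_estimates):.2f}")
```

In runs like this only a small fraction of the studies comes out “significant,” and the ones that do typically report an effect several times larger than the true one. That is the filtering problem the quote describes: passing the p < .05 gate selects for overestimates.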
If anything, this underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of significance testing is actually zero, even though the statistical inferences we draw are formally valid. Statistical models and their concomitant significance tests are no substitute for doing real science. Or as a noted German philosopher once famously wrote:
There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.
Statistical significance doesn’t say that something is important or true. Since there is already far better and more relevant testing that can be done (see e.g. here and here), it is high time to reconsider the proper function of what has by now become a statistical fetish. Given that it is anyway very unlikely that any population parameter is exactly zero, and that, contrary to assumption, most samples in social science and economics are neither random nor of the right distributional shape, why continue to press students and researchers to do null hypothesis significance testing, a procedure that relies on a weird backward logic that students and researchers usually don’t understand?
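The point that population parameters are rarely exactly zero can also be illustrated in a few lines of code. The sketch below is again my own, with an assumed mean shift of 0.02 standard deviations, chosen to be practically negligible; it uses SciPy’s one-sample t-test.

```python
# Test a practically negligible (but nonzero) mean shift against the null of
# exactly zero, at increasing sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
tiny_effect = 0.02         # assumed mean shift, in SD units: tiny but not zero

for n in (100, 10_000, 1_000_000):
    sample = rng.normal(tiny_effect, 1.0, n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    print(f"n = {n:>9,}   estimate = {sample.mean():+.4f}   p = {p:.4f}")
```

As the sample grows, the p-value collapses towards zero while the estimated effect stays around 0.02: with enough data, “statistical significance” is guaranteed, and it tells us nothing about whether the effect matters.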
In its standard form, a significance test is not the kind of “severe test” that we are looking for when we try to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being the strong tendency to accept the null hypothesis simply because it cannot be rejected at the standard 5% significance level. Significance tests in this form are biased against new hypotheses, since they make the null so hard to disconfirm.
As has been shown over and over again in applied work, people tend to read “not disconfirmed” as “probably confirmed.” And, most importantly, we should of course never forget that the parameters we test for significance are model constructions. Our p-values mean next to nothing if the model is wrong. As David Freedman writes in Statistical Models and Causal Inference:
I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
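Freedman’s worry can be made concrete with the classic spurious-regression example (a sketch of my own, not Freedman’s; it assumes SciPy’s linregress and two simulated, mutually independent random walks). The regression p-value is computed under assumptions of independent, well-behaved errors that these trending series plainly violate.

```python
# Regress one random walk on another, independently generated one and report
# the textbook OLS p-value for the slope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 500
x = np.cumsum(rng.normal(size=n))   # a random walk
y = np.cumsum(rng.normal(size=n))   # another random walk, independent of x by construction

result = stats.linregress(x, y)
print(f"slope = {result.slope:+.3f}   p-value = {result.pvalue:.2e}")
# In most draws the p-value is minuscule even though x and y are unrelated:
# it is calculated under i.i.d.-error assumptions that these trending series violate.
```

The inference is formally valid given the model, but the model is wrong, which is precisely why model validation rather than significance testing has to carry the scientific weight.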