Monday, June 1, 2020
Lars P. Syll

# It’s not just p = 0.048 vs. p = 0.052

“[G]iven the realities of real-world research, it seems goofy to say that a result with, say, only a 4.8% probability of happening by chance is “significant,” while if the result had a 5.2% probability of happening by chance it is “not significant.” Uncertainty is a continuum, not a black-and-white difference” …

My problem with the 0.048 vs. 0.052 thing is that it way, way, way understates the problem.

Yes, there’s no stable difference between p = 0.048 and p = 0.052.

But there’s also no stable difference between p = 0.2 (which is considered non-statistically significant by just about everyone) and p = 0.005 (which is typically considered very strong evidence) …

If these two p-values come from two identical experiments, then the standard error of their difference is sqrt(2) times the standard error of each individual estimate, hence that difference in p-values itself is only (2.81 – 1.28)/sqrt(2) = 1.1 standard errors away from zero …
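
The arithmetic in the passage above can be checked with a short sketch. It assumes (this is not stated in the excerpt) that both p-values are two-sided and come from normally distributed estimates, which is what the quoted figures 2.81 and 1.28 imply. The helper name `z_from_two_sided_p` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_two_sided_p(p):
    """Return the |z| whose two-sided normal tail probability equals p.

    Assumption: the p-value is two-sided and the estimate is normal.
    """
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)


z_strong = z_from_two_sided_p(0.005)  # about 2.81
z_weak = z_from_two_sided_p(0.2)      # about 1.28

# Identical experiments share one standard error, so the difference of
# the two estimates has a standard error sqrt(2) times as large.
z_diff = (z_strong - z_weak) / sqrt(2)

# Two-sided p-value for the difference between the two results.
p_diff = 2 * (1 - STANDARD_NORMAL.cdf(z_diff))

print(f"z-scores: {z_strong:.2f} and {z_weak:.2f}")
print(f"difference: {z_diff:.2f} standard errors, p = {p_diff:.2f}")
```

Under these assumptions the difference comes out at roughly 1.1 standard errors, matching the figure in the text, so the gap between the "very strong" result and the "non-significant" one would not itself pass a conventional 5% threshold.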

So. Yes, it seems goofy to draw a bright line between p = 0.048 and p = 0.052. But it’s also goofy to draw a bright line between p = 0.2 and p = 0.005. There’s a lot less information in these p-values than people seem to think.

So, when we say that the difference between “significant” and “not significant” is not itself statistically significant, “we are not merely making the commonplace observation that any particular threshold is arbitrary—for example, only a small change is required to move an estimate from a 5.1% significance level to 4.9%, thus moving it into statistical significance. Rather, we are pointing out that even large changes in significance levels can correspond to small, nonsignificant changes in the underlying quantities.”

Andrew Gelman

Lars Pålsson Syll is a professor at Malmö University. His primary research interests are the philosophy, history, and methodology of economics.