**Summary:**

A non-trivial part of teaching statistics to social science students is made up of teaching them to perform significance testing. A problem yours truly has noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests — p-values — really are, still most students misinterpret them. A couple of years ago I gave a statistics course for the Swedish National Research School in History, and at the exam I asked the students to explain how one should correctly interpret p-values. Although the correct definition is p(data|null hypothesis), a majority of the students either misinterpreted the p-value as being the likelihood of a sampling error (which of course is wrong, since the very

**Topics:**

Lars Pålsson Syll considers the following as important: Statistics & Econometrics

**This could be interesting, too:**

Lars Pålsson Syll writes Keynes on the methodology of econometrics

Lars Pålsson Syll writes On the difference between econometrics and data science

Lars Pålsson Syll writes Big data truthiness

Lars Pålsson Syll writes Econometrics and the challenge of regression specification

A non-trivial part of teaching statistics to social science students is made up of teaching them to perform significance testing. A problem yours truly has noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests — p-values — really are, still most students misinterpret them.

A couple of years ago I gave a statistics course for the *Swedish National Research School in History*, and at the exam I asked the students to explain how one should correctly interpret p-values. Although the correct definition is p(data|null hypothesis), a majority of the students either misinterpreted the p-value as being the likelihood of a sampling error (which of course is wrong, since the very computation of the p-value is based on the assumption that sampling errors are what causes the sample statistics not coinciding with the null hypothesis) or that the p-value is the probability of the null hypothesis being true, given the data (which of course also is wrong, since it is p(null hypothesis|data) rather than the correct p(data|null hypothesis)).

This is not to blame on students’ ignorance, but rather on significance testing not being particularly transparent (conditional probability inference is difficult even to those of us who teach and practice it). A lot of researchers fall pray to the same mistakes. So – given that it anyway is very unlikely than any population parameter is exactly zero, and that contrary to assumption most samples in social science and economics are not random or having the right distributional shape – why continue to press students and researchers to do null hypothesis significance testing, testing that relies on weird backward logic that students and researchers usually don’t understand?

Let me just give a simple example to illustrate how slippery it is to deal with p-values – and how easy it is to impute causality to things that really are nothing but chance occurrences.

Say you have collected cross-country data on austerity policies and growth (and let’s assume that you have been able to “control” for possible confounders). You find that countries that have implemented austerity policies have on average increased their growth by say 2% more than the other countries. To really feel sure about the efficacy of the austerity policies you run a significance test – thereby actually assuming without argument that all the values you have come from the same probability distribution – and you get a p-value of less than 0.05. Heureka! You’ve got a statistically significant value. The probability is less than 1/20 that you got this value out of pure stochastic randomness.

But wait a minute. There is – as you may have guessed – a snag. If you test austerity policies in enough many countries you will get a statistically ‘significant’ result out of pure chance 5% of the time. So, really, there is nothing to get so excited about!

Statistical significance doesn’t say that something is important or true. And since there already are far better and more relevant testing that can be done (see e. g. here and here), it is high time to give up on this statistical fetish and not continue to be fooled by randomness.