Tuesday, April 30 2024

P-hacking and data dredging

P-hacking refers to when you massage your data and analysis methods until your result reaches a statistically significant p-value. I will put it to you that in practice most p-hacking is not necessarily about hacking p-s but about dredging your data until your results fit a particular pattern. That may be something you predicted but didn’t find or could even just be some chance finding that looked interesting and is amplified this way. However, the p-value is usually probably secondary to the act here. The end result may very well be the same in that you continue abusing the data until a finding becomes significant, but I would bet that in most cases what matters to people is not the p-value but the result. Moreover, while null-hypothesis significance testing with p-values is still by far the most widespread way to make inferences about results, it is not the only way. All this fussing about p-hacking glosses over the fact that the same analytic flexibility or data dredging can be applied to any inference, whether it is based on p-values, confidence intervals, Bayes factors, posterior probabilities, or simple summary statistics …

Everybody p-hacks if left to their own devices. Preregistration and open data can help protect yourself against your mind’s natural tendency to perceive patterns in noise. A scientist’s training is all about developing techniques to counteract this tendency, and so open practices are just another tool for achieving that purpose.

Sam Schwarzkopf
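
To make the point about analytic flexibility concrete, here is a minimal simulation sketch. It is not taken from Schwarzkopf's text. The function names, the choice of ten outcome measures, the group size of 30, and the use of a z-test with known variance are all assumptions made for illustration. Every simulated study compares two groups drawn from the same distribution, so any "finding" is a false positive. The sketch contrasts an analyst who commits to one outcome in advance with one who reports whichever outcome looks best.

```python
import math
import random


def p_value_two_sample(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    n_a, n_b = len(a), len(b)
    diff = sum(a) / n_a - sum(b) / n_b
    se = sigma * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_studies=2000, n_per_group=30, n_outcomes=10, alpha=0.05, seed=0):
    """Return (preregistered_rate, dredged_rate) of false positives on pure noise.

    All parameter values are illustrative assumptions, not empirical estimates.
    """
    rng = random.Random(seed)
    prereg_hits = 0
    dredged_hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no true effect.
            group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            p_values.append(p_value_two_sample(group_a, group_b))
        # Preregistered analyst: only the first outcome, fixed in advance, counts.
        if p_values[0] < alpha:
            prereg_hits += 1
        # Dredging analyst: report whichever outcome happens to look best.
        if min(p_values) < alpha:
            dredged_hits += 1
    return prereg_hits / n_studies, dredged_hits / n_studies


if __name__ == "__main__":
    prereg, dredged = simulate()
    print(f"preregistered single outcome: {prereg:.3f}")
    print(f"best of 10 outcomes:          {dredged:.3f}")
```

With ten independent outcomes and a 0.05 threshold, the chance that at least one crosses the line by luck alone is 1 - 0.95^10, roughly 0.40, so the dredged rate should land near that figure while the preregistered rate stays near 0.05. The selection step, not the p-value as such, is what does the damage: picking the largest of ten effect estimates, or the most favourable of ten Bayes factors, would inflate the result in the same way.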

Lars Pålsson Syll
Professor at Malmö University. Primary research interest: the philosophy, history and methodology of economics.
