Reading an applied econometrics paper could leave you with the impression that the economist (or any social science researcher) first formulated a theory, then built an empirical test based on the theory, then tested the theory. But in my experience what generally happens is more like the opposite: with some loose ideas in mind, the econometrician runs a lot of different regressions until they get something that looks plausible, then tries to fit it into a theory (existing or new) … Statistical theory itself tells us that if you do this for long enough, you will eventually find something plausible by pure chance!
This is bad news because as tempting as that final, pristine-looking causal effect is, readers have no way of knowing how it was arrived at. There are several ways I’ve seen to guard against this:
(1) Use a multitude of empirical specifications to test the robustness of the causal links, and pick the one with the best predictive power …
(2) Have researchers submit their paper for peer review before they carry out the empirical work, detailing the theory they want to test, why it matters and how they’re going to do it. Reasons for inevitable deviations from the research plan should be explained clearly in an appendix by the authors and (re-)approved by referees.
(3) Insist that the paper be replicated. Firstly, by having the authors submit their data and code and seeing if referees can replicate it (think this is a low bar? Most empirical research in ‘top’ economics journals can’t even manage it). Secondly — in the truer sense of replication — wait until someone else, with another dataset or method, gets the same findings in at least a qualitative sense. The latter might be too much to ask of researchers for each paper, but it is a good thing to have in mind as a reader before you are convinced by a finding.
All three of these should, in my opinion, be a prerequisite for research that uses econometrics …
Naturally, this would result in a lot more null findings and probably a lot less research. Perhaps it would also result in fewer papers which attempt to tell the entire story: that is, which go all the way from building a new model to finding (surprise!) that even the most rigorous empirical methods support it.
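How easy it is to find something ‘plausible by pure chance’ can be shown with a small simulation. The sketch below is only illustrative, assuming nothing beyond NumPy and SciPy, and the data are pure noise by construction: regress an outcome on twenty candidate variables that have no relation to it, and some of them will typically still clear the conventional 5% significance hurdle.

```python
# Sketch: specification search on pure noise.
# y and all twenty candidate regressors are independent draws,
# so every 'significant' coefficient found here is a false positive.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_candidates = 100, 20

y = rng.normal(size=n_obs)                  # outcome: pure noise
X = rng.normal(size=(n_obs, n_candidates))  # 20 unrelated 'explanatory' variables

spurious = []
for j in range(n_candidates):
    fit = stats.linregress(X[:, j], y)      # simple OLS of y on one candidate
    if fit.pvalue < 0.05:
        spurious.append((j, round(float(fit.pvalue), 3)))

print("Regressors 'significant' at the 5% level by chance alone:", spurious)
```

With twenty independent tests at the 5% level, the expected number of false positives is one; run enough specifications and something publishable-looking will eventually turn up.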
Good suggestions, but unfortunately there are many more deep problems with econometrics that have to be ‘solved.’
In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture. The assumption of imaginary ‘superpopulations’ is one of the many dubious assumptions used in modern econometrics.
Misapplication of inferential statistics to non-inferential situations is a non-starter for doing proper science. And when choosing which models to use in our analyses, we cannot get around the fact that the evaluation of our hypotheses, explanations, and predictions cannot be made without reference to a specific statistical model or framework. The probabilistic-statistical inferences we make from our samples depend decisively on what population we choose to refer to. The reference class problem shows that there are usually many such populations to choose from, and that the one we choose decides which probabilities we come up with and a fortiori which predictions we make. Not consciously contemplating the relativity effects this choice of ‘nomological-statistical machine’ has is probably one of the reasons econometricians have a false sense of the amount of uncertainty that really afflicts their models.
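The reference class problem is easy to see in miniature. The sketch below uses entirely invented numbers (the ‘firms’, ‘sectors’ and default indicator are placeholders, not data from any study): one and the same single case gets a different probability depending on which population we decide it belongs to.

```python
# Illustrative sketch of the reference class problem (all numbers invented):
# the probability attached to the same single case shifts with the
# population we choose to embed it in.

import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical sample of firms: sector (0/1), size (0 = large, 1 = small), default indicator.
sector = rng.integers(0, 2, size=n)
small = rng.integers(0, 2, size=n)
p_default = 0.05 + 0.10 * sector + 0.08 * small   # default risk varies across subgroups
default = rng.random(n) < p_default

# One and the same firm: sector 1, small.
case = (sector == 1) & (small == 1)

print("P(default) depending on the chosen reference class:")
print(f"  all firms:             {default.mean():.3f}")
print(f"  sector-1 firms:        {default[sector == 1].mean():.3f}")
print(f"  small sector-1 firms:  {default[case].mean():.3f}")
```

Nothing in the data tells us which of these frequencies is ‘the’ probability for the single case; that is a modelling choice, made before any inference starts.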
As economists and econometricians we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness in terms of probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and, strictly speaking, do not exist at all – without specifying such system contexts. Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just like Fisher’s ‘hypothetical infinite population,’ von Mises’s ‘collective’ or Gibbs’s ‘ensemble’ – also implies that judgments are made on the basis of observations that are actually never made! Infinitely repeated trials or samplings never take place in the real world. So this cannot be a sound inductive basis for a science that aspires to explain real-world socio-economic processes, structures or events. It’s not tenable.
Economists — and econometricians — have (uncritically and often without argument) come to simply assume that one can apply probability distributions from statistical theory to their own areas of research. However, fundamental problems arise when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels.
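A coin toss is the benign case: the ‘machine’ generating the outcomes is fully specified, so relative frequencies settle down on the model probability and the statistical apparatus works as advertised. A minimal sketch (assuming nothing more than a fair coin simulated with NumPy):

```python
# Sketch of a simple nomological machine: a fair coin.
# Because the data-generating set-up is fully specified, the relative
# frequency of heads converges on the model probability of 0.5.

import numpy as np

rng = np.random.default_rng(2)
tosses = rng.integers(0, 2, size=100_000)   # 1 = heads, 0 = tails

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"relative frequency of heads after {n:>7,} tosses: {tosses[:n].mean():.3f}")
```

The point is precisely that most economic data do not come with any such well-specified machine behind them.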
Of course one could arguably treat our observational or experimental data as random samples from real populations. But probabilistic econometrics does not content itself with that kind of population. Instead it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from them. This is nothing but hand-waving! When doing econometrics, it is always wise to remember C. S. Peirce’s remark that universes are not as common as peanuts …