When I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” This argument is not as persuasive as it may seem initially. First of all, if what you are doing is misspecified already, then adding or excluding other variables has no tendency to make things consistently better or worse … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables.
Still more importantly, big, mushy linear regression and probit equations seem to need a great many control variables precisely because they are jamming together all sorts of observations that do not belong together. Countries, wars, racial categories, religious preferences, education levels, and other variables that change people’s coefficients are “controlled” with dummy variables that are completely inadequate to modeling their effects. The result is a long list of independent variables, a jumbled bag of nearly unrelated observations, and often a hopelessly bad specification with meaningless (but statistically significant with several asterisks!) results.

A preferable approach is to separate the observations into meaningful subsets—internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”
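A minimal simulation makes the point concrete (the data, numbers, and code here are illustrative assumptions of mine, not anything from the quoted text): two regimes whose true slopes differ. The pooled regression “controls” for regime with a dummy, but the dummy only shifts the intercept. The pooled slope then comes out highly significant while describing neither regime; separate regressions per regime recover the truth.

```python
# Illustrative simulation (hypothetical data): two regimes with different true slopes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=2 * n)
group = np.repeat([0, 1], n)                 # two "statistical regimes"
slope = np.where(group == 0, 2.0, -1.0)      # the true effect differs by regime
y = slope * x + rng.normal(size=2 * n)

# Pooled model: y ~ x + dummy. The dummy is "completely inadequate" here,
# because what differs across regimes is the slope, not the level.
pooled = sm.OLS(y, sm.add_constant(np.column_stack([x, group]))).fit()
print("pooled slope:", pooled.params[1])     # ~0.5: describes neither regime ...
print("p-value     :", pooled.pvalues[1])    # ... yet earns several asterisks

# The preferable approach: separate regressions per internally compatible regime.
for g in (0, 1):
    fit = sm.OLS(y[group == g], sm.add_constant(x[group == g])).fit()
    print(f"regime {g} slope:", fit.params[1])   # recovers 2.0 and -1.0
```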
The empirical and theoretical evidence is clear. Predictions and forecasts are inherently difficult to make in a socio-economic domain where genuine uncertainty and unknown unknowns often rule the roost. The real processes that underlie the time series that economists use to make their predictions and forecasts do not conform to the assumptions made in the applied statistical and econometric models. A fortiori, much less is predictable than is standardly (and uncritically) assumed. The forecasting models fail to a large extent because the kind of uncertainty that faces humans and societies makes the models, strictly speaking, inapplicable. The future is inherently unknowable, and using statistics, econometrics, decision theory or game theory does not in the least overcome this ontological fact. The economic future is not something that we normally can predict in advance. Better, then, to accept that as a rule ‘we simply do not know.’
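A small sketch shows how this plays out (the data-generating process is invented for illustration): a model estimated on the past extrapolates its assumptions into a future that no longer obeys them, so an unforeseen structural break turns an excellent in-sample fit into a worthless forecast.

```python
# Illustrative sketch (hypothetical process): forecast failure under a structural break.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(150)
y = 0.5 * t + rng.normal(scale=2.0, size=150)
y[100:] -= 0.9 * (t[100:] - 100)   # the "unknown unknown": the trend reverses at t = 100

# Fit a linear trend on the first 100 observations, then forecast the rest.
coeffs = np.polyfit(t[:100], y[:100], deg=1)
in_sample = np.polyval(coeffs, t[:100])
forecast = np.polyval(coeffs, t[100:])

print("mean abs error, in sample:", np.mean(np.abs(in_sample - y[:100])))   # small
print("mean abs error, forecast :", np.mean(np.abs(forecast - y[100:])))    # large and growing
```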
We could, of course, just assume that the world is ergodic and hence convince ourselves that we can predict the future by looking at the past. In an ergodic world, time averages and ensemble averages coincide, so the history of one realised path would be a representative sample of everything that could happen. Unfortunately, economic systems do not display that property. So we simply have to accept that all our forecasts are fragile.
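A standard toy example of a non-ergodic multiplicative process (the example and its numbers are mine, not the post’s) makes the distinction tangible: the ensemble average over many parallel histories grows, while almost every individual history shrinks towards zero. The past of one realised path is then a very poor guide to what will happen.

```python
# Toy illustration (assumed numbers): time averages vs ensemble averages.
import numpy as np

rng = np.random.default_rng(2)
steps, paths = 1000, 1000
# Each period, wealth is multiplied by 1.5 or 0.6 with equal probability.
factors = rng.choice([1.5, 0.6], size=(paths, steps))
wealth = np.cumprod(factors, axis=1)

print("ensemble mean factor per step:", factors.mean())                  # ~1.05 (> 1: the mean grows)
print("time-average factor per step :", np.exp(np.log(factors).mean()))  # ~0.95 (< 1: paths decay)
print("median wealth after 1000 steps:", np.median(wealth[:, -1]))       # effectively zero
```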