Monday , April 28 2025
Home / Lars P. Syll / Should we ‘control for’ everything when running regressions? No way!

Should we ‘control for’ everything when running regressions? No way!

Summary:
Should we ‘control for’ everything when running regressions? No way! When I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” This argument is not as persuasive as it may seem initially. First of all, if what you are doing is misspecified already, then adding or excluding other variables has no tendency to make things consistently better or worse … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables. Still more importantly, big, mushy linear regression and probit equations seem to need a great many control variables precisely because they are jamming together all sorts of observations that do not belong together. Countries, wars, racial categories, religious preferences, education levels, and other variables that change people’s coefficients are “controlled” with dummy variables that are completely inadequate to modeling their effects. The result is a long list of independent variables, a jumbled bag of nearly unrelated observations, and often a hopelessly bad specification with meaningless (but statistically significant with several asterisks!) results.

Topics:
Lars Pålsson Syll considers the following as important:

This could be interesting, too:

Robert Vienneau writes Austrian Capital Theory And Triple-Switching In The Corn-Tractor Model

Mike Norman writes The Accursed Tariffs — NeilW

Mike Norman writes IRS has agreed to share migrants’ tax information with ICE

Mike Norman writes Trump’s “Liberation Day”: Another PR Gag, or Global Reorientation Turning Point? — Simplicius

Should we ‘control for’ everything when running regressions? No way!

When I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” This argument is not as persuasive as it may seem initially. First of all, if what you are doing is misspecified already, then adding or excluding other variables has no tendency to make things consistently better or worse … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables.

Should we ‘control for’ everything when running regressions? No way!Still more importantly, big, mushy linear regression and probit equations seem to need a great many control variables precisely because they are jamming together all sorts of observations that do not belong together. Countries, wars, racial categories, religious preferences, education levels, and other variables that change people’s coefficients are “controlled” with dummy variables that are completely inadequate to modeling their effects. The result is a long list of independent variables, a jumbled bag of nearly unrelated observations, and often a hopelessly bad specification with meaningless (but statistically significant with several asterisks!) results.

A preferable approach is to separate the observations into meaningful subsets—internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”

Christopher H. Achen

Lars Pålsson Syll
Professor at Malmö University. Primary research interest - the philosophy, history and methodology of economics.

Leave a Reply

Your email address will not be published. Required fields are marked *