Thursday , August 11 2022
Home / Lars P. Syll / ‘Overcontrolling’ in statistical studies

‘Overcontrolling’ in statistical studies

Summary:
‘Overcontrolling’ in statistical studies You see it all the time in studies. “We controlled for…” And then the list starts … The more things you can control for, the stronger your study is — or, at least, the stronger your study seems. Controls give the feeling of specificity, of precision. But sometimes, you can control for too much. Sometimes you end up controlling for the thing you’re trying to measure … An example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure … Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the

Topics:
Lars Pålsson Syll considers the following as important:

This could be interesting, too:

Lars Pålsson Syll writes De Finetti on the dangers of mathematization

Lars Pålsson Syll writes On statistics and causality

Lars Pålsson Syll writes The insignificance of significance

Lars Pålsson Syll writes Why I am not a Bayesian

‘Overcontrolling’ in statistical studies

You see it all the time in studies. “We controlled for…” And then the list starts … The more things you can control for, the stronger your study is — or, at least, the stronger your study seems. Controls give the feeling of specificity, of precision. But sometimes, you can control for too much. Sometimes you end up controlling for the thing you’re trying to measure …

‘Overcontrolling’ in statistical studiesAn example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure …

Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the gender wage gap vanishes. As Yglesias wrote, it’s “silly to act like this is just some crazy coincidence. Women work shorter hours because as a society we hold women to a higher standard of housekeeping, and because they tend to be assigned the bulk of childcare responsibilities.”

Controlling for hours worked, in other words, is at least partly controlling for how gender works in our society. It’s controlling for the thing that you’re trying to isolate.

Ezra Klein

Trying to reduce the risk of having established only ‘spurious relations’ when dealing with observational data, statisticians and econometricians standardly add control variables. The hope is that one thereby will be able to make more reliable causal inferences. But — as Keynes showed already back in the 1930s when criticizing statistical-econometric applications of regression analysis — if you do not manage to get hold of all potential confounding factors, the model risks producing estimates of the variable of interest that are even worse than models without any control variables at all. Conclusion: think twice before you simply include ‘control variables’ in your models!

‘Overcontrolling’ in statistical studiesWhen I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables …

A preferable approach is to separate the observations into meaningful subsets—internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”

Christopher H. Achen

Kitchen sink econometric models are often the result of researchers trying to control for confounding. But what they usually haven’t understood is that the confounder problem requires a causal solution and not statistical ‘control.’ Controlling for everything opens up the risk that we control for ‘collider’ variables and thereby create ‘back-door paths’ which gives us confounding that wasn’t there, to begin with.

Lars Pålsson Syll
Professor at Malmö University. Primary research interest - the philosophy, history and methodology of economics.

Leave a Reply

Your email address will not be published. Required fields are marked *