We who like to imagine ourselves responsible for the public’s knowledge of society despise description and indeed despise the methods that are generally used for quantitative description. Our social indicators are simply disaggregated variables, ready for input to causal analysis. The notions of complex combinatoric description, of typologies based on multiple variables — these fill the average sociologist with disgust.
Our disgust is disingenuous, for ease of computing has made regression itself a descriptive method. When dozens of regressions can be run in an afternoon and when the average regression-based journal article reports perhaps 5 to 10 percent of the runs actually done, we should stop kidding ourselves about science and hypothesis testing. And taken as a descriptive technique, regression is quite poor. Description aims to reduce a welter of data to something manageable. But regression reduces the dimensionality of the data space only by one. Worse still, that lost dimension usually retains most of its variation. So we have not even understood why that one thing happens. We have understood the effects of the independent variables on that one dependent dimension, and in an evaluation context — when we are trying to make decisions about whether to use fertilizer on the field or dopamine in the brain — regression is without question the method of choice.
But as a general method for understanding why society happens the way it does, much less as a strategy for simple description, causally interpreted regression is pretty much a waste of time. Scaling and clustering may throw away the vast majority of dimensionality, but by doing so they often produce compelling and powerful simplifications of complex data.
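To make the contrast concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration and not part of the quoted text: the three hypothetical "types", the helper names `r_squared` and `kmeans`, and the choice of k-means as a stand-in for "clustering". It fits a one-predictor regression of y on x and reports how little of the variation in y it accounts for, then groups the same points into three clusters and reports how much of the total variation is left within them.

```python
import random

random.seed(0)

# Hypothetical data: three well-separated "types" in a two-variable space.
CENTERS = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(random.gauss(cx, 1.0), random.gauss(cy, 1.0))
          for cx, cy in CENTERS for _ in range(100)]


def mean(values):
    return sum(values) / len(values)


def r_squared(xs, ys):
    """Share of the variation in ys accounted for by an OLS line on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(pts, k, iterations=20):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [pts[0]]
    while len(centers) < k:
        # Next start: the point farthest from all starts chosen so far.
        centers.append(max(pts, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in pts]
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                centers[j] = (mean([m[0] for m in members]),
                              mean([m[1] for m in members]))
    return labels, centers


xs = [p[0] for p in points]
ys = [p[1] for p in points]

# Regression: one dependent dimension, most of its variation left unexplained.
r2 = r_squared(xs, ys)

# Clustering: two dimensions collapsed into a three-category typology.
labels, centers = kmeans(points, 3)
grand_mean = (mean(xs), mean(ys))
total_ss = sum(sq_dist(p, grand_mean) for p in points)
within_ss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
within_share = within_ss / total_ss

print(f"regression R^2 of y on x:        {r2:.3f}")
print(f"variation left within clusters:  {within_share:.3f}")
```

The data are rigged so that the group centers have no linear relation between x and y, which is exactly the situation in which a regression line has little to say while a three-way typology summarizes the points well. Real data will rarely be this clean, so the sketch illustrates the argument without testing it.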