Friday, June 28, 2024
Variable selection — not about having a ‘good fit’

Which independent variables should be included in the equation? The goal is a “good fit” … How can a good fit be recognized? A popular measure for the satisfactoriness of a regression is the coefficient of determination, R². If this number is large, it is said, the regression gives a good fit …

Nothing about R² supports these claims. This statistic is best regarded as characterizing the geometric shape of the regression points and not much more.

The central difficulty with R² for social scientists is that the independent variables are not subject to experimental manipulation. In some samples, they vary widely, producing large variance; in other cases, the observations are more tightly grouped and there is little dispersion. The variances are a function of the sample, not of the underlying relationship. Hence they cannot have any real connection to the “strength” of the relationship as social scientists ordinarily use the term, i.e., as a measure of how much effect a given change in independent variable has on the dependent variable …

Thus “maximizing R²” cannot be a reasonable procedure for arriving at a strong relationship. It neither measures causal power nor is comparable across samples … “Explaining variance” is not what social science is about.

Christopher Achen
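
The point about sample dependence can be illustrated with a small simulation. The sketch below is not from Achen's text. It assumes a hypothetical linear model y = βx + ε with a fixed slope β = 2 and a fixed noise standard deviation σ = 5, and it draws x with either a narrow or a wide spread. Under those assumptions the population R² is β²·Var(x) / (β²·Var(x) + σ²), so it changes with Var(x) even though the effect of x on y does not. The function names and parameter values are illustrative choices only.

```python
import random


def ols_slope_and_r2(xs, ys):
    """Return the OLS slope and R^2 for a simple regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # With one regressor, R^2 is the squared sample correlation.
    r2 = sxy ** 2 / (sxx * syy)
    return slope, r2


def simulate(x_sd, beta=2.0, noise_sd=5.0, n=5000, seed=0):
    """Draw a sample from y = beta * x + noise (assumed model) and fit it."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, x_sd) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, noise_sd) for x in xs]
    return ols_slope_and_r2(xs, ys)


# Same underlying relationship, different dispersion of x in the sample.
slope_narrow, r2_narrow = simulate(x_sd=1.0)
slope_wide, r2_wide = simulate(x_sd=10.0)

print(f"narrow x: slope = {slope_narrow:.2f}, R^2 = {r2_narrow:.2f}")
print(f"wide x:   slope = {slope_wide:.2f}, R^2 = {r2_wide:.2f}")
```

With these assumed values the population R² is about 0.14 when x has standard deviation 1 and about 0.94 when it has standard deviation 10, while the slope is 2 in both cases. The estimated slope is the quantity that answers "how much effect a given change in independent variable has", and it is the one that stays put across the two samples.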

Lars Pålsson Syll
Professor at Malmö University. Primary research interest: the philosophy, history and methodology of economics.
