Friday, May 17, 2024
On the non-validity of incremental validity

A common goal of statistical analysis in the social sciences is to draw inferences about the relative contributions of different variables to some outcome variable. When regressing academic performance, political affiliation, or vocabulary growth on other variables, researchers often wish to determine which variables matter to the prediction and which do not—typically by considering whether each variable’s contribution remains statistically significant after statistically controlling for other predictors. When a predictor variable in a multiple regression has a coefficient that differs significantly from zero, researchers typically conclude that the variable makes a “unique” contribution to the outcome. And because measured variables are typically viewed as proxies for latent constructs of substantive interest—for example, two cognitive ability measures might be taken to index spatial versus verbal ability—it is natural to generalize the operational conclusion to the latent variable level; that is, to conclude that the latent construct measured by a given predictor variable itself has incremental validity in predicting the outcome, over and above other latent constructs that were examined.

Incremental validity claims pervade the social and biomedical sciences. In some fields, these claims are often explicit … More commonly, however, incremental validity claims are implicit—as when researchers claim that they have statistically “controlled” or “adjusted” for putative confounds—a practice that is exceedingly common in fields ranging from epidemiology to econometrics to behavioral neuroscience … The sheer ubiquity of such appeals might well give one the impression that such claims are unobjectionable, and if anything, represent a foundational tool for drawing meaningful scientific inferences.

Unfortunately, incremental validity claims can be deeply problematic. As we demonstrate below, even small amounts of error in measured predictor variables can result in extremely poorly calibrated Type 1 error probabilities. This basic problem has been discussed in a number of literatures—most extensively, in epidemiology and biostatistics, where concerns about incremental validity claims are often discussed under the heading of residual confounding, but also in fields ranging from psychology to education to econometrics. The common thread is that measurement unreliability and model misspecification will often have a large, deleterious effect on parameter estimates (and associated error rates) when covariates are entered into regression-based models. Consequently, under realistic assumptions, it can be shown that a large proportion of incremental validity claims in many disciplines are likely to be false …

In any given analysis, there is a simple fact of the matter as to whether or not the unique contribution of one or more variables in a regression is statistically significant when controlling for other variables; what room is there for inferential error? Trouble arises, however, when researchers behave as if statistical conclusions obtained at the level of observed measures can be automatically generalized to the level of latent constructs — a near-ubiquitous move, given that most scientists are not interested in prediction purely for prediction’s sake, and typically choose their measures precisely so as to stand in for latent constructs of interest. That is, researchers typically do not care to show that, say, school vouchers are associated with improved academic performance after controlling for a specific survey item asking about respondents’ income bracket; rather, the goal is to show that the vouchers may improve performance after accounting for the general construct of income (or, more generally, socioeconomic status).

Jacob Westfall & Tal Yarkoni
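
The point about poorly calibrated Type 1 error rates can be illustrated with a small simulation. What follows is a rough sketch under assumptions of my own, not the authors' procedure: two latent constructs correlate at `rho`, only the first affects the outcome, and each is observed with noise set by `reliability`. The function names, parameter values and the approximate critical value of 1.97 are all hypothetical choices made for illustration. At the latent level the second predictor has no unique effect, so every significant coefficient on it is a false positive.

```python
import math
import random


def t_stat_second_predictor(x1, x2, y):
    """t statistic for the x2 coefficient in an OLS fit of y on x1 and x2 (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = s22 = s12 = s1y = s2y = syy = 0.0
    for a, b, c in zip(x1, x2, y):
        da, db, dc = a - m1, b - m2, c - my
        s11 += da * da
        s22 += db * db
        s12 += da * db
        s1y += da * dc
        s2y += db * dc
        syy += dc * dc
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = syy - b1 * s1y - b2 * s2y           # residual sum of squares
    sigma2 = rss / (n - 3)                    # three estimated parameters
    return b2 / math.sqrt(sigma2 * s11 / det)


def rejection_rate(reliability, n=300, n_sims=300, rho=0.6, beta=1.0, seed=0):
    """Share of simulated samples in which x2 looks 'significant' at roughly the 5% level.

    Illustrative assumptions: only latent construct 1 affects y; latent
    construct 2 merely correlates with it (rho). Both are measured with the
    same reliability, defined as var(true score) / var(observed score).
    """
    rng = random.Random(seed)
    err_sd = math.sqrt((1.0 - reliability) / reliability)
    rejections = 0
    for _ in range(n_sims):
        x1, x2, y = [], [], []
        for _ in range(n):
            t1 = rng.gauss(0, 1)
            t2 = rho * t1 + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
            y.append(beta * t1 + rng.gauss(0, 1))   # t2 has no effect on y
            x1.append(t1 + err_sd * rng.gauss(0, 1))
            x2.append(t2 + err_sd * rng.gauss(0, 1))
        # 1.97 approximates the two-sided 5% t cutoff for about 300 degrees of freedom
        if abs(t_stat_second_predictor(x1, x2, y)) > 1.97:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for rel in (1.0, 0.9, 0.7):
        print(f"reliability={rel:.1f}  false-positive rate={rejection_rate(rel):.2f}")
```

With perfectly reliable measures the rejection rate should sit near the nominal 5%. As reliability falls, the noisy first predictor no longer fully absorbs the effect of its construct, and the second predictor picks up the remainder, so the rate should climb well above 5%. The exact figures depend on the assumed sample size, correlation and effect size.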

Lars Pålsson Syll
Professor at Malmö University. Primary research interest - the philosophy, history and methodology of economics.
