Tuesday, March 31, 2020

# On randomization and regression (wonkish)





> Randomization does not justify the regression model, so that bias can be expected, and the usual formulas do not give the right variances. Moreover, regression need not improve precision …
>
> What is the source of the bias when regression models are applied to experimental data? In brief, the regression model assumes linear additive effects. Given the assignments, the response is taken to be a linear combination of treatment dummies and covariates, with an additive random error; coefficients are assumed to be constant across subjects. The Neyman [potential outcome] model makes no assumptions about linearity and additivity. If we write the expected response given the assignments as a linear combination of treatment dummies, coefficients will vary across subjects. That is the source of the bias …
>
> To put this more starkly, in the Neyman model, inferences are based on the random assignment to the several treatments. Indeed, the only stochastic element in the model is the randomization. With regression, inferences are made conditional on the assignments. The stochastic element is the error term, and the inferences depend on assumptions about that error term. Those assumptions are not justified by randomization. The breakdown in assumptions explains why regression comes up short when calibrated against the Neyman model …
>
> Variances in the Neyman model are (necessarily) computed across the assignments, for it is the assignments that are the random elements in the model. With regression, variances are computed conditionally on the assignments, from an error term assumed to be IID across subjects, and independent of the assignment variables as well as the covariates. These assumptions do not follow from the randomization, explaining why the usual formulas break down.

David Freedman
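Freedman's contrast can be made concrete with a small simulation (a sketch of my own, not his example, assuming NumPy): fix a finite population in the Neyman sense, with potential outcomes whose treatment effects vary across subjects, and let the random assignment be the only stochastic element. Computing the difference-in-means and the regression-adjusted estimate across many assignments shows how each estimator actually behaves under randomization alone. The potential-outcome construction and all numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed finite population (Neyman model): potential outcomes and a covariate.
# The treatment effect 1 + 2*x varies across subjects -- exactly the
# heterogeneity that the constant-coefficient regression model rules out.
n = 50
x = rng.normal(size=n)           # baseline covariate
y0 = x + rng.normal(size=n)      # response if assigned to control
y1 = y0 + 1.0 + 2.0 * x          # response if assigned to treatment
ate = (y1 - y0).mean()           # true average treatment effect

# Simulate many random assignments: the only stochastic element in the model.
reps = 5000
dim_est, ols_est = [], []
for _ in range(reps):
    t = np.zeros(n)
    t[rng.choice(n, n // 2, replace=False)] = 1.0   # assign half to treatment
    y = np.where(t == 1.0, y1, y0)                  # observed responses
    dim_est.append(y[t == 1.0].mean() - y[t == 0.0].mean())
    # OLS of y on [1, t, x]: the usual regression adjustment, which assumes
    # a constant treatment coefficient across subjects.
    X = np.column_stack([np.ones(n), t, x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ols_est.append(beta[1])

dim_est, ols_est = np.array(dim_est), np.array(ols_est)
print("true ATE: %.3f" % ate)
print("diff-in-means: bias %+.4f, sd %.3f" % (dim_est.mean() - ate, dim_est.std()))
print("regression:    bias %+.4f, sd %.3f" % (ols_est.mean() - ate, ols_est.std()))
```

Over the assignments, the difference in means is exactly unbiased for the average treatment effect, whereas the adjusted estimator's bias and variance depend on how the heterogeneous effects line up with the covariate; nothing in the randomization itself guarantees the adjustment helps.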

Professor at Malmö University. Primary research interest: the philosophy, history and methodology of economics.