Lars Pålsson Syll considers the following as important: Statistics & Econometrics
On randomization and regression (wonkish)
Randomization does not justify the regression model, so that bias can be expected, and the usual formulas do not give the right variances. Moreover, regression need not improve precision …
What is the source of the bias when regression models are applied to experimental data? In brief, the regression model assumes linear additive effects. Given the assignments, the response is taken to be a linear combination of treatment dummies and covariates, with an additive random error; coefficients are assumed to be constant across subjects. The Neyman [potential outcome] model makes no assumptions about linearity and additivity. If we write the expected response given the assignments as a linear combination of treatment dummies, coefficients will vary across subjects. That is the source of the bias …
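The contrast in the passage above can be written out in symbols. The notation is an assumption made for illustration and is not taken from the quoted text: $a_i$ and $b_i$ stand for subject $i$'s fixed responses under treatment and control, $T_i$ is the treatment dummy, $z_i$ is a covariate, and $\alpha, \beta, \gamma, \epsilon_i$ are the usual regression parameters and error term.

```latex
\begin{align*}
% Neyman model: a_i and b_i are fixed; only the assignment T_i is random
Y_i &= T_i a_i + (1 - T_i)\, b_i \;=\; b_i + (a_i - b_i)\, T_i \\
% Regression model: coefficients constant across subjects, additive random error
Y_i &= \alpha + \beta\, T_i + \gamma\, z_i + \epsilon_i
\end{align*}
```

In the first line the coefficient on the dummy is $a_i - b_i$, which differs from subject to subject unless the effect is the same for everyone. The second line imposes a single $\beta$ and leaves everything else to $\epsilon_i$.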
To put this more starkly, in the Neyman model, inferences are based on the random assignment to the several treatments. Indeed, the only stochastic element in the model is the randomization. With regression, inferences are made conditional on the assignments. The stochastic element is the error term, and the inferences depend on assumptions about that error term. Those assumptions are not justified by randomization. The breakdown in assumptions explains why regression comes up short when calibrated against the Neyman model …
Variances in the Neyman model are (necessarily) computed across the assignments, for it is the assignments that are the random elements in the model. With regression, variances are computed conditionally on the assignments, from an error term assumed to be IID across subjects, and independent of the assignment variables as well as the covariates. These assumptions do not follow from the randomization, explaining why the usual formulas break down.
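One way to see the point is to take a tiny made-up population and enumerate every possible assignment, so that the randomization distribution is computed exactly rather than simulated. The sketch below is only an illustration under stated assumptions: the population values, the single covariate, and the helper names (`diff_in_means`, `ols_adjusted`, `exact_randomization_summary`) are all hypothetical, and the "nominal" variance is the textbook OLS formula with an IID error term.

```python
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: the spread across the enumerated assignments."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def diff_in_means(y, t):
    """Treated mean minus control mean."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1.0]
    control = [yi for yi, ti in zip(y, t) if ti == 0.0]
    return mean(treated) - mean(control)


def ols_adjusted(y, t, z):
    """OLS of y on an intercept, the dummy t and the covariate z.

    Returns the coefficient on t and its nominal (IID-error) variance.
    """
    n = len(y)
    yc = [v - mean(y) for v in y]
    tc = [v - mean(t) for v in t]
    zc = [v - mean(z) for v in z]
    s_tt = sum(a * a for a in tc)
    s_zz = sum(a * a for a in zc)
    s_tz = sum(a * b for a, b in zip(tc, zc))
    s_ty = sum(a * b for a, b in zip(tc, yc))
    s_zy = sum(a * b for a, b in zip(zc, yc))
    det = s_tt * s_zz - s_tz ** 2
    b_t = (s_zz * s_ty - s_tz * s_zy) / det
    b_z = (s_tt * s_zy - s_tz * s_ty) / det
    rss = sum((a - b_t * b - b_z * c) ** 2 for a, b, c in zip(yc, tc, zc))
    sigma2 = rss / (n - 3)  # three fitted parameters
    return b_t, sigma2 * s_zz / det


def exact_randomization_summary(y0, y1, z, m):
    """Enumerate every way of assigning m of the n subjects to treatment.

    y0 and y1 are the fixed potential responses (control, treatment).
    All assignments are equally likely, so averages over the enumeration
    are exact expectations under the randomization.
    """
    n = len(y0)
    ate = mean(y1) - mean(y0)
    dm, adj, nominal = [], [], []
    for treated in combinations(range(n), m):
        chosen = set(treated)
        t = [1.0 if i in chosen else 0.0 for i in range(n)]
        y = [y1[i] if i in chosen else y0[i] for i in range(n)]
        dm.append(diff_in_means(y, t))
        b_t, v = ols_adjusted(y, t, z)
        adj.append(b_t)
        nominal.append(v)
    return {
        "ate": ate,
        "bias_dm": mean(dm) - ate,
        "bias_adj": mean(adj) - ate,
        "var_dm": variance(dm),
        "var_adj": variance(adj),
        "mean_nominal_var_adj": mean(nominal),
    }


# Hypothetical population: nonlinear in z, effects differ across subjects.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0]
y0 = [zi ** 2 for zi in z]
y1 = [y0i + 2.0 + 3.0 * zi for y0i, zi in zip(y0, z)]
for name, value in exact_randomization_summary(y0, y1, z, m=3).items():
    print(f"{name}: {value:.4f}")
```

Under complete randomization `bias_dm` is zero up to rounding for any population, while `bias_adj` need not be. Comparing `var_adj` with `mean_nominal_var_adj` shows how far the usual formula can sit from the variance taken across the assignments, and comparing `var_adj` with `var_dm` shows whether adjustment helped or hurt precision in that particular population.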