Threats to substantive relevance of natural experiments
External validity poses a challenge for most kinds of research designs, of course. In true experiments in the social sciences, the study group is not usually a random sample from some underlying population. Often, the study group consists instead of a convenience sample, that is, a group of units that have been “drawn” through some nonrandom process from an underlying population. In other studies, one cannot even readily claim that the study group has been drawn from any well-defined population. In either case, one cannot confidently project estimated causal effects to a broader population, or attach estimates of sampling error to those projections. In most true experiments, in other words, causal inferences are drawn conditional on the study group—the particular set of units assigned to treatment and control groups. While randomization to treatment and control groups generally ensures that estimators of effects for the study group are unbiased (barring differential attrition or other threats to internal validity), whether these effects generalize to other populations is often an open question.
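To make the quoted point concrete, here is a minimal simulation sketch in Python (all numbers invented for illustration): randomization inside a convenience sample yields an essentially unbiased estimate of the average effect for that study group, yet that figure can be far from the population-wide effect when treatment effects are heterogeneous and the sample was drawn nonrandomly.

```python
import numpy as np

rng = np.random.default_rng(42)

# Population of 100,000 units with heterogeneous treatment effects:
# the unit-level effect is larger for units with a higher covariate x.
N = 100_000
x = rng.normal(size=N)
tau = 1.0 + 2.0 * x                       # unit-level treatment effects
pate = tau.mean()                         # population average effect (about 1.0)

# Convenience sample: units with high x are far more likely to 'volunteer'.
p_select = 1.0 / (1.0 + np.exp(-2.0 * x))
in_sample = rng.random(N) < p_select
x_s, tau_s = x[in_sample], tau[in_sample]
sate = tau_s.mean()                       # study-group average effect (about 2.2)

# Randomize treatment within the convenience sample and estimate the effect
# by difference in means -- unbiased for the study group, not the population.
n = in_sample.sum()
treated = rng.random(n) < 0.5
y = 0.5 * x_s + tau_s * treated + rng.normal(size=n)
diff_in_means = y[treated].mean() - y[~treated].mean()

print(f"Population average effect:    {pate:.2f}")
print(f"Study-group average effect:   {sate:.2f}")
print(f"Difference-in-means estimate: {diff_in_means:.2f}")
```

The particular numbers do not matter; the structure does: internal validity (the estimator tracks the study-group effect) by itself does nothing to license the jump to the wider population.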
Yours truly’s view is that many social scientists nowadays maintain that ‘imaginative empirical methods’ — such as natural experiments, field experiments, lab experiments, RCTs — can help us to answer questions concerning the external validity of models used in the social sciences. In their view, these methods are more or less tests of ‘an underlying model’ that enable researchers to make the right selection from the ever-expanding ‘collection of potentially applicable models.’ Looked at carefully, however, there are in fact not that many convincing reasons to share this optimism.
Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore, a fortiori, easy to test the robustness of experimental results. But is it really that easy? Population selection is almost never simple. Most social scientists — including economists — who use natural experiments do not, as a rule, work with random samples drawn from well-defined populations. Had the problem of external validity only been about inference from sample to population, this would not be a critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/fields to the specific real-world situations/institutions/structures that we are interested in understanding or (causally) explaining. And there the population problem is much more difficult to tackle.
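The distinction can be illustrated with another hedged sketch (again with invented numbers): an internally valid experimental estimate obtained ‘there’, in a site whose composition differs from the target, tells us little about the effect ‘here’ unless we are prepared to defend strong assumptions about how the effect varies across units.

```python
import numpy as np

rng = np.random.default_rng(3)

def run_experiment(x, rng):
    """Randomized experiment in a setting with covariates x; the unit-level
    effect grows with x. Returns the difference-in-means estimate."""
    tau = np.maximum(0.0, 2.0 * x)                 # heterogeneous effects
    treated = rng.random(x.size) < 0.5
    y = x + tau * treated + rng.normal(size=x.size)
    return y[treated].mean() - y[~treated].mean()

# 'There': the experimental site, where high-x units are common.
x_site = rng.normal(loc=1.0, size=50_000)
# 'Here': the policy target, where low-x units dominate.
x_target = rng.normal(loc=-1.0, size=50_000)

effect_there = run_experiment(x_site, rng)     # roughly 2.2
effect_here = run_experiment(x_target, rng)    # roughly 0.2: what we would see
                                               # if we could experiment 'here'

print(f"Estimated effect 'there': {effect_there:.2f}")
print(f"Average effect 'here':    {effect_here:.2f}")
# Reweighting the site estimate to the target's x-distribution would close the
# gap only if the effect depends on nothing but the observed x -- exactly the
# kind of causal background assumption that has to be argued, not presumed.
```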
Achieving ‘as-if’ randomization is not enough. At the end of the day, what counts when we evaluate natural experiments is substantive and policy relevance — as in John Snow’s path-breaking ‘shoe-leather’ cholera study of 1855 — not whether we can come up with ever more contrived instrumental-variables designs.
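A simple simulation, again only a sketch with made-up numbers, shows why a clean ‘as-if’ randomized instrument by itself does not settle substantive relevance: the standard Wald/IV estimator recovers the effect among compliers, which may be quite different from the average effect in the population a policymaker actually cares about.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Two latent groups: compliers take the treatment only when encouraged by the
# instrument z; never-takers ignore it. Their treatment effects differ.
complier = rng.random(n) < 0.3
tau = np.where(complier, 3.0, 0.5)        # unit-level effects
z = rng.random(n) < 0.5                   # 'as-if' randomized instrument
d = complier & z                          # realized treatment status
y = 1.0 + tau * d + rng.normal(size=n)

# Wald/IV estimator: reduced-form effect divided by the first stage.
reduced_form = y[z].mean() - y[~z].mean()
first_stage = d[z].mean() - d[~z].mean()
iv_estimate = reduced_form / first_stage

print(f"IV (Wald) estimate:        {iv_estimate:.2f}")   # about 3.0 (compliers only)
print(f"Population average effect: {tau.mean():.2f}")    # about 1.25
```

The estimate is perfectly ‘rigorous’ given the design, but whether the complier effect is the quantity of substantive or policy interest is a separate question that no amount of design cleverness answers.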
‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems is not easy. “It works there” is no evidence for “it will work here.” Causes deduced in an experimental setting still have to show that they come with an export warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods is rather small.