Topics:
Lars Pålsson Syll considers the following as important: Economics
Development economists have been using randomized controlled trials (RCTs) for the best part of two decades, and economists working on welfare policies in the US have been doing so for much longer. The years of experience have made the discussions richer and more nuanced, and both proponents and critics have learned from one another, at least to an extent. In this essay, I do not attempt to reconstruct the full range of questions that I have written about elsewhere. Instead, I focus on a few of the issues that are prominent in this volume of critical perspectives.
The RCT is a useful tool, but I think that it is a mistake to put method ahead of substance. I have written papers using RCTs. Like other methods of investigation, they are often useful, and, like other methods, they have dangers and drawbacks. Methodological prejudice can only tie our hands. Context is always important, and we must adapt our methods to the problem at hand. It is not true that an RCT, when feasible, will always do better than an observational study. This should not be controversial, but my reading of the rhetoric in the literature suggests that the following statements might still make some uncomfortable, particularly the second: (a) RCTs are affected by the same problems of inference and estimation that economists have faced using other methods, and (b) no RCT can ever legitimately claim to have established causality.
My theme is that RCTs have no special status, they have no exemption from the problems of inference that econometricians have always wrestled with, and there is nothing that they, and only they, can accomplish. Just as none of the strengths of RCTs are possessed by RCTs alone, none of their weaknesses are theirs alone, and I shall take pains to emphasize those facts. There is no gold standard. There are good studies and bad studies, and that is all.
Great essay.
The point of making a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that a supposed causal variable does not correlate with other variables that may influence the effect.
The problem with that simplistic view on randomization is that the claims made are both exaggerated and false:
• Even if the assignment to treatment and control groups is ideally random, the sample selection is, except in extremely rare cases, certainly not random. Even with a properly randomized assignment, if the results come from a biased sample, there is always the risk that the experimental findings will not apply elsewhere. What works ‘there’ does not necessarily work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se does not guarantee anything!
• Even if both sampling and assignment are made in an ideal random way, performing standard randomized experiments only gives you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ may have causal effects equal to -100 and those ‘not treated’ may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the average effect particularly enlightening.
• There is almost always a trade-off between bias and precision. In real-world settings, a little bias is often outweighed by a gain in precision. And, most importantly, if we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.
• Since most real-world experiments and trials build on performing a single randomization, what would happen if you kept on randomizing forever does not help you to ‘ensure’ or ‘guarantee’ that you do not make false causal conclusions in the one particular randomized experiment you actually do perform. It is indeed difficult to see why thinking about what you know you will never do would make you happy about what you actually do.
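The second and fourth points can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (half the units with an individual effect of +100, half with -100), the noise level, the sample size, and the function name `simulate_trial` are all hypothetical choices made for illustration, not anything taken from Deaton’s essay.

```python
import random
import statistics

def simulate_trial(rng, n=1000):
    """Run one randomized trial on a population with heterogeneous effects.

    Assumption for illustration: half of the units have an individual
    causal effect of +100 and the other half -100, so the true average
    causal effect is exactly 0.
    """
    effects = [100.0] * (n // 2) + [-100.0] * (n // 2)
    # Outcome each unit would have without treatment (hypothetical noise).
    baselines = [rng.gauss(0.0, 10.0) for _ in range(n)]
    # Randomly assign exactly half of the units to treatment.
    treated = set(rng.sample(range(n), n // 2))
    y_treated = [baselines[i] + effects[i] for i in range(n) if i in treated]
    y_control = [baselines[i] for i in range(n) if i not in treated]
    # Standard difference-in-means estimate of the average effect.
    return statistics.mean(y_treated) - statistics.mean(y_control)

rng = random.Random(0)  # fixed seed so the run is reproducible
estimates = [simulate_trial(rng) for _ in range(500)]

mean_estimate = statistics.mean(estimates)   # average over many trials
spread = statistics.stdev(estimates)         # variation between single trials
largest_miss = max(abs(e) for e in estimates)

print(f"mean estimate over 500 trials: {mean_estimate:.2f}")
print(f"standard deviation across trials: {spread:.2f}")
print(f"largest single-trial error: {largest_miss:.2f}")
```

Averaged over many re-randomizations the estimate sits close to the true average effect of 0, yet that average says nothing about the individual effects of plus or minus 100, and any one trial can land some distance from 0. Since in practice only one trial is run, the long-run average is of limited comfort.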
Deaton’s essay underscores that the problem many ‘randomistas’ run into when they underestimate heterogeneity and interaction is not only one of external validity, arising when trying to ‘export’ regression results to different times or different target populations. It is often also an internal problem for the millions of regression estimates that economists produce every year.
‘Ideally controlled experiments’ tell us with certainty what causes what effects, but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems is not easy. And since trials usually are not repeated, unbiasedness and balance on average over repeated trials say nothing about any one trial. ‘It works there’ is no evidence for ‘it will work here.’ Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods, and of ‘on-average-knowledge,’ is despairingly small.
RCTs have very little reach beyond giving descriptions of what has happened in the past. From the perspective of the future and for policy purposes they are as a rule of limited value since they cannot tell us what background factors were held constant when the trial intervention was being made.
RCTs usually do not provide evidence that the results are exportable to other target systems. RCTs cannot be taken for granted to give generalizable results. That something works somewhere for someone is no warrant for believing that it will work for us here, or even that it works generally.
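The export problem can be made concrete with a toy example. Again, this is a sketch resting on invented assumptions: the two populations, the share of ‘responders’ in each, the effect size of 10, and the function name `average_effect` are hypothetical and serve only to show the mechanism.

```python
import random
import statistics

def average_effect(share_responders, n, rng):
    """Difference-in-means estimate from one randomized trial.

    Assumption for illustration: a fraction `share_responders` of the
    population gains 10 from treatment; for everyone else the treatment
    does nothing. The trial itself never observes who is a responder.
    """
    treated, control = [], []
    for _ in range(n):
        responder = rng.random() < share_responders
        baseline = rng.gauss(50.0, 5.0)
        if rng.random() < 0.5:  # coin-flip assignment to treatment
            treated.append(baseline + (10.0 if responder else 0.0))
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)

rng = random.Random(1)  # fixed seed so the run is reproducible

# Trial population 'there': most people respond to the treatment.
effect_there = average_effect(share_responders=0.8, n=20000, rng=rng)
# Target population 'here': few people respond.
effect_here = average_effect(share_responders=0.1, n=20000, rng=rng)

print(f"estimated average effect there: {effect_there:.2f}")
print(f"estimated average effect here:  {effect_here:.2f}")
```

Both trials are internally valid, and each recovers the average effect in its own population. What differs is the background mix of responders, which neither trial reveals, so the result from ‘there’ is a poor guide to what the same intervention would do ‘here.’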