Lars Pålsson Syll considers the following as important: Statistics & Econometrics
The dangers of idealizing randomization
In his history of experimental social science, Randomistas: How radical researchers are changing our world, Andrew Leigh gives an introduction to the RCT (randomized controlled trial) method for conducting experiments in medicine, psychology, development economics, and policy evaluation. Although the book mentions that there are criticisms that can be levelled against the method, the author does not let them overshadow his overwhelmingly enthusiastic view of RCTs.
Among mainstream economists, this rather uncritical attitude towards RCTs has become standard. Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ (such as natural experiments, field experiments, lab experiments, and RCTs) can help us answer questions concerning the external validity of economic models. In their view, these methods are more or less tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’
When looked at carefully, however, there are in fact few real reasons to share this optimism about the alleged ‘empirical turn’ in economics.
The present interest in randomization, instrumental-variables estimation, and natural experiments is an expression of a new trend in economics, with a growing interest in (ideal, quasi-, natural) experiments and, not least, in how to design them so that they can answer questions about causality and policy effects. Economic research on, e.g., discrimination nowadays often emphasizes the importance of a randomization design, for example when trying to determine to what extent discrimination can be causally attributed to differences in preferences or information, using so-called correspondence tests and field experiments.
A common starting point is the ‘counterfactual approach’ developed mainly by Neyman and Rubin, which is often presented and discussed using examples of research designs such as randomized controlled studies, natural experiments, difference-in-differences, matching, and regression discontinuity.
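For readers who want the notation, the counterfactual approach is usually written in terms of potential outcomes. In the standard textbook form (the symbols are conventional, not the post's own), unit i has an outcome Y_i(1) if treated and Y_i(0) if not, and only one of the two is ever observed. The observed difference in group means equals the average causal effect only if treatment status T_i is independent of the potential outcomes, which is what randomization is meant to secure:

```latex
\begin{aligned}
Y_i &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0), \\
\text{ATE} &= \mathbb{E}\left[\,Y_i(1) - Y_i(0)\,\right], \\
\mathbb{E}[Y_i \mid T_i = 1] - \mathbb{E}[Y_i \mid T_i = 0]
  &= \mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0] \\
  &= \text{ATE} \qquad \text{if } \{Y_i(0), Y_i(1)\} \perp T_i .
\end{aligned}
```

The expectations are taken over the population from which the units are drawn.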
Mainstream economists generally view this development of the economics toolbox positively. Since yours truly — like, for example, Nancy Cartwright and Angus Deaton — is not entirely positive about the randomization approach, I will share with you some of my criticisms.
A notable limitation of counterfactual randomization designs is that they only give us answers about how ‘treatment groups’ differ on average from ‘control groups.’ Let me give just one example to illustrate how limiting this fact can be:
Among participants in the Swedish school debate, and among politicians, it is claimed that so-called ‘independent schools’ (charter schools) are better than municipal schools: they are said to lead to better results. To find out whether this is really the case, a number of students are randomly selected to take a test. The result could be: Test result = 20 + 5T, where T = 1 if the student attends an independent school and T = 0 if the student attends a municipal school. This would confirm the assumption that independent-school students score, on average, 5 points higher than students in municipal schools. Now, politicians are (hopefully) aware that this statistical result cannot be interpreted in causal terms, because independent-school students typically do not have the same background (socio-economic, educational, cultural, etc.) as those who attend municipal schools: the relationship between school type and result is confounded by selection bias.

To obtain a better measure of the causal effect of school type, politicians suggest that places at an independent school be allocated by lottery among 1000 students, a classic example of a randomization design in natural experiments. The chance of winning is 10%, so 100 students are given this opportunity. Of these, 20 accept the offer to attend an independent school. Of the 900 lottery participants who do not ‘win,’ 100 choose to attend an independent school anyway. School researchers often treat the lottery as an ‘instrumental variable,’ and when the analysis is carried out, the result is: Test result = 20 + 2T. This is standardly interpreted as a causal measure of how much better students would, on average, perform on the test if they chose to attend independent schools instead of municipal schools. But is that true? No!
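The lottery numbers already pin down part of the instrumental-variable arithmetic. A minimal Python sketch, assuming the reported coefficient of 2 is the standard Wald estimator for a single binary instrument (reduced form divided by first stage); the variable names are illustrative:

```python
from fractions import Fraction

# Numbers taken from the lottery example in the text.
winners = 100             # 10% of the 1000 participants win a place
winners_attending = 20    # winners who accept the independent-school place
losers = 900              # participants who do not win
losers_attending = 100    # losers who attend an independent school anyway
iv_estimate = 2           # the coefficient on T in "Test result = 20 + 2T"

# First stage: how much winning raises the probability of attending.
p_attend_win = Fraction(winners_attending, winners)   # 1/5
p_attend_lose = Fraction(losers_attending, losers)    # 1/9
first_stage = p_attend_win - p_attend_lose            # 4/45

# Wald estimator: iv_estimate = reduced_form / first_stage, so the reported
# estimate implies this gap in mean scores between winners and losers
# (an inference from the assumed estimator, not a figure given in the text).
implied_reduced_form = iv_estimate * first_stage      # 8/45

print(f"first stage: {float(first_stage):.4f}")                       # 0.0889
print(f"implied reduced form: {float(implied_reduced_form):.4f} points")  # 0.1778
```

Under the usual monotonicity assumption (nobody attends only because they lost), the first stage is also the estimated share of compliers: about 9% of the 1000 lottery participants.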
If students do not all experience exactly the same effect of school type on their test results (a rather far-fetched ‘homogeneity assumption’), the estimated average causal effect applies only to the students who choose to attend an independent school if they ‘win’ the lottery, but who would not otherwise choose to attend one (in statistical jargon, ‘compliers’). It is difficult to see why this group of students would be particularly interesting in this example, since the average causal effect estimated using the instrumental variable says nothing at all about the effect on the majority of those who choose to attend an independent school (the 100 out of 120 who attend without ‘winning’ the lottery).
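A small simulation can make the complier point concrete. The Python sketch below builds a hypothetical population of lottery participants made up of always-takers, compliers, and never-takers, with type shares chosen so that, as in the example, 20% of winners and one in nine losers attend an independent school. The baseline scores and school effects for each type are invented for illustration, not data:

```python
from fractions import Fraction

# Hypothetical population of 4500 lottery participants. Type shares match the
# text's attendance rates; baseline scores and effects are made-up numbers.
# type: (count, baseline score, effect of independent school,
#        attends if wins, attends if loses)
TYPES = {
    "always-taker": (500, 24, 8, True, True),
    "complier": (400, 20, 2, True, False),
    "never-taker": (3600, 19, 0, False, False),
}
WIN_SHARE = Fraction(1, 10)


def simulate():
    """Return one (won, attends, score, effect) record per participant.

    Wins are allocated in exact proportion within each type, which is what a
    fair lottery delivers on average; this keeps the arithmetic exact.
    """
    records = []
    for count, baseline, effect, attend_if_win, attend_if_lose in TYPES.values():
        n_win = int(count * WIN_SHARE)
        for i in range(count):
            won = i < n_win
            attends = attend_if_win if won else attend_if_lose
            score = baseline + (effect if attends else 0)
            records.append((won, attends, score, effect))
    return records


def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))


records = simulate()
winners = [r for r in records if r[0]]
losers = [r for r in records if not r[0]]
attenders = [r for r in records if r[1]]
non_attenders = [r for r in records if not r[1]]

# Wald / IV estimate: reduced form divided by first stage.
reduced_form = mean(r[2] for r in winners) - mean(r[2] for r in losers)
first_stage = mean(int(r[1]) for r in winners) - mean(int(r[1]) for r in losers)
iv_estimate = reduced_form / first_stage

naive_gap = mean(r[2] for r in attenders) - mean(r[2] for r in non_attenders)
effect_among_attenders = mean(r[3] for r in attenders)
effect_among_all = mean(r[3] for r in records)

print(f"IV estimate:                          {float(iv_estimate):.2f}")
print(f"naive attend vs. not-attend gap:      {float(naive_gap):.2f}")
print(f"average effect, all who attend:       {float(effect_among_attenders):.2f}")
print(f"average effect, all lottery entrants: {float(effect_among_all):.2f}")
```

With these assumed numbers the IV estimate is exactly 2, the compliers' effect, while the average effect among everyone who attends an independent school is about 7.56 and the naive comparison is about 12.17. Changing the always-takers' effect moves the last two figures but leaves the IV estimate untouched.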
Conclusion: Researchers must be much more careful in interpreting ‘average estimates’ as causal. Reality exhibits a high degree of heterogeneity, and ‘average parameters’ often tell us very little!
Ideally, randomization means that we achieve orthogonality (independence) in our models. But that does not mean that we achieve this ideal when we randomize in real experiments. The ‘balance’ that randomization should ideally produce cannot be taken for granted when the ideal is translated into reality. Here, one must argue for and verify that the ‘assignment mechanism’ is truly stochastic and that ‘balance’ has indeed been achieved!
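One common way to check balance is to compare a pre-treatment covariate across arms using the standardized mean difference (SMD). The Python sketch below is illustrative only: the covariate values are simulated, the trial size of 40 is an assumption, and the 0.1 threshold is a widely used convention rather than anything from the post. It shows that a perfectly fair randomization of a small sample still leaves noticeable imbalance in a large share of draws:

```python
import random
import statistics


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def randomize(units, rng):
    """Complete randomization: shuffle a copy and split it into two equal halves."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(2024)
# Hypothetical pre-treatment covariate (e.g. an earlier test score) for 40 participants.
prior_scores = [rng.gauss(50, 10) for _ in range(40)]

treated, control = randomize(prior_scores, rng)
print(f"SMD in this draw: {standardized_mean_difference(treated, control):+.2f}")

# Re-randomize many times: how often does a fair draw leave |SMD| above 0.1?
draws = [standardized_mean_difference(*randomize(prior_scores, rng)) for _ in range(1000)]
share_imbalanced = sum(abs(d) > 0.1 for d in draws) / len(draws)
print(f"share of draws with |SMD| > 0.1: {share_imbalanced:.2f}")
```

The exact printed values depend on the seed. A balance table like this can only cover observed covariates, so it supports, but cannot settle, the argument that balance has been achieved.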
Even if we accept the limitation of only being able to say something about average treatment effects, there is another theoretical problem. An ideal randomized experiment assumes that a number of individuals are first randomly selected from a population and then randomly assigned to a treatment group or a control group. Given that both selection and assignment are successfully carried out at random, it can be shown that the expected outcome difference between the two groups is the average causal effect in the population. The snag is that the experiments actually conducted almost never involve participants randomly selected from a population! In most cases, experiments are started because there is a problem of some kind in a given population (e.g., schoolchildren or job seekers in country X) that one wants to address. Since an ideal randomized experiment assumes that both selection and assignment are randomized, virtually none of the empirical results that randomization advocates so eagerly tout hold up in a strict mathematical-statistical sense. The fact that only assignment is talked about when it comes to ‘as if’ randomization in natural experiments is hardly a coincidence. Moreover, with ‘as if’ randomization in natural experiments, the sad but inevitable fact is that there can always be a dependency between the variables being studied and unobservable factors in the error term, and this can never be tested!
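The gap between random assignment and random selection can be illustrated with a Python simulation. Everything in it is an assumption made for illustration: a population in which about 20% are ‘in need’ and gain more from an intervention than everyone else, and an experiment that recruits only from that group. Random assignment still gives an unbiased estimate, but of the average effect among those recruited, not of the population average effect:

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: about 20% are "in need" (say, struggling pupils)
# and gain more from the intervention than everyone else. Both effect sizes
# are assumptions made for illustration.
population = []
for _ in range(100_000):
    in_need = rng.random() < 0.2
    effect = 5.0 if in_need else 1.0
    baseline = rng.gauss(50, 5)
    population.append((in_need, baseline, effect))

population_ate = sum(effect for _, _, effect in population) / len(population)


def run_experiment(participants, rng):
    """Randomly assign half the participants to treatment; return the difference in means."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treated = [baseline + effect for _, baseline, effect in shuffled[:half]]
    control = [baseline for _, baseline, _ in shuffled[half:]]
    return statistics.mean(treated) - statistics.mean(control)


# (a) Ideal design: random selection from the population, then random assignment.
estimate_a = run_experiment(rng.sample(population, 2000), rng)

# (b) Typical design: participants recruited because they have the problem,
#     then randomly assigned. Assignment is random; selection is not.
in_need_group = [person for person in population if person[0]]
estimate_b = run_experiment(rng.sample(in_need_group, 2000), rng)

print(f"population average effect:                 {population_ate:.2f}")
print(f"(a) random selection and assignment:       {estimate_a:.2f}")
print(f"(b) targeted selection, random assignment: {estimate_b:.2f}")
```

With these assumed numbers the population average effect is about 1.8; design (a) should land near it and design (b) near 5, with the exact printed values depending on the seed.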
Another major problem is that researchers who use these randomization-based research strategies, in order to achieve ‘exact’ and ‘precise’ results, often formulate problems that are not at all the ones we really want answered. Design becomes the main thing, and as long as they can get more or less clever experiments in place, they believe they can draw far-reaching conclusions about both causality and the generalizability of experimental outcomes to larger populations. Unfortunately, this often means that this type of research is biased away from interesting and important problems and towards prioritizing the choice of method. Design and research planning are important, but the credibility of research ultimately lies in being able to answer relevant questions that both citizens and researchers want answered.
Believing there is only one really good evidence-based method on the market, and that randomization is the only way to achieve scientific validity, blinds people to other methods that in many contexts are better. Insisting on using only one tool often means using the wrong tool.