Lars Pålsson Syll considers the following as important: Statistics & Econometrics
Instrumental Variables — The Good and the Bad
Making appropriate extrapolations from (ideal, natural or quasi) experiments to different settings, populations or target systems is not easy. “It works there” is no evidence for “it will work here.” The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods used when analyzing ‘natural experiments’ is often despairingly small. Since the core assumptions on which instrumental variables (IV) analysis builds are NEVER directly testable, those of us who choose to use instrumental variables to find out about causality ALWAYS have to defend and argue for the validity of the assumptions the causal inferences build on. Especially when dealing with natural experiments, we should be very cautious when presented with causal conclusions that come without convincing arguments about the veracity of the assumptions made. If you are out to make causal inferences, you have to rely on a trustworthy theory of the data-generating process. The empirical results causal analysis supplies us with are only as good as the assumptions we make about the data-generating process.
It also needs to be pointed out that many economists, when they use instrumental variables analysis, make the mistake of thinking that swapping the assumption that the residuals are uncorrelated with the independent variables for the assumption that the same residuals are uncorrelated with an instrument by itself solves the endogeneity problem or improves the causal analysis. It does not: one untestable assumption has merely been replaced by another.
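To see why the swap is no free lunch, here is a minimal sketch in a simple linear model with a single regressor. The notation (outcome y, regressor x, instrument z, residual u, coefficient β) is an assumption introduced for illustration and is not part of the original argument.

```latex
y = \alpha + \beta x + u, \qquad
\hat{\beta}_{IV} \;\xrightarrow{\;p\;}\; \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
= \beta + \frac{\operatorname{Cov}(z, u)}{\operatorname{Cov}(z, x)}
```

The IV estimate converges to β only if Cov(z, u) = 0. That condition involves the unobserved residual u, so it has to be argued for rather than checked against the data. And when the instrument is weak (Cov(z, x) close to zero), even a small violation is magnified.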
The present interest in randomization, instrumental variables estimation, and natural experiments is an expression of a new trend in economics, where there is a growing interest in (ideal, quasi, natural) experiments and, not least, in how to design them so that they can provide answers to questions about causality and policy effects. Economic research on discrimination, for example, nowadays often emphasizes the importance of a randomized design, for instance when so-called correspondence tests and field experiments are used to determine to what extent discrimination can be causally attributed to differences in preferences or information.
A common starting point is the ‘counterfactual approach’ developed mainly by Neyman and Rubin, which is often presented and discussed using examples of research designs such as randomized controlled trials, natural experiments, difference-in-differences, matching, and regression discontinuity.
Mainstream economists generally view this development of the economics toolbox positively. Since yours truly is not entirely positive about the randomization approach, I will share with you some of my criticisms.
A notable limitation of counterfactual randomization designs is that they only give us answers on how ‘treatment groups’ differ on average from ‘control groups.’ Let me give an example to illustrate how limiting this fact can be:
Among politicians and commentators in the Swedish school debate, it is often claimed that so-called ‘independent schools’ (charter schools) are better than municipal schools. They are said to lead to better results. To find out if this is really the case, a number of students are randomly selected to take a test. The result could be: Test result = 20 + 5T, where T=1 if the student attends an independent school and T=0 if the student attends a municipal school. This would confirm the assumption that independent school students score, on average, 5 points higher than students in municipal schools.
Now, politicians (hopefully) are aware that this statistical result cannot be interpreted in causal terms, because independent school students typically do not have the same background (socio-economic, educational, cultural, etc.) as those who attend municipal schools (the relationship between school type and result is confounded by selection bias). To obtain a better measure of the causal effects of school type, politicians suggest letting 1000 students take part in a lottery for admission to an independent school, a classic example of a randomization design in natural experiments. The chance of winning is 10%, so 100 students are given this opportunity. Of these, 20 accept the offer to attend an independent school. Of the 900 lottery participants who do not ‘win,’ 100 choose to attend an independent school anyway.
School researchers often treat the lottery as an ‘instrumental variable,’ and when the analysis is carried out, the result is: Test result = 20 + 2T. This is standardly interpreted as a causal measure of how much better students would, on average, perform on the test if they chose to attend independent schools instead of municipal schools. But is it true? No!
Unless all students respond to school type in exactly the same way (which is a rather far-fetched ‘homogeneity assumption’), the specified average causal effect only applies to the students who choose to attend an independent school if they ‘win’ the lottery, but who would not otherwise choose to attend an independent school (in statistical jargon, we call these ‘compliers’). It is difficult to see why this group of students would be particularly interesting in this example, given that the average causal effect estimated using the instrumental variable says nothing at all about the effect on the majority of those who choose to attend an independent school (the 100 out of 120 who do so without ‘winning’ the lottery).
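The arithmetic behind the lottery example can be sketched in a few lines. The lottery counts come from the example above; everything else is an assumption made for illustration: the per-group effects (8 points for students who would attend regardless, 2 for compliers, 0 for those who never attend) are hypothetical, and the sketch assumes that nobody does the opposite of what the lottery offers (‘monotonicity’) and that winning affects test scores only through school choice.

```python
from fractions import Fraction

# Lottery counts taken from the example in the text.
winners, winners_attending = 100, 20
losers, losers_attending = 900, 100

# First stage: how much winning the lottery shifts attendance.
p_attend_win = Fraction(winners_attending, winners)   # 1/5
p_attend_lose = Fraction(losers_attending, losers)    # 1/9
first_stage = p_attend_win - p_attend_lose            # 4/45, about 8.9%

# Population shares by type, ASSUMING no one defies the lottery.
share_always = p_attend_lose        # attend whether or not they win
share_complier = first_stage        # attend only if they win
share_never = 1 - p_attend_win      # never attend

# HYPOTHETICAL effects of independent schooling per type (not from the text,
# except that the complier effect is set to the IV estimate of 2).
effect = {"always": Fraction(8), "complier": Fraction(2), "never": Fraction(0)}
baseline = Fraction(20)

# Expected test score in each lottery arm under these assumptions.
mean_y_win = (baseline + share_always * effect["always"]
              + share_complier * effect["complier"])
mean_y_lose = baseline + share_always * effect["always"]


def wald(mean_y1, mean_y0, p1, p0):
    """IV (Wald) estimate: outcome gap divided by attendance gap."""
    return (mean_y1 - mean_y0) / (p1 - p0)


late = wald(mean_y_win, mean_y_lose, p_attend_win, p_attend_lose)
ate = (share_always * effect["always"]
       + share_complier * effect["complier"]
       + share_never * effect["never"])

print("complier share:", float(share_complier))
print("IV estimate:", float(late))
print("population average effect:", float(ate))
```

Under these assumed numbers the IV estimate recovers the effect for the compliers, fewer than one in ten lottery participants, and coincides with neither the population average nor the effect for the students who attend independent schools anyway.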
Conclusion: Researchers must be much more careful in interpreting ‘average estimates’ as causal. Reality exhibits a high degree of heterogeneity, and ‘average parameters’ often tell us very little!
Ideally, randomization means that we achieve orthogonality (independence) in our models. But randomizing in a real experiment does not guarantee that we achieve this ideal. The ‘balance’ that randomization should ideally result in cannot be taken for granted when the ideal is translated into reality. Here, one must argue and verify that the ‘assignment mechanism’ is truly stochastic and that ‘balance’ has indeed been achieved!
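How often a single random assignment fails to deliver ‘balance’ can be illustrated with a small sketch. It is a rough illustration, not a recommended procedure: the function names, the simulated background variable, the sample size of 20, and the 0.25 threshold for the standardized mean difference are all assumptions chosen for the example.

```python
import random
import statistics


def standardized_mean_difference(treated, control):
    """Difference in group means, scaled by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(treated)
                  + statistics.variance(control)) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def share_imbalanced(covariate, n_draws=1000, threshold=0.25, seed=0):
    """Share of random half/half assignments whose |SMD| exceeds the threshold."""
    rng = random.Random(seed)
    values = list(covariate)
    half = len(values) // 2
    flagged = 0
    for _ in range(n_draws):
        rng.shuffle(values)  # one fresh random assignment
        smd = standardized_mean_difference(values[:half], values[half:])
        if abs(smd) > threshold:
            flagged += 1
    return flagged / n_draws


# Simulated background variable (e.g. a socio-economic index) for 20 students.
rng = random.Random(42)
background = [rng.gauss(0, 1) for _ in range(20)]
share = share_imbalanced(background)
print("share of assignments that look imbalanced:", share)
```

With groups this small, a sizeable share of perfectly random assignments ends up visibly imbalanced on the background variable, which is why balance has to be checked rather than presumed. Unobserved variables, of course, cannot be checked this way at all.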
Even if we accept the limitation of only being able to say something about average treatment effects, there is another theoretical problem. An ideal randomized experiment assumes that a number of individuals are first randomly selected from a population and then randomly assigned to a treatment group or a control group. Given that both selection and assignment are successfully carried out randomly, it can be shown that the expected outcome difference between the two groups is the average causal effect in the population. The snag is that the experiments actually conducted almost never involve participants randomly selected from the population! In most cases, experiments are started because there is a problem of some kind in a given population (e.g., schoolchildren or job seekers in country X) that one wants to address. Since an ideal randomized experiment assumes that both selection and assignment are randomized, virtually none of the empirical results that randomization advocates so eagerly tout hold up in a strict mathematical-statistical sense. The fact that only assignment is talked about when it comes to ‘as if’ randomization in natural experiments is hardly a coincidence. Moreover, when it comes to ‘as if’ randomization in natural experiments, the sad but inevitable fact is that there can always be a dependency between the variables being studied and unobservable factors in the error term, which can never be tested!
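The two-step argument about selection and assignment can be written out in potential-outcomes notation. This is a sketch under stated assumptions; the symbols are introduced here for illustration: Y(1) and Y(0) are the outcomes with and without treatment, T is the assignment, and S = 1 marks the individuals who actually take part in the experiment.

```latex
\begin{aligned}
E[Y \mid T=1, S=1] - E[Y \mid T=0, S=1]
  &= E[Y(1) \mid S=1] - E[Y(0) \mid S=1] && \text{(random assignment)} \\
  &= E[Y(1) - Y(0) \mid S=1] && \text{(average effect among participants)} \\
  &= E[Y(1) - Y(0)] && \text{(only if selection is also random)}
\end{aligned}
```

Random assignment justifies the first step; only random selection justifies the last. Without it, the experiment identifies the average effect among those who happened to take part, not in the population one wants to say something about.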
Another major problem is that researchers who use these randomization-based research strategies often, in order to achieve ‘exact’ and ‘precise’ results, set up problem formulations that are not at all the ones we really want answers to. Design becomes the main thing, and as long as they can get more or less clever experiments in place, they believe they can draw far-reaching conclusions about both causality and the ability to generalize experimental outcomes to larger populations. Unfortunately, this often biases this type of research away from interesting and important problems and towards prioritizing the choice of method. Design and research planning are important, but the credibility of research ultimately lies in being able to provide answers to relevant questions that both citizens and researchers want answered.
Believing that there is only one really good evidence-based method on the market, and that randomization is the only way to achieve scientific validity, blinds people to other methods that in many contexts are better. Insisting on using only one tool often means using the wrong tool.