Tuesday, March 19, 2024

RCTs and the limits of evidence-based policies




There is something paradoxical about Michael Gove’s recent speech calling for government to be “rigorous and fearless in its evaluation of policy and projects.” It’s that his praise for evidence-based policy has come in a year when we’ve seen that policy should sometimes not be based on rigorous evidence …

It’s sometimes hard to extrapolate the results of the RCTs lauded by Gove. They “prove”, for example, that parachutes don’t save lives. As Cartwright and Deaton say:

“Demonstrating that a treatment works in one situation is exceedingly weak evidence that it will work in the same way elsewhere; this is the ‘transportation’ problem.”

And where they do yield results, these can be hard to interpret. The average treatment effect hides important heterogeneity of effects on individuals, for example.

There’s also the problem that past evidence is no guide to the future … There are two very important categories of cases where past evidence doesn’t help us predict the future.

One is the problem of radical uncertainty: we can never be 100% confident that past statistical relationships will continue to hold. The other is that of reflexivity. Beliefs can change the world. For example, McLean and Pontiff and Cotter & McGeever have shown that when strong evidence emerges of stock market anomalies, these get competed away. In economic policy, the counterpart to this is Goodhart’s law: “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Chris Dillow

As Chris notes in his post, evidence-based theories and policies are highly valued nowadays. Randomization is supposed to control for bias from unknown confounders. The received opinion, therefore, is that evidence based on randomized experiments is the best.

More and more economists have also lately come to advocate randomization as the principal method for ensuring valid causal inferences.

Yours truly would rather argue that randomization, just like econometrics, promises more than it can deliver, basically because it requires assumptions that are impossible to maintain in practice. Like econometrics, randomization is essentially a deductive method: given assumptions such as manipulability, transitivity, separability, additivity, and linearity, it delivers deductive inferences. The problem, of course, is that we can never fully know whether those assumptions are right. And although randomization may contribute to controlling for confounding, it does not guarantee it, since genuine randomness presupposes infinite repetition, and all real experimentation is finite. Even if randomization helps to establish average causal effects, it says nothing about individual effects unless homogeneity is added to the list of assumptions.

Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models, and even if they were, we would still have to argue for the external validity of the conclusions reached within these epistemically convenient systems. Causal evidence generated by randomization procedures may be valid in ‘closed’ models, but what we are usually interested in is causal evidence about the real target system we happen to live in.
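The point about finite experimentation can be made concrete with a toy simulation (a minimal sketch, not anything from the post itself): in a single random assignment of a finite sample, the treatment and control groups are almost never exactly balanced on a given covariate. Balance holds only in expectation, i.e. in the limit of the infinite repetitions we never actually perform.

```python
import random
import statistics

random.seed(1)

def covariate_imbalance(n):
    """Randomly split n units (each carrying an unobserved covariate)
    into equal-sized treatment and control groups, and return the
    absolute difference in the groups' mean covariate."""
    covariates = [random.gauss(0, 1) for _ in range(n)]
    order = list(range(n))
    random.shuffle(order)
    treated = [covariates[i] for i in order[: n // 2]]
    control = [covariates[i] for i in order[n // 2 :]]
    return abs(statistics.mean(treated) - statistics.mean(control))

# In any single finite draw the imbalance is (almost surely) nonzero;
# it shrinks with sample size but never vanishes in one experiment.
for n in (20, 200, 2000):
    print(n, round(covariate_imbalance(n), 3))
```

Running this shows a residual imbalance at every sample size: any one randomization you actually perform can still leave the groups differing on a confounder.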

The point of a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and its effect indicates a causal relation. This is believed to hold because randomization (allegedly) ensures that the supposed causal variable does not correlate with other variables that may influence the effect.

The problem with this simplistic view of randomization is that the claims made for it are both exaggerated and false:

• Even if you manage to make the assignment to treatment and control groups perfectly random, the sample selection almost certainly is not, except in extremely rare cases. Even with a proper randomized assignment, if we apply the results to a biased sample there is always the risk that the experimental findings will not hold. What works ‘there’ does not necessarily work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ the right causal claim. Although it may help us rule out certain possible causal claims, randomization per se does not guarantee anything!

• Even if both sampling and assignment are made in an ideally random way, standard randomized experiments only give you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneity in the causal effects on individuals. Even if the average causal effect is correctly estimated at 0, those who are ‘treated’ may have causal effects equal to -100 and those ‘not treated’ causal effects equal to 100. Contemplating whether to be treated or not, most people would want to know about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision. In real-world settings, a little bias is often an acceptable price for greater precision. And, most importantly, when we have a population with sizeable heterogeneity, the average treatment effect in the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Since most real-world experiments and trials build on one single randomization, knowing what would happen if you kept on randomizing forever does not ‘ensure’ or ‘guarantee’ that you avoid false causal conclusions in the one particular randomized experiment you actually perform. It is indeed difficult to see why contemplating what you know you will never do should make you confident about what you actually do.
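The averaging point above can be illustrated with a short simulation using the post’s own -100/+100 figures (the simulation itself is my illustrative sketch, not the author’s): an ideally randomized experiment recovers an average treatment effect near 0 even though every single individual effect is either -100 or +100.

```python
import random
import statistics

random.seed(42)

n = 10_000
# Half the units respond +100 to treatment, half respond -100
# (the post's illustrative numbers), so the true average
# treatment effect is exactly 0.
effects = [100 if i % 2 == 0 else -100 for i in range(n)]

# Potential outcomes: y0 is a noisy baseline, y1 adds the unit's effect.
y0 = [random.gauss(0, 1) for _ in range(n)]
y1 = [y0[i] + effects[i] for i in range(n)]

# Ideal randomized assignment: we observe y1 for treated units
# and y0 for controls.
order = list(range(n))
random.shuffle(order)
treated, control = order[: n // 2], order[n // 2 :]

ate_hat = (statistics.mean(y1[i] for i in treated)
           - statistics.mean(y0[i] for i in control))
print(f"estimated ATE: {ate_hat:.2f}")   # near 0, relative to the +/-100 effects
print(f"individual effects present: {sorted(set(effects))}")
```

The experiment’s headline number is (correctly) close to zero, yet it tells a prospective patient nothing about whether they belong to the -100 or the +100 group.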

Randomization is not a panacea. It is not the best method for all questions and circumstances. Many proponents of randomization make claims about its ability to deliver causal knowledge that are simply wrong. There are good reasons to be skeptical of the now popular — and ill-informed — view that randomization is the only valid and best method on the market. It is not.

Lars Pålsson Syll
Professor at Malmö University. Primary research interest: the philosophy, history and methodology of economics.
