RCTs Awesome, but Then What?


May 9, 2013

Kelly Steffens

The following is a guest post from Kelly Steffens, experimental task team leader at University of Texas-Austin's Center for Innovations in Peace and Development.

The International Rescue Committee and researchers from Columbia University conducted an intensive assessment of Tuungane, a community-driven reconstruction (CDR) program in the Democratic Republic of Congo (DRC). Tuungane organizes elections of village committees and provides training in leadership, good governance, and social inclusion, with the goal of making local governments more accountable, efficient, transparent, and participatory. By nearly all measures, the program is massive:

- Targeted beneficiary population: 1,780,000 people.

- Budget for phase one: USD 46,309,000.

- Geographic distribution: thousands of kilometers.

Evaluators used an impressively designed, rigorous and robust randomized evaluation to assess the impacts of the program. Of the 34 outcome measures evaluated, only two were found to be statistically significant in the expected direction (the population's willingness to complain and to trust others). Neither outcome is significant at the 99% confidence level. And wonderfully, the evaluators pre-committed to an analysis plan and stuck to it in their reporting.
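To put "two of 34" in perspective, a rough back-of-the-envelope sketch can help. It assumes, hypothetically, that each outcome was tested at a 5% threshold and that the 34 tests are independent; neither assumption comes from the evaluation itself, and real outcome measures are usually correlated. Under those assumptions, it asks how many significant results chance alone would produce if the program had no effect at all.

```python
from math import comb

N_OUTCOMES = 34  # outcome measures reported in the evaluation
ALPHA = 0.05     # ASSUMED per-test false-positive rate (not stated in the source)


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Average number of 'significant' results if every true effect is zero."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n_tests, alpha); ASSUMES independent tests."""
    below_k = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below_k


print(f"Expected chance findings: {expected_false_positives(N_OUTCOMES, ALPHA):.1f}")
print(f"P(at least 2 by chance):  {prob_at_least(2, N_OUTCOMES, ALPHA):.2f}")
```

Under these assumptions, chance alone would yield about 1.7 significant outcomes on average, so two hits out of 34 is roughly what a program with no effect might produce.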

By most standards, these results would be pretty damaging to the community-driven development (CDD) agenda. Unsurprisingly, and rightly, they have led to calls for more randomized evaluations on the topic. This can be a good thing, as replication of RCTs is crucial.

Currently, the World Bank still supports 400 community-driven development projects (CDD is a sister approach to CDR) in 94 countries, valued at almost $30 billion. Thus more evidence should arrive soon. But how do we separate the push for more replication, meant to identify the actual impact of CDD, from efforts to keep confirming previous biases?

Randomized evaluations became the chic international development approach a few years back, as they re-energized the aid effectiveness debate with the promise to uncover the causal links between program interventions and development. According to Brigham, Findley, Matthias, Petrey and Nielson, “By assigning interventions to treatment and control groups, researchers can learn the causal effects of the projects and, by replication, accumulate knowledge of effective development practice in which we can place high confidence.” Randomized evaluations tell us what works…and what doesn’t. To twist Dani Rodrik’s terms slightly, “We shall experiment, but how shall we learn” from the outcomes of those experiments?

In a recent paper, Brigham, Findley, Matthias, Petrey and Nielson begin to tackle this critical question. The five authors contacted 1,419 microfinance institutions (MFIs) with an offer to partner on evaluations of their programs. Along with the offer, one randomly assigned set of MFIs received positive information on the effectiveness of MFIs, a second set received negative information, and a third group received no additional information. Those that received the negative treatment were significantly and substantially less likely to respond to the offer: about 10% responded to the positive message, while 5% responded to the negative message. The analysis thus concludes that there is significant confirmation bias among microfinance institutions. As attractive as randomization is, what good does it do if no one listens to the results they didn’t want to hear?
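For readers who want a feel for how a 10% versus 5% gap can count as statistically significant, here is a minimal sketch of a pooled two-proportion z-test. The arm sizes and reply counts are hypothetical: the post does not give them, so the sketch simply assumes the 1,419 MFIs were split into three equal arms of 473 and rounds the quoted response rates into counts. It is not the paper's own analysis.

```python
from math import erfc, sqrt


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# HYPOTHETICAL inputs: equal arms and rounded counts, not figures from the paper.
ARM_SIZE = 1419 // 3                       # 473 MFIs per arm (assumed)
POSITIVE_REPLIES = round(0.10 * ARM_SIZE)  # about 10% responded
NEGATIVE_REPLIES = round(0.05 * ARM_SIZE)  # about 5% responded

z_stat, p_val = two_proportion_z_test(
    POSITIVE_REPLIES, ARM_SIZE, NEGATIVE_REPLIES, ARM_SIZE
)
print(f"z = {z_stat:.2f}, two-sided p = {p_val:.4f}")
```

With arms of this assumed size, a five-point gap in response rates is unlikely to arise by chance, which is consistent with the authors' description of the difference as significant.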

NGOs, donors and individuals can find excuses to ignore insignificant or negative results in any evaluation. Perhaps the results are not externally valid? Just because the program had no effect in the DRC doesn’t mean it won’t work in Kenya, where background conditions are very different. Or maybe the RCT didn’t measure the right outcome? The researchers measured program effects on child mortality when they should have been looking at child malnutrition. If you’re looking for excuses to do what you already wanted to do, RCTs will always leave you some.

Despite how negative I sound on the topic, I actually like to consider myself a budding randomista fangirl. I think a lot of the randomized evaluations are brilliant and offer incredible insights into the development world. RCTs are not the problem. The uptake is. As the saying often attributed to Albert Einstein goes, “Insanity: doing the same thing over and over again and expecting different results.” If evaluations show a type of intervention isn’t working, we should do something else.

Kelly Steffens is a dual master's candidate in Global Policy Studies and Business Administration at the University of Texas at Austin. She is the experimental task team leader for UT's Center for Innovations on Peace and Development, a partner of the AidData Center for Development Policy.
