A Rejoinder to Rubbery Numbers on Chinese Aid

In this post, we respond to Deborah Brautigam’s review of our Chinese development finance data collection project. Our initiative is premised on the idea that we should open our data and methods to criticism in order to improve them. To this end, we provided all users with a methodology that describes all our procedures and methods. In combination with the publicly released dataset, the methodology allows any user to identify the source of any error and offer a specific and actionable corrective directly to AidData staff via the public portal. We thank Brautigam for being one of the first to provide feedback, and hope that in the coming days we will receive suggestions about specific records in the database that need to be re-examined. We will certainly integrate corrected records and credit the users who provide such valuable feedback. Project #28153 is a case in point. Brautigam identified an error and we corrected the record.
At the same time, several of Brautigam’s critiques of our database merit closer scrutiny. We offer some initial responses to her comments below that we hope will become part of a broader multi-stakeholder dialogue.
First, Brautigam has several concerns about our research methods. She states that “reliance on media reports for data collection on Chinese finance is a very dicey methodology.” But in the same blog post she marshals evidence from media reports to challenge individual records in our database, which suggests that media reports, though imperfect, are often the best means available to track Chinese development finance. We are not wedded to the exclusive use of media reports, and those who carefully read our methodology can see for themselves how we vet and refine the data through triangulation of multiple types and sources of information.
Second, Brautigam asserts that “[t]he main problem is that the teams that have been collecting the data and their supervisors simply don't know enough about China in Africa, or how to check media reports, track down the realities of a project, and dig into the story to find out what really happened. You can start with media reports, but it is highly problematic to stop there.” We agree that one needs to triangulate many sources of information for each project over the course of its lifecycle. That is in fact what our methodology requires. It augments data drawn from journalistic accounts with case study scholarship, information and insights from experts (including the project-specific information made available through Brautigam’s blog), project-level data collected by other researchers, and official government data. The “digging” that Brautigam calls for is in fact a central part of our approach.
Third, Brautigam seems to think that it’s a bad idea to publish this dataset and expose our methods and sources to public scrutiny. She says “[d]ata-driven researchers won't wait around to have someone clean the data. They'll start using it and publishing with it and setting these numbers into stone.” People may abuse the data, but we disagree that this is a good reason not to publish the data. We have been systematically collecting project-level development finance data for the better part of the last ten years, and we find errors in the official data all the time. You cannot fix errors until you know that they exist, and we believe that more sunlight and scrutiny are the best way to spot and fix errors. Brautigam’s arguments seem to suggest that only a small group of people whom she considers to be experts should be allowed to collect and analyze data on the nature, distribution, and effects of Chinese development finance. We disagree with this “gatekeeper” approach to social science and expect that it will slow progress in this narrow sub-field of the academy.
Fourth, media-based data collection offers some advantages not mentioned in Brautigam’s blog post. During the course of our work, we learned that media-based data collection can help uncover projects that never enter official reporting systems. In Malawi, for example, the Ministry of Finance maintains a database of all incoming aid flows. This database lists only two Chinese-funded projects worth $133 million from 2000 to 2011. AidData’s media-based methodology captured these two projects, but it also uncovered an additional 14 Chinese projects during the same period of time. These 14 projects constitute an additional $164 million. One cannot help but wonder if Malawi’s Minister of Finance would consider these 14 additional projects uncovered through media-based methods “rubbish.”
Fifth, it appears that we have a difference of opinion with Brautigam regarding how knowledge is generated, refined, and improved. Any student in an introductory research methods course learns that a defining feature of the scientific process is the need to expose their methods and data to public scrutiny. If your methods, procedures, and inferences are not transparent and replicable, then you are not making knowledge claims based on scientific principles. Consider the process leading up to the publication of our CGD working paper. Brautigam reviewed an early version of our paper and provided constructive suggestions for improvement, which we in turn used to make improvements to the methodology and the dataset. One of her most valuable suggestions was to systematically collect data on the status of each project in the database based on available sources. Users of the database should be able to determine if a project has been “promised” but no formal legal commitment has been made; or if a commitment has been made, but the project has not yet been implemented or completed; or if a previously pledged or committed project has been cancelled or suspended. Users can now do this because we exposed our data to scrutiny and made improvements to our methodology based on external feedback. Going forward, if we have erred in any individual case or if the media report on which the coding decision was made is in error, we have designed our online database platform at china.aiddata.org so that it is easy for users to bring errors to our attention and request corrections. We acknowledge that our methodology is imperfect and that our database of nearly 1,700 projects may contain some errors, but we also reject the idea that keeping the database from the light of day will somehow improve its accuracy.
We also disagree with Brautigam’s comparison of our methodology to the Wagner School project. Previous media-based data collection efforts have run into major challenges: heavy reliance on individual sources (particularly English language news sources), insufficient attention to duplicate projects, over-counting as a result of not following projects from announcement to implementation, and opaque methods and sources. We took great care to learn from these past mistakes. While initial skepticism of a new media-based data collection initiative is understandable, our project should be judged on its own merits. The appropriate solution should be to judge the current study, and if/when specific errors are identified, critics ought to make those specific mistakes public. This is how most other big data collection projects in the social sciences have improved and refined their methods and their data over time. We encourage all readers to take a closer look at our published methodology, dataset, and working paper before judging its quality in relation to other studies.
Sixth, regarding the “megadeals” that Brautigam dismisses as invalid, we too recognize that many of these deals haven’t actually happened, which we have indicated in the “Status” column of the very table that she cites. This is why we have placed so much emphasis on the “Project Status” variable, and warn users against assuming that all records in our database are created equal. If a particular user of the data only wants to analyze projects that have been implemented or completed, we have included a set of variables that will allow the reader to filter the dataset in this way.
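As a rough sketch of the kind of filtering described above, the snippet below keeps only records whose status indicates implementation or completion. The field names, status labels, and records are hypothetical placeholders for illustration; they are assumptions, not the actual schema or contents of the published dataset.

```python
# Hypothetical records: field names, status labels, and IDs are illustrative
# assumptions, not the actual schema or contents of the published dataset.
records = [
    {"project_id": "A", "status": "Pledged"},
    {"project_id": "B", "status": "Commitment"},
    {"project_id": "C", "status": "Implementation"},
    {"project_id": "D", "status": "Completed"},
    {"project_id": "E", "status": "Cancelled"},
]

# Statuses treated as "under way or finished" in this sketch.
ACTIVE_STATUSES = {"implementation", "completed"}


def filter_by_status(rows, allowed=ACTIVE_STATUSES):
    """Return only rows whose status is in the allowed set (case-insensitive)."""
    return [r for r in rows if r["status"].strip().lower() in allowed]


implemented = filter_by_status(records)
print([r["project_id"] for r in implemented])
```

A user interested in a different subset, such as pledges that never reached a formal commitment, could pass a different `allowed` set instead of changing the function.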
Seventh, Brautigam takes issue with our findings on the top recipients of Chinese official financing, but it seems possible that she is referring to an earlier, unpublished version of our work. Both Angola and Ethiopia are on our current list of top recipients. The working paper lists in order the top recipients of official finance commitments in our database as Ghana, Nigeria, Sudan, Ethiopia, Mauritania, and Angola.
Eighth, Brautigam suggests that our team may not be suited to perform this type of research, stating, “[t]his is not research that can be done by just anyone, and especially not by only looking at media reports.” We agree that experts who dedicate their careers to studying China and Africa should help generate, vet, publish, and analyze data. This could potentially be a great example of what Gary King of Harvard calls for -- a collaboration between social scientists and area experts. Our main priority moving forward is to solicit as much specific input as possible from area experts, practitioners, and journalists to add sources and correct errors resulting from imperfect media reports. We take exception to the notion that media-based research and expert qualitative fieldwork are incompatible methods for studying development finance. Ultimately, it should not matter who is performing the research so long as they can produce useful results. And, in our judgment, the easiest and most effective way to do this is by consolidating data in one place using a common set of rigorous and transparent methods. People who study economic growth rely on Penn World Table. People who study inter-state war rely on the Correlates of War (COW) dataset. People who study democracy rely on the POLITY database. Yet researchers who study the nature, distribution, and impact of Chinese development finance activities have not previously been able to benefit from an analogous database. This is a major impediment not only for researchers seeking to make descriptive or causal inferences, but also for policymakers who want to know where resources are flowing and for what purpose.
While we recognize that our database is not yet analogous to these authoritative data sources, we hope that with specific and constructive feedback from experts such as Brautigam, we will be able to create a public good that will enable researchers and policymakers to advance our collective understanding of Chinese development finance.
Finally, Brautigam claims that AidData’s attempt to crowdsource Chinese development finance information will not work. This assertion is a testable hypothesis. With time, others will be able to judge whether we succeed in generating higher-quality and higher-resolution data. It is interesting to note that within 24 hours of the china.aiddata.org site being launched, users began to provide new information about specific projects in the database. The Guardian produced field reports on four of the projects in our database (here, here, here, and here).

Going forward, we invite researchers, policymakers, development practitioners, journalists and civil society organizations to join us in an effort to expand and enrich the online database at china.aiddata.org. We also hope that users will engage us and help us to refine the methodology, as it may very well have applications that extend far beyond Chinese development finance to Africa. We think there are some potentially big payoffs here for those who study the activities of DAC and non-DAC donors. We understand that our methodological approach will have its critics, but we hope that people will work with us to improve it. We urge readers to review the actual methodology, rather than caricatures of it.

Tags: media-based data collection (MBDC), China in Africa


Looking at it as a journalist you are in a sticky area and your defensiveness in the above entry is not very helpful to sorting this out! No respectable paper will publish something riddled with errors with the idea that "we'll just run a correction" - Brautigam is right that that simply perpetuates confusion and leads to a lot of misleading information being batted about. So your option is to run a big disclaimer on the top of the site and what exactly would that say and what use would it be? I mean, it'd have to say something like, "Unverified information" and what honestly can anyone do with the data? So, this is probably great for the casual observer of aid flows to Africa, but not so much for the serious researcher.

Dear journalist,
With all due respect, political science is quite different from journalism. All data contains caveats, but it is necessary for progress in the field to publish the methods and results. If everything is swept under the rug, no improvements can be made. As the field stands right now, there are no testable hypotheses. Current researchers make assertions and follow them up with "trust me, I have a dataset." We must bring our knowledge to the public forum in order to have an informed and unbiased discussion of the relationship between China and Africa.

Very odd remark from the journalist above. Newspapers seem to be happy to report the first estimates of GDP, even though they are subject to (often very substantial) revision when more information becomes available.

Owen, those were my thoughts too. As the post points out, the official aid data have plenty of errors as well. Crowdsourcing and replicability seems like a nice path toward keeping our analyses a little more honest.

Organizations which hide their methods behind "expert" panels look like they've got better data, but it just isn't so. Consider IMF GDP data in Africa or health data as discussed in Karen Grepin's blog http://bit.ly/fg9J0m. It is time to recognize that crowdsourcing is a complementary source of information (if used right) just like survey data is useful (if collected and used well). And time to recognize that opening methods and even spreadsheets (think Reinhart-Rogoff) is critical to finding mistakes and improving the debate.

I think what is missing here is reflected perfectly in this comment thread. AidData has lost no time in comparing itself to Wikipedia. While I think there is a bit of self-aggrandizement there, there is a lesson we can learn from what came before. We all remember the time 10 years ago when everyone was cautioned at every turn to NEVER cite Wikipedia for ANYTHING. It wasn't because the information in Wikipedia was always inaccurate or poorly sourced, but rather because many users did not yet understand how to use this open resource in a way that would make best use of the most information while keeping quality very high. Now, however, most people are comfortable using a resource like Wikipedia because they understand its limitations, know how to tell where the information came from, and when it is necessary to do further research.
We have to remember, the open data movement is new, and the conversation with non-academic and data types around how and when to use open data hasn't happened yet. Did AidData clearly document that these data were preliminary? Sure they did. But did journalists not get that message and publish unverified figures that could be billions of dollars off as fact? Sure they did. Ignore the methodology, ignore the battle of who is an expert, and let's use this experience to start educating data providers on how casual users experience open data, and likewise educate casual users on how they should, and should not, make use of this fantastic new way to generate knowledge.

I find the post odd; however, the comments here are pretty helpful. I'm researching this subject, and it's a good thing there are other journalists participating in the comments. :)

This is a subject that I always wanted to go deep but it's not really my thing. I love reading stuff like what you have here and enjoy all these helpful comments :)