A 5-Minute Dive Into Our New Geocoded World Bank Data
In the spring of 2010, AidData and the World Bank Institute embarked on an ambitious effort to shed new light on investments to end poverty in over 70 countries. Over the course of 6 weeks, AidData and the Mapping for Results project combed through copious documentation for 1,200 World Bank-funded projects, ultimately geocoding approximately 12,000 locations of development activities worldwide. Since then, AidData has continued to dig through project-level documents to provide the most precise geocoded data possible. Recently, we released a new dataset that provides the subnational locations of over 3,500 World Bank projects (in the IBRD/IDA lines) approved between 2000 and 2011. These projects were geocoded to 41,307 locations using our standardized geocoding methodology and account for nearly $370 billion in committed funds.
This dataset offers an exciting opportunity for researchers and policymakers to better understand the impact and allocation of World Bank projects. For example, 114 projects spread across 67 countries have primary education as their predominant sector in the World Bank's classification. Our team geocoded these projects to 1,691 distinct locations, roughly 70% of which are precise to within a district. One could study the impact these projects have had on educational outcomes in nearby areas. The chart below shows the distribution of project locations across all AidData sectors, broken down by level of precision.
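One way to reproduce a tally like the one in the chart is to count locations per precision code within a sector. The sketch below is a minimal Python example under loudly stated assumptions: the rows, their column layout, and the rule that precision codes 1 to 3 mean "within-district" are all hypothetical, and should be checked against the geocoding methodology documentation included in the download.

```python
from collections import Counter

# Hypothetical rows: (project_id, sector, precision_code). The field layout
# and code values are illustrative assumptions, not the dataset's schema.
locations = [
    ("P001", "Primary education", 1),
    ("P001", "Primary education", 3),
    ("P002", "Primary education", 4),
    ("P003", "Health", 2),
    ("P003", "Health", 6),
]

# Assumption: smaller codes are more precise, and codes 1-3 place a
# location within a district.
WITHIN_DISTRICT = {1, 2, 3}

def precision_summary(rows, sector):
    """Count locations per precision code for one sector and return the
    share of those locations that fall within a district."""
    codes = [code for _, s, code in rows if s == sector]
    counts = Counter(codes)
    within = sum(n for code, n in counts.items() if code in WITHIN_DISTRICT)
    share = within / len(codes) if codes else 0.0
    return dict(sorted(counts.items())), share

counts, share = precision_summary(locations, "Primary education")
print(counts, round(share, 2))  # {1: 1, 3: 1, 4: 1} 0.67
```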
Of course, researchers assessing the impact and allocation of aid are often concerned about selection bias, in which projects are placed in particularly advantageous (or disadvantageous) locations. This dataset lends itself particularly well to quasi-experimental designs that address this bias. Because many of the locations are precisely identified, one can control for characteristics that locations share through fixed effects and location-specific trends. Matching locations near projects to comparable locations without active projects may offer an alternative strategy for limiting selection bias.
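The matching strategy can be sketched as nearest-neighbor matching. This minimal Python example uses hypothetical units described by two made-up baseline covariates: each location near a project is paired with the most similar location without one (matching with replacement, on standardized covariates), and the average outcome difference estimates the effect on treated locations. It illustrates the general technique only, not AidData's method; a real analysis would also check covariate balance after matching.

```python
import math

# Hypothetical units with baseline covariates "x" and a later outcome "y".
# "Treated" means a geocoded project location lies nearby. All values are
# made up for illustration.
treated = [
    {"id": "T1", "x": (1.0, 20.0), "y": 5.0},
    {"id": "T2", "x": (3.0, 40.0), "y": 7.0},
]
controls = [
    {"id": "C1", "x": (1.1, 21.0), "y": 4.0},
    {"id": "C2", "x": (2.9, 39.0), "y": 6.5},
    {"id": "C3", "x": (8.0, 90.0), "y": 9.0},
]

def scales(units):
    """Population standard deviation of each covariate (1.0 if constant)."""
    out = []
    for col in zip(*(u["x"] for u in units)):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        out.append(sd or 1.0)
    return out

def match_att(treated, controls):
    """1-to-1 nearest-neighbor matching with replacement on standardized
    covariates; returns the mean treated-minus-match outcome difference."""
    sd = scales(treated + controls)
    diffs = []
    for t in treated:
        best = min(
            controls,
            key=lambda c: sum(((a - b) / s) ** 2
                              for a, b, s in zip(t["x"], c["x"], sd)),
        )
        diffs.append(t["y"] - best["y"])
    return sum(diffs) / len(diffs)

print(match_att(treated, controls))  # 0.75 (T1 pairs with C1, T2 with C2)
```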
Country borders or subnational administrative boundaries may also offer a useful spatial discontinuity across which outcomes might otherwise evolve similarly. Because the dates of project activity are explicitly included in the data, one can also compare variables of interest before a given project's launch with conditions after its launch. Moreover, because the data capture projects approved over a 12-year window (Jan. 1, 2000, to Dec. 31, 2011), one can compare locations where projects launched relatively early with those where projects launched relatively late. Finally, the data provide a sufficient time lag for real outcomes, particularly for the earliest projects, to be observable.
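The before/after and early/late comparisons can be combined in a simple 2x2 difference-in-differences. In this minimal sketch, locations whose projects launch between the pre and post periods are compared with locations whose projects launch only after the post period (and so are not yet treated). The panel, the years, and the `early_vs_late_did` helper are hypothetical, and the estimate is only credible under the usual parallel-trends assumption; it is not a full estimator for staggered adoption.

```python
# Hypothetical panel: outcome "y" by year for each location, plus the year
# its project started. Values are illustrative only.
panel = {
    "A": {"start": 2003, "y": {2001: 10.0, 2002: 10.5, 2005: 14.0, 2006: 14.5}},
    "B": {"start": 2004, "y": {2001: 12.0, 2002: 12.0, 2005: 15.0, 2006: 15.0}},
    "C": {"start": 2009, "y": {2001: 9.0, 2002: 9.5, 2005: 10.5, 2006: 11.0}},
    "D": {"start": 2010, "y": {2001: 11.0, 2002: 11.0, 2005: 12.0, 2006: 12.0}},
}

def mean(xs):
    return sum(xs) / len(xs)

def early_vs_late_did(panel, pre, post):
    """2x2 difference-in-differences.

    Early group: project starts after the last pre year and no later than
    the first post year. Late group: project starts after the last post
    year. Locations fitting neither group are skipped.
    """
    changes = {"early": [], "late": []}
    for unit in panel.values():
        change = (mean([unit["y"][yr] for yr in post])
                  - mean([unit["y"][yr] for yr in pre]))
        if max(pre) < unit["start"] <= min(post):
            changes["early"].append(change)
        elif unit["start"] > max(post):
            changes["late"].append(change)
    if not changes["early"] or not changes["late"]:
        raise ValueError("both comparison groups need at least one location")
    return mean(changes["early"]) - mean(changes["late"])

print(early_vs_late_did(panel, pre=[2001, 2002], post=[2005, 2006]))  # 2.25
```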
We encourage researchers and other users to merge these data with other geo-referenced datasets such as the World Bank's Living Standards Measurement Study surveys, the Demographic and Health Surveys, Afrobarometer, and remotely sensed data. The download already includes World Bank operational and evaluation results data, along with metadata and an explanation of the geocoding methodology.
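A common first step in such a merge is a radius-based spatial join. The sketch below counts, for each survey cluster, the project locations within a chosen distance using the haversine great-circle formula. The coordinates, the 10 km radius, and the function names are hypothetical. In practice one would likely restrict the join to precisely geocoded locations and pick a radius wider than any deliberate displacement in the survey coordinates (DHS, for example, randomly displaces cluster GPS points to protect respondents' confidentiality).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points
    given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def count_projects_nearby(clusters, project_locs, radius_km):
    """For each survey cluster, count project locations within radius_km."""
    return {
        cid: sum(1 for plat, plon in project_locs
                 if haversine_km(clat, clon, plat, plon) <= radius_km)
        for cid, (clat, clon) in clusters.items()
    }

# Hypothetical survey cluster centroids and project coordinates.
clusters = {"cluster_1": (0.0, 0.0), "cluster_2": (1.0, 1.0)}
project_locs = [(0.05, 0.0), (0.0, 0.05), (5.0, 5.0)]

print(count_projects_nearby(clusters, project_locs, radius_km=10.0))
# {'cluster_1': 2, 'cluster_2': 0}
```

The resulting counts (or a simple near/not-near indicator) can then be attached to survey records by cluster ID.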
This is a Level 1 data product, which means that these data have undergone a quality assurance process to ensure intra-field consistency, remove duplicate projects and locations, and correct geographic inaccuracies. As you use the data, we invite you to suggest improvements for the next iteration. Responses should be sent to firstname.lastname@example.org and will be collected through the third quarter of 2015. Feasible suggestions will be incorporated into the next version of the data (v1.1), which will be made available at the end of 2015.
Ariel BenYishay is AidData’s Chief Social Scientist based at the College of William and Mary. Jessica Hogstrom is AidData’s Research Manager. Rachel Trichler is AidData’s Monitoring and Evaluation Specialist.