AidData launches new Geospatial Global Chinese Development Finance Dataset

The dataset covers 9,000+ projects worth $830 billion and accompanies the publication of an article in one of Nature’s journals, Scientific Data.

June 11, 2024
Alex Wooley
Image by Sarina Patterson. Map insets of projects produced by AidData from china.aiddata.org. Background satellite imagery from Blue Marble: Next Generation produced by Reto Stöckli, NASA Earth Observatory, accessed at https://worldview.earthdata.nasa.gov/.

Today, a team of AidData researchers launched a new dataset containing geospatial features of 9,405 projects across 148 low- and middle-income countries supported by Chinese grant and loan commitments worth more than $830 billion. The release of AidData’s Geospatial Global Chinese Development Finance Dataset, Version 3.0 (“Geo-GCDF v3”) accompanies the simultaneous publication of an article in one of Nature’s prestigious online journals, Scientific Data.

The dataset is a companion to AidData’s Global Chinese Development Finance Dataset, Version 3.0 (“GCDF v3”) released in November 2023. “Our new dataset provides unprecedented insight into the precise locations of Chinese-financed projects around the world,” said Seth Goodman, AidData Research Scientist and lead author of the Scientific Data article. “By making our data freely and publicly available, researchers worldwide will be better able to measure the localized effects of China’s overseas development projects across a range of sectors and issues, including agricultural productivity, household welfare, economic growth, nutrition, infant mortality, environmental degradation, corruption, gender equality, civic engagement, and violent conflict, to name just a few.”

Nature’s Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically important datasets and research that advances the sharing and reuse of scientific data. It is highly influential, with some 6.4 million articles downloaded in 2023 alone.

The new Geo-GCDF v3 dataset includes precise spatial definitions of 6,266 projects representing the exact physical features of roads, railways, transmission lines, buildings, and other infrastructure. Precisely geocoded projects can include the routes tracing the paths of roads, railways, and transmission lines, or the outlines and footprints of buildings associated with dams, bridges, mining operations, hospitals, stadiums, and more.

“The combination of Geo-GCDF v3 and GCDF v3 is a powerful one,” said AidData’s Executive Director Bradley C. Parks, “because it creates opportunities to address new questions and revisit old ones about the intended and unintended impacts of China’s grant- and loan-financed projects in the developing world. For example, in previous versions of the GCDF dataset, all of the provinces or districts that intersected with the route of a road project may have been identified, but not the precise route between the road’s start point and end point. This measurement imprecision limited opportunities for rigorous impact evaluation, so we invested considerable effort to build a geospatial version of GCDF v3 that provides spatial information on more than 9,000 projects which have physical footprints or involve specific locations, by extracting point, polygon, and line vector data.”

Five AidData scientists and analysts—Goodman, Sheng Zhang, Ammar Malik, Parks, and Jacob Hall—wrote the paper, with a total of 24 AidData faculty, staff and research assistants spending approximately 2,400 hours assembling the dataset. The methodology, dataset, and the code used to construct the dataset have been made publicly available through GitHub to facilitate replication and future applications.

How’d they do it?

“The initial step of our data collection process involves identifying the subset of Chinese grant- and loan-financed projects that (a) support the construction, rehabilitation, upgrading, maintenance, expansion, or preservation of physical assets with identifiable geographical features, and/or (b) support activities which take place at specific locations with identifiable geographical features,” said Ammar Malik, a Senior Research Scientist and AidData’s Director of Tracking Underreported Financial Flows. 

The purpose of this step is to identify the ultimate geographical destinations of Chinese aid and credit. Examples of (a) include roads, railways, airports, seaports, power plants, electricity transmission lines, industrial parks, schools, hospitals, stadiums, and museums, while (b) might include medical teams stationed at a given hospital or equipment given to park rangers for patrols. Projects with no geospatial information available through project documentation or other sources, or projects without specific locational destinations, are excluded from the geospatial data collection process.

“To compile project locations,” said Sheng Zhang, Research Analyst, “we leverage geospatial features defined by OpenStreetMap (OSM), which is a free, editable geographic database of the world built by community collaborators. In addition to utilizing existing features from the extensive data available in OSM, we contribute updates or additions for features that reflect project activities when we are able to do so.”

The new dataset is also included in AidData’s free geospatial data platform, GeoQuery, a project led by Goodman and Hall. “The data in Geo-GCDF v3 can be integrated with hundreds of additional geospatial variables (e.g., land cover, nighttime lights, population density), which allows for time series analysis of trends like deforestation and economic activity,” said Jacob Hall, Data Analyst. “Prior to GeoQuery, similar kinds of analysis would have required extensive GIS knowledge, data processing, and computational resources to prepare the associated datasets from satellite imagery and other sources.”

The Geo-GCDF v3 dataset is compatible with most software and tools that support standard geospatial data formats. Desktop GIS software, such as the open-source QGIS platform or ESRI’s ArcGIS Pro, supports a broad range of mapping, analysis, and other applications. Web-based platforms, such as Mapbox and ArcGIS Online, are also useful for sharing and visualizing outputs.
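
For readers who prefer to script their analysis, the short Python sketch below shows how point, line, and polygon features of the kind described above might be read and summarized. It is an illustration only, not AidData's code. It assumes the features are available as GeoJSON, one common standard format, and it uses three made-up sample features with hypothetical property names (`project_id`, `kind`) and arbitrary coordinates, not records from Geo-GCDF v3.

```python
import json
from collections import Counter

# Hypothetical sample data: three invented features, one per geometry type.
# The property names and coordinates are assumptions for illustration only.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"project_id": "example-1", "kind": "hospital"},
     "geometry": {"type": "Point", "coordinates": [36.8, -1.3]}},
    {"type": "Feature",
     "properties": {"project_id": "example-2", "kind": "road"},
     "geometry": {"type": "LineString",
                  "coordinates": [[36.8, -1.3], [37.0, -1.1], [37.4, -1.0]]}},
    {"type": "Feature",
     "properties": {"project_id": "example-3", "kind": "stadium"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[36.80, -1.30], [36.81, -1.30],
                                   [36.81, -1.29], [36.80, -1.29],
                                   [36.80, -1.30]]]}}
  ]
}
"""


def count_geometry_types(geojson_text):
    """Count how many features of each geometry type a FeatureCollection holds."""
    collection = json.loads(geojson_text)
    return Counter(f["geometry"]["type"] for f in collection["features"])


def bounding_box(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON geometry."""

    def flatten(coords):
        # A position is a list of numbers; anything else is a nested list.
        if isinstance(coords[0], (int, float)):
            yield coords
        else:
            for part in coords:
                yield from flatten(part)

    points = list(flatten(geometry["coordinates"]))
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


features = json.loads(SAMPLE_GEOJSON)["features"]
counts = count_geometry_types(SAMPLE_GEOJSON)
road_bbox = bounding_box(features[1]["geometry"])

print(dict(counts))
print(road_bbox)
```

The sketch relies only on Python's standard library. In practice, a dedicated geospatial library or one of the GIS tools named above would handle projections, spatial joins, and other file formats.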

Alex Wooley is AidData's Director of Partnerships and Communications.