Forests, a crucial public good, are being depleted at an alarming rate. Each Earth Day (April 22nd this year), we expect to see eye-popping statistics trotted out, such as one football field’s worth of forest being lost every second.
What’s less well known is the role that local development plays in slowing or hastening this environmental degradation in the developing world. The body of evidence so far is mixed, but answering the question is of paramount importance to the governments, donors, and other international actors who want to spur development in the poorest parts of the globe, without sacrificing the local environment.
A new study, published by AidData researchers in the Journal of the Association of Environmental and Resource Economists (JAERE), helps answer that question. It evaluates how a massive local infrastructure program in Cambodia affected deforestation, finding that the projects did not lead to environmental degradation. Rather, certain types of local infrastructure, such as irrigation and village roads, actually reduced pressure on nearby forests, contrary to what the research team expected.
The researchers used high-resolution satellite imagery and other data to study an intervention—40,000 local infrastructure projects—that was widely dispersed over large portions of Cambodia. A previous evaluation by AidData had found the program (the Commune Sangkat Fund) had led to socioeconomic gains, but the impact on the environment was still unknown.
Impact evaluations of development projects typically assess how conditions change across entire districts or provinces. For this study, researchers were able to zoom in much closer, modeling how forest health changed at the level of grid cells, centered on individual infrastructure projects, that were only a single square kilometer in size. Doing this allowed the research team to examine how forest conditions changed before and after every single project took place, but it required an enormous volume of granular data.
As a result, the project had to overcome formidable data management challenges. The researchers found they needed to build and process their datasets piece by piece, because SciClone, the high-performance computing cluster at William & Mary, where AidData is located, did not have enough computational resources to process the whole dataset at once.
“We were using such huge volumes of data so finely measured, that it actually exhausted the resources of a university’s supercomputer, so we had to split the dataset into chunks,” explained Christian Baehr, lead author on the study and a Junior Data Analyst at AidData. The opportunity to use the SciClone computing cluster was invaluable, as the computations would have been prohibitively expensive if they were performed on commercially available cloud computing platforms.
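To illustrate the general idea of splitting a dataset into chunks, here is a minimal Python sketch. It is not the research team's actual pipeline. The function names (chunked, mean_in_chunks) and the chunk size are hypothetical, chosen only to show how a summary statistic can be computed while holding just one piece of the data in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(records: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def mean_in_chunks(records: Iterable[float], size: int) -> float:
    """Compute a mean while keeping only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in chunked(records, size):
        # Only running totals survive between chunks, not the data itself.
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no records to average")
    return total / count
```

The same pattern extends to any statistic that can be accumulated incrementally. Analyses that need all of the data at once, such as some regressions, generally call for more specialized approaches.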
“We know of few other evaluations like this that incorporate 40,000-plus projects, especially in a single country,” said Baehr. “We were able to pinpoint exactly where these local infrastructure projects were happening, down to a single kilometer. That is really good resolution, which meant our data was really accurate, but also huge in terms of the amount of information that needed to be prepared for analysis. Between precisely locating these projects at the 1km level on the one hand, and using outcome data on forest health measured at the 30m level on the other hand, processing this amount of data was a challenge!”
The researchers built on methods used in earlier AidData geospatial impact evaluations of infrastructure’s effects on biodiversity and deforestation. These evaluations similarly use daytime and nighttime satellite imagery and derived measures, which previous research has shown to be reliable proxies for key outcomes like forest loss and forest density. This new study incorporated nighttime lights data from the VIIRS satellite instrument as a proxy for economic activity and Normalized Difference Vegetation Index (NDVI) data as a proxy for forest health. NDVI is a standard measure of vegetation greenness, calculated from the red and near-infrared light captured by NASA satellites.
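For readers unfamiliar with NDVI, the index is conventionally defined as (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectance of a pixel. The short Python sketch below shows that formula only. The function name and the sample reflectance values in the usage note are illustrative assumptions, not figures from the study.

```python
def ndvi(nir: float, red: float) -> float:
    """Return NDVI = (NIR - Red) / (NIR + Red), which ranges from -1 to 1."""
    denominator = nir + red
    if denominator == 0:
        raise ValueError("NIR and red reflectance cannot both be zero")
    return (nir - red) / denominator
```

Dense, healthy vegetation reflects strongly in the near-infrared and absorbs red light, so a pixel with hypothetical reflectances of 0.5 (NIR) and 0.1 (red) scores about 0.67, while bare ground with similar values in both bands scores close to zero.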
(An aside: If you’re looking for geospatial data, many of the exact datasets used in these studies are available to anyone for free via GeoQuery, AidData’s platform for finding and merging terabytes of geospatial data without writing a line of code.)
This was also one of AidData’s first geospatial impact evaluations to use Hansen’s forest data on tree canopy cover. This dataset, derived from Landsat satellite imagery, translates the “greenness” of spectral light captured by satellites into the percentage of an area that is covered by forests.
Baehr explained it like this: “The way it works is, let’s say you take a static satellite picture in 2000 of the state of forests. For billions of squares around the world, this dataset specifies the percentage of each square which is forested (0-100%). We converted this to a binary measure, where if a square was more than 25% forested, we classified it as ‘forest.’ That way, we had a baseline of what was forest and what was not a forest in the year 2000.”
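The binary conversion Baehr describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's code: the function name and the input list are hypothetical, and only the 25% cutoff comes from the description above.

```python
from typing import Iterable, List

# Cutoff quoted in the article: more than 25% canopy cover counts as forest.
FOREST_THRESHOLD_PCT = 25.0


def classify_forest(canopy_cover_pct: Iterable[float]) -> List[bool]:
    """Return True for each cell whose canopy cover exceeds the threshold."""
    classified = []
    for pct in canopy_cover_pct:
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"canopy cover must be between 0 and 100, got {pct}")
        # "More than 25%" is read as a strict inequality.
        classified.append(pct > FOREST_THRESHOLD_PCT)
    return classified
```

Under this reading, a cell at exactly 25% cover is classified as non-forest, because the description says "more than 25%."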
“The Hansen dataset also has a variable measuring forest loss, which enabled us to answer the question, did a particular 30m square (a grid cell) shift from forest to non-forest, and if it did, in what year? From there, we applied our geospatial impact evaluation methodology, where each grid cell can serve as its own counterfactual based on the timing of when an infrastructure project occurred,” Baehr continued. “This lets us model what we call cell-level fixed effects. In other words, would one of these 30m squares be classified differently if the project had not occurred when it did? Without incorporating this level of rigor to our statistical modeling, we would not have been able to identify the difference to forest health made by an irrigation project or a village road.”
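To make the idea of cell-level fixed effects more concrete, the Python sketch below builds a small cell-by-year panel from a forest-loss year and a project year, then estimates a simple "within" effect by subtracting each cell's own averages before comparing treated and untreated years. It is a deliberately simplified illustration, not the model used in the study: the function names, the row layout, and the assumption that every cell starts as forest are hypothetical, and it omits year effects, covariates, and the standard-error adjustments a real evaluation would likely need.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# One observation: (cell_id, year, treated, forested)
Row = Tuple[str, int, int, int]


def build_panel(cell_id: str, loss_year: Optional[int],
                project_year: Optional[int], years: Iterable[int]) -> List[Row]:
    """Expand one baseline-forest cell into yearly observations."""
    rows = []
    for year in years:
        # The cell counts as forest until the year loss is recorded.
        forested = 0 if loss_year is not None and year >= loss_year else 1
        # The cell counts as treated from the project year onward.
        treated = 1 if project_year is not None and year >= project_year else 0
        rows.append((cell_id, year, treated, forested))
    return rows


def within_estimate(rows: List[Row]) -> float:
    """Slope of forested on treated after removing each cell's own means."""
    by_cell: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for cell_id, _year, treated, forested in rows:
        by_cell[cell_id].append((treated, forested))

    numerator = 0.0
    denominator = 0.0
    for observations in by_cell.values():
        mean_t = sum(t for t, _ in observations) / len(observations)
        mean_f = sum(f for _, f in observations) / len(observations)
        for t, f in observations:
            # Demeaning within a cell is what "cell fixed effects" amounts to.
            numerator += (t - mean_t) * (f - mean_f)
            denominator += (t - mean_t) ** 2
    if denominator == 0:
        raise ValueError("treatment status never changes within any cell")
    return numerator / denominator
```

Because each cell is compared only with itself at other points in time, characteristics of a cell that do not change, such as its terrain or distance to a river, drop out of the comparison.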
Although the data modeling and processing challenges in this type of analysis were arduous, the ability to incorporate large-scale, fine-grained geospatial data allowed the research team to identify strong patterns that provided the evidence for their conclusions. “The quality of the data is so high that we didn’t think, ‘Oh, we’re not quite sure about the impacts of roads.’ Rather, we are pretty confident that the road projects led to no effects on forests, except in the case where they were very close to the dense forests,” concluded Baehr. “Similarly, we also didn’t expect to find such strong evidence in favor of irrigation investments being beneficial to nearby forests, which caused us to reconsider some of our prior hypotheses.”