The Maturing of the IATI Data Standard: Ensuring Quality in a Highly Networked Environment

Every open data community experiences growing pains, and the International Aid Transparency Initiative (IATI) is no exception. In its first few years of existence, IATI has achieved massive growth. The list of IATI publishers has grown from a few core founders to almost the entire official development assistance community, as well as dozens of smaller non-governmental organizations.

Like any transformative initiative, IATI has charted new territory for its industry. As part of this, early on, the complexity of the IATI data model created an impediment to broad IATI adoption. However, this was no accident - as one of the IATI standard’s early technical architects, David Megginson, recently wrote:

The really hard part of dealing with IATI data…isn’t the accidental complexity of the format (CSV vs. JSON vs. XML), but the essential complexity of the data itself. Transaction traceability, for example, is hard, as is getting different donors to align on the same code sets, fine-grained reporting cycles, forward-looking financial data, etc. There’s no format that will make this stuff easy.

As the development industry embraced the IATI standard, it made a significant step forward in moving away from aggregate reporting (e.g., “country X gave $3 million in funding to country Y in 2011”) and towards detailed, disaggregated, project-level data. Yet this shift soon revealed an underlying tension between the need to represent each project as accurately as possible and the need to implement some data standardization to facilitate comparability. By necessity, those publishing aid information have had to navigate a learning curve in devising a publishing strategy that reconciles these two goals. While IATI data quality overall is improving rapidly, the IATI data registry itself still bears the scars of some of this early learning.

When the AidData team began working on the beta import of IATI data into the aiddata.org platform, we faced many of these challenges head-on. Seeking to ensure the reliability of AidData’s development finance information for those who download and analyze our data, we conducted some quality verification and validation, joining the growing community of organizations working not just on IATI data publishing but on data quality assurance (another great example here).

What does this quality assurance process look like in practice? Let’s explore one example. Every major development organization in the IATI registry has a unique ID code. This code helps users aggregate the funding activities of a particular development organization accurately. The UK’s Department for International Development (DfID), for instance, should look like this:

<org ref="GB-1">Department for International Development</org> [1]

But what happens if it looks like this:

<org ref="DFID">DFID</org>

Perhaps this is clear to a human reader, but it is considerably less clear to a computer that is automatically parsing 300,000 records. (NOTE: Because of the linked nature of IATI data, DfID is not the only organization publishing data about DfID - none of these publishing errors came from DfID itself.) Or what about these:

<org ref="13000">DFID</org>

<org ref="DFID">DFID</org>

<org ref="LIVINGEARTH">DFID</org>

<org ref="">DFID</org>

<org ref="LIVINGEARTH">DFID - British Government</org>

<org ref="">DfID - UK Aid</org>

<org ref="">DfID (CSCF)</org>

<org ref="">DFID and BRAC International</org>

<org ref="">DFID Dep. for Int. Development</org>

<org ref="DFID">DFID TB grant</org>

<org ref="">DFID.</org>

<org ref="DFID"></org>

<org ref="Unknown">DFID</org>

<org ref="">DFID</org>

<org ref="">DFID (The Department for International Development)</org>

<org ref="DFID">DFID CSCF</org>

<org ref="">UKaid from the Department for International Development</org>

<org ref="">UKaid from the Department for International Development - Governance and Transparency</org>

<org ref="">UKaid from the Department for International Development - Governance and Transparency Fund</org>

<org ref="">UKaid from the Department for International Development - Match Funding Big Dig</org>

<org ref="">UKaid from the Department for International Development - Programme Partnership Agreement</org>

<org ref="">UKaid from the Department for International Development - Strategic Grant Agreement</org>

<org ref="">Ukiah from the Department for International Development - Programme Partnership Agreement</org>

<org ref="">UKAID</org>

<org ref="DFID GB-1">DFID</org>

<org ref="GB">DFID</org>

<org ref="">Department for International Development</org>

<org ref="DFID">Department For International Development (DFID)</org>

<org ref="">Department for International Development (DFID)</org>

<org ref="22000">DFD</org>

<org ref="IMP-01-PL-0153">DFID</org>

<org ref="IMP-01-PL-0153">DFID Grant</org>

<org ref="">Funds from DFID</org>

<org ref="">Department for International Development (DFID)</org>

<org ref="">DEPARTMENT FOR INT\L DEVELOPMENT</org>

<org ref="GPAF-IMP-067">DFID</org>

<org ref="GPAF-IMP-067">DFID and SCIAF</org>

All of these distinct elements represent the same organization (we hope). In order to do comprehensive network analysis and aggregation with IATI data, it is important to unify them into one entity. As part of our beta aiddata.org release effort, AidData has begun some initial steps of this unification process. To date, we have reconciled several thousand IATI organization references and other IATI codes. As of today, we have released all of these mappings openly on AidData’s GitHub account. We invite other IATI data users to use these codes, and we welcome pull requests to improve the list and correct inevitable errors.
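To make the idea of unification concrete, here is a minimal sketch in Python of how a lookup table might resolve a handful of the variants above to a single identifier. It is an illustration only, not AidData's actual code or the format of the released mappings: the table name ORG_MAPPINGS, the helper functions, and the choice to key on the (ref, name) pair are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: (ref attribute, element text) -> canonical ID.
# The rows are taken from the examples in this post; the structure itself is
# an assumption and may not match the format of the published mappings.
ORG_MAPPINGS = {
    ("GB-1", "Department for International Development"): "GB-1",
    ("DFID", "DFID"): "GB-1",
    ("13000", "DFID"): "GB-1",
    ("", "DfID - UK Aid"): "GB-1",
    ("", "DFID."): "GB-1",
}


def normalize(text):
    """Trim and collapse whitespace; treat a missing value as an empty string."""
    return " ".join((text or "").split())


def canonical_ref(org_xml, mappings=ORG_MAPPINGS):
    """Return the canonical ID for one <org> element, or None if it is unmapped."""
    element = ET.fromstring(org_xml)
    key = (normalize(element.get("ref")), normalize(element.text))
    return mappings.get(key)


records = [
    '<org ref="GB-1">Department for International Development</org>',
    '<org ref="DFID">DFID</org>',
    '<org ref="">DfID - UK Aid</org>',
    '<org ref="LIVINGEARTH">DFID</org>',  # deliberately absent from the sample table
]

resolved = [canonical_ref(record) for record in records]
# Anything that comes back as None still needs a human decision.
unmapped = [record for record, ref in zip(records, resolved) if ref is None]
```

Keying on the pair rather than on the ref alone is one possible design choice: in the list above, the same ref value (for example "DFID" or an empty string) appears alongside many different names, so the ref by itself may not be enough to decide what an element refers to.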

IATI data publication is a decentralized system, and the value of IATI data is an emergent property of that system. With consistent use of organizational identifiers and good data standards, IATI will yield increasing returns to scale as more publishers share data. Conversely, if we have inconsistent data quality, the network risks becoming less than the sum of its parts, as data become increasingly incoherent and difficult to reconcile. We hope that this new set of mappings provides part of the foundation for a discussion on how better data linkages can improve the IATI data ecosystem as a whole.

Right now, we are on the cusp of what we hope is the next leap forward for the IATI data standard. Most IATI data is of high quality and adheres to consistent standards. Publishers who have made mistakes in the past are also in the process of improving their internal processes. At AidData, we look at these field mappings as an interim solution - a way to highlight the value-add of creating networked data models and of accurately showing links between projects and organizations. We hope that this sparks discussion and helps the international development open data community further appreciate the networked benefits of reporting and sharing data by using a common standard.

Owen Scott is an AidData Project Manager.

[1] IATI purists will notice that, for illustrative purposes, I've conceptually merged <reporting-org>, <participating-org>, <provider-org>, and <receiver-org> into one meta element. In reality we drew from all of these elements when doing our organization mapping work.
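For readers who want to reproduce the merged view described in this footnote, the following is a rough Python sketch of one way to collect all four element types from an activity. The sample XML is a simplified, made-up fragment ("Example Partner" is hypothetical), and the function name collect_orgs is an assumption for illustration; this is not the code we used.

```python
import xml.etree.ElementTree as ET

# The four organization elements named in the footnote.
ORG_TAGS = {"reporting-org", "participating-org", "provider-org", "receiver-org"}


def collect_orgs(activity_xml):
    """Return a (tag, ref, name) tuple for every organization element found."""
    root = ET.fromstring(activity_xml)
    found = []
    for element in root.iter():  # walks the whole tree in document order
        if element.tag in ORG_TAGS:
            # itertext() also picks up names held in nested child elements.
            name = " ".join("".join(element.itertext()).split())
            found.append((element.tag, element.get("ref", ""), name))
    return found


# Simplified, invented sample; real activities carry many more elements.
SAMPLE = """
<iati-activity>
  <reporting-org ref="GB-1">Department for International Development</reporting-org>
  <participating-org ref="DFID">DFID</participating-org>
  <transaction>
    <provider-org ref="">DfID - UK Aid</provider-org>
    <receiver-org ref="">Example Partner</receiver-org>
  </transaction>
</iati-activity>
"""

orgs = collect_orgs(SAMPLE)
```

Each tuple keeps the source element name, so the merged list can still be split back out by role if needed.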

Tags: iati, open data, IATI registry, DFID