If development data is so important, why is it chronically underfinanced?

18th June 2018 Michael Lokshin

Few will argue against the idea that data is essential for the design of effective policies. Every international development organisation emphasises the importance of data for development. Nevertheless, raising funds for data-related activities remains a major challenge for development practitioners, particularly for research on techniques for data collection and the development of methodologies to produce quality data.

If we focus on the many challenges of raising funds for microdata collected through surveys, three reasons stand out in particular: the spectrum of difficulties associated with data quality; the problem of quantifying the value of data; and the (un-fun) reality that data is an intermediate input.

Data quality

First things first – survey data quality is hard to define and even harder to measure. Every survey collects new information; it’s often prohibitively expensive to validate this information and so it’s rarely done. The quality of survey data is most often evaluated based on how closely the survey protocol was followed.

The concept of Total Survey Error sets out a universe of factors that condition the likelihood of survey errors (Weisberg, 2005). These conditioning factors include, among many other things: how well the interviewers are trained; whether the questionnaire was tested and piloted, and to what degree; whether the interviewers’ individual profiles could affect respondents’ answers; and so on. Measuring some of these indicators precisely is effectively impossible; most of them are subjective by nature. It may be even harder to separate the individual contributions of these components to the total survey error.

Imagine you are approached with a proposal to conduct a cognitive analysis of your questionnaire. The questionnaire asks: How often were you bothered by a pain in the stomach over the last year? A cognitive psychologist will tell you that this is a badly formulated question: the definition of ‘stomach’ varies drastically among respondents, and ‘last year’ could be interpreted as the last calendar year, the 12 months preceding the interview, or the period from January 1st until now. One respondent might answer: “It hurt like hell, but it did not bother me, I am a Marine…” (an example from a seminar by Gordon Willis).

In a cognitive analysis, a team of psychologists and linguists examines the questions to be asked during interviews to determine how well a typical respondent understands them and to correct questions that respondents from the focus groups have difficulty with. This exercise can be expensive and time-consuming. And how much would such an analysis be worth? This is different from willingness to pay; the question is: what’s the intrinsic value of the analysis itself? US$30,000? US$300,000? It’s hard to decide because you do not know whether, or how, such an analysis will affect the quality of your data.

Valuating data

And what if I told you that the Total Survey Error could be reduced by 15 percent were you to make several adjustments to the survey instrument? Is this improvement in precision worth the money? To answer that question, you need to understand the value of your data.

The research community has long struggled to offer reliable methods to quantify the value of data. Many approaches have been proposed, and no consensus yet exists on the best metric. One thing everyone agrees on is that it’s difficult, and more investment is needed to crack this nut (Slotin, 2017). Most researchers focus on estimating aggregate indicators, such as the value of data for a country or an industry, and very few try to assess the returns on investment for a particular survey. The problem might be especially acute for large, multi-topic surveys that contain several hundred questions about various aspects of household well-being. The results of these surveys might be used by many agencies and researchers to design a wide range of policy interventions. Different policies would rely on quantitative analysis to different degrees, and the importance of quality data will vary widely across them.

Data as an intermediate input

Okay, now suppose that your survey has the well-defined, narrow focus of identifying people with abdominal pain in order to get them treatment. At first glance, this case looks trivial. You calculate the leakages and under-coverage of the programme due to survey error and compare these losses with the cost of achieving higher precision.
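To make that trivial-looking comparison concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is a hypothetical placeholder rather than an estimate from any real programme; the point is only the structure of the calculation: expected losses from under-coverage and leakage on one side, the cost of higher precision on the other.

```python
# Back-of-the-envelope sketch of the comparison described above.
# All numbers are hypothetical placeholders, not real estimates.

eligible_population = 100_000      # people who actually need the treatment
benefit_per_person = 50.0          # value (US$) of treating one eligible person
cost_per_leaked_transfer = 50.0    # cost (US$) of treating one ineligible person

under_coverage_rate = 0.12         # share of eligible people missed due to survey error
leakage_rate = 0.08                # share wrongly included due to survey error

# Losses attributable to survey error under the current design
loss_under_coverage = eligible_population * under_coverage_rate * benefit_per_person
loss_leakage = eligible_population * leakage_rate * cost_per_leaked_transfer
total_loss = loss_under_coverage + loss_leakage

# Suppose a better survey instrument cuts both error rates by a quarter
improvement = 0.25
loss_avoided = total_loss * improvement
cost_of_better_survey = 150_000.0  # hypothetical cost of the improved instrument

print(f"Losses under current design:    ${total_loss:,.0f}")
print(f"Losses avoided by better data:  ${loss_avoided:,.0f}")
print(f"Better data pays for itself:    {loss_avoided > cost_of_better_survey}")
```

The next paragraph explains why, in practice, the comparison is rarely this clean.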

However, the outcome of any programme is the result of complex interactions among multiple components. The effectiveness of the programme depends not only on how well we can identify the recipients, but also on the efficiency of the distribution mechanism, the qualifications of the people administering the programme, the willingness of the recipients to accept the treatment, political support for the programme, and a range of other factors. A priori, it is difficult to say whether you should invest more in collecting better data or, for example, in improving the administrative system. In other words, data itself is not an output; it is an input into the production of the outputs that organisations or people are interested in and are willing to pay for.

A product everyone agrees is crucial but few are willing to finance

So here we are: the value and quality of survey data are hard to measure, and even the highest-quality survey data are an intermediate input into the production of downstream outputs that often depend on the data only indirectly. To all of this we should add the ubiquitous free-rider problem associated with the generation of data as a global public good, and we find ourselves in a weird place: a product everyone agrees is crucial, but few are willing to finance.

Development data (done right) is expensive. The need for broader and more consistent investment in development data is significant (and growing), and the long-term returns on investments in data could be large. However, with national governments in low- and middle-income countries constantly weighing crucial development priorities (among which data needs are often underfunded or entirely crowded out), with virtually zero private-sector interest in investing in development data, and with the current reality in which a handful of international organisations are essentially subsidising the planet’s development data needs, the outlook for development data may not be great.

It may seem natural to conclude that the current handful of international development organisations should continue to play a leading role in supporting research and methodological improvements in data production and in sponsoring large data collection programmes in poorer countries. However, the share of official development assistance spent on data and statistics was only 0.3 percent in 2015 (PARIS21, 2017).

It remains to be seen if this level of financing reveals the actual demand for micro-data, or if it indicates the scale of the collective action problem we face.

This article appeared originally on the World Bank blogs site.

About the Author

Michael Lokshin

Michael Lokshin is a manager and lead economist in the Development Data Group of the World Bank. He received his Ph.D. in Economics from the University of North Carolina at Chapel Hill in 1999 and later joined the research group at the World Bank. His work focuses on the areas of poverty and inequality measurement, labor economics, and applied econometrics.
