Predicting Population-Level Socio-Economic Characteristics Using Call Detail Records (CDRs) in Sri Lanka
18 Pages. Posted: 25 Oct 2017. Last revised: 9 Feb 2018.
Date Written: October 25, 2017
The availability of accurate, timely, disaggregated, and comparable socio-economic data is crucial for effective policymaking, especially with regard to economic development and resource allocation. Spatially granular demographic data are often collected through the decennial national census, while population-level socio-economic characteristics are often captured more frequently through representative surveys such as the Household Income and Expenditure Survey (HIES). In Sri Lanka, the HIES is conducted once every three years and is representative only down to the District level, the second-level administrative unit. Both the census and these surveys are expensive and time-consuming to conduct and, in the context of Sri Lanka, are not frequent enough to capture the changing dynamics of a fast-moving economy, especially one recovering from civil conflict. Other developing countries similarly grapple with a lack of poverty data (Serajuddin et al. 2015). Our research assesses whether mobile phone data can provide a reliable, cheap proxy for census data within Sri Lanka, especially in post-conflict regions that have a greater need for frequent data collection.
Mobile phone metadata such as Call Detail Records (CDRs) can broadly describe three dimensions of human behavior: social networks, consumption activity, and mobility (UN Global Pulse, 2013). CDRs are passively collected by the mobile network whenever a subscriber makes or receives a phone call, sends or receives a text, or initiates a data session. CDRs yield new types of data, such as spatially disaggregated data at micro-regional levels (e.g. the household level), which could provide novel opportunities for targeted policy design, implementation, and evaluation. Coupled with the near-ubiquitous adoption of mobile phones in developing countries, this makes it possible to use such data sources to complement traditional statistics in the intervals between official surveys. If CDRs can accurately predict Sri Lankan socio-economic characteristics, policymakers will have access to a wealth of reliable, timely data on which to base policy.
Our paper seeks to identify relationships between features derived from Sri Lankan census data and CDR data, replicating methods that Frias-Martinez and Virseda (2012) applied to an unidentified Latin American country. We use the 2011/12 Sri Lanka census data and CDR data for approximately 600,000 mobile phone subscribers from Sri Lanka’s Northern Province, a post-conflict region. We seek to answer two questions:
1) What relationships, if any, exist between Sri Lankan census and CDR data, and do these provide an opportunity for predictive models?
2) Are methods developed for census feature prediction in other countries applicable within a Sri Lankan context, especially in the regions severely affected by conflict?