A novel data fusion method to leverage passively-collected mobility data in generating spatially-heterogeneous synthetic population

30 Pages Posted: 21 Nov 2023

Khoa Vo

National University of Singapore (NUS)

Eui-Jin Kim

Ajou University

Prateek Bansal

National University of Singapore (NUS)

Date Written: August 6, 2023

Abstract

Conventional population synthesis methods rely on household travel survey (HTS) data. Because they generate sociodemographic and spatial attributes sequentially, they produce many infeasible attribute values, and the low sampling rate of HTS data leaves the synthetic population with low spatial heterogeneity. Passively collected mobility (PCM) data (e.g., cellular traces) offer extensive spatial coverage but are difficult to integrate with HTS data because the two sources differ in spatial resolution and attributes. This study introduces a novel cluster-based data fusion method that addresses these limitations and simultaneously generates synthetic populations with accurate sociodemographics and home-work locations at high spatial heterogeneity. Spatial clustering aligns the spatial resolution of the HTS and PCM data, enabling effective data integration. The data fusion process is reformulated into cluster-specific, low-dimensional optimization subproblems to ensure computational tractability. Analytical properties are derived to ensure that the fused distribution retains essential distributional characteristics of both datasets. The spatial clustering process is optimized to preserve this distributional consistency while balancing the feasibility and heterogeneity of the synthetic population. The data fusion properties are validated using HTS and LTE/5G cellular signaling data from Seoul, South Korea. Validation against census data confirms that the method maintains distributional consistency while increasing spatial heterogeneity, with 97% of the generated population being unobserved in the HTS data. This research advances population synthesis by leveraging the complementary strengths of HTS and PCM data, providing a robust framework for generating the spatially diverse synthetic populations essential for urban planning.

Keywords: Population synthesis, Data fusion, Spatial heterogeneity, Passively collected mobility data, Cellphone data

Suggested Citation

Vo, Khoa and Kim, Eui-Jin and Bansal, Prateek, A novel data fusion method to leverage passively-collected mobility data in generating spatially-heterogeneous synthetic population (August 6, 2023). Available at SSRN: https://ssrn.com/abstract=4612180 or http://dx.doi.org/10.2139/ssrn.4612180

Khoa Vo

National University of Singapore (NUS)

Singapore
Singapore

Eui-Jin Kim

Ajou University

Woncheon-dong, Yeongtong-gu
Suwon-si, Gyeonggi-do
Korea, Republic of (South Korea)

Prateek Bansal (Contact Author)

National University of Singapore (NUS)

1E Kent Ridge Road
NUHS Tower Block Level 7
Singapore, 119228
Singapore
