Measuring Housing Activeness from Multi-Source Big Data and Machine Learning

38 Pages Posted: 11 Oct 2021

Yang Zhou

Institute for Big Data, Fudan University

Lirong Xue

Princeton University - Department of Operations Research & Financial Engineering (ORFE)

Zhengyu Shi

Fudan University - School of Data Science

Libo Wu

Fudan University - School of Economics

Jianqing Fan

Princeton University - Bendheim Center for Finance

Date Written: October 11, 2021

Abstract

Timely, high-resolution measures of socioeconomic outcomes are critical for policy making and evaluation, but they are hard to obtain reliably. With the help of machine learning and cheaply available data such as social media and nightlight data, it is now possible to predict such indices at fine granularity. This paper demonstrates an adaptive way to measure the time trend and spatial distribution of housing activeness using multiple easily accessible datasets. We first identify the regional activeness status from energy consumption data and then match it with nightlight and land-use data. We introduce factor-adjusted regularization methods for prediction (FarmPredict) to deal with dependence and collinearity among predictors by effectively lifting the prediction space; the approach applies to all machine learning algorithms. The heterogeneity of big data is mitigated through the land-use data. FarmPredict allows us to extend the regional results to the city level, with a 75% out-of-sample explanation of the spatial and temporal variation in house usage. Since energy is indispensable for life, our method is highly transferable and requires only publicly accessible data. Our paper demonstrates the power of machine learning in understanding socioeconomic outcomes when census and survey data are costly or unavailable.
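
The factor-adjustment idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `farm_predict_fit` and `farm_predict`, the use of PCA to estimate the common factors, and the ridge penalty standing in for a generic regularizer are all assumptions made for illustration. The sketch splits correlated predictors into common factors and idiosyncratic components, then regresses the outcome on both parts, so the penalized step works on weakly correlated inputs.

```python
import numpy as np


def farm_predict_fit(X, y, n_factors, ridge_penalty=1.0):
    """Hypothetical helper: factor-adjust X, then fit a penalized regression.

    Assumptions (not from the paper): factors are estimated by PCA, and a
    ridge penalty stands in for whichever regularizer is actually used.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean

    # PCA via SVD: the top right-singular vectors act as factor loadings.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_factors].T            # shape (p, K)
    factors = Xc @ loadings                # estimated common factors, (n, K)
    idio = Xc - factors @ loadings.T       # idiosyncratic components, (n, p)

    # Step 1: unpenalized least squares of y on the estimated factors.
    gamma, *_ = np.linalg.lstsq(factors, yc, rcond=None)
    resid = yc - factors @ gamma

    # Step 2: ridge regression of the residual on the idiosyncratic part.
    # idio has rank at most p - K, so the penalty keeps the system solvable.
    p = X.shape[1]
    gram = idio.T @ idio + ridge_penalty * np.eye(p)
    beta = np.linalg.solve(gram, idio.T @ resid)

    return {
        "x_mean": x_mean,
        "y_mean": y_mean,
        "loadings": loadings,
        "gamma": gamma,
        "beta": beta,
    }


def farm_predict(model, X_new):
    """Predict by applying the same factor / idiosyncratic split to new data."""
    Xc = X_new - model["x_mean"]
    factors = Xc @ model["loadings"]
    idio = Xc - factors @ model["loadings"].T
    return model["y_mean"] + factors @ model["gamma"] + idio @ model["beta"]
```

In this sketch the two-step fit is valid because the PCA-estimated factors and the idiosyncratic components are orthogonal in the training sample. Any other penalized or machine learning regressor could replace the ridge step; the number of factors `n_factors` is a tuning choice that the sketch leaves to the user.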

Keywords: Housing Activeness, Machine Learning, Factor Model, FarmPredict, Computational Social Science

JEL Classification: C02, C53, C55, R21, R31

Suggested Citation

Zhou, Yang and Xue, Lirong and Shi, Zhengyu and Wu, Libo and Fan, Jianqing, Measuring Housing Activeness from Multi-Source Big Data and Machine Learning (October 11, 2021). Available at SSRN: https://ssrn.com/abstract=3940180 or http://dx.doi.org/10.2139/ssrn.3940180

Yang Zhou (Contact Author)

Institute for Big Data, Fudan University

No. 220 Handan Road
Shanghai, 200433
China

Lirong Xue

Princeton University - Department of Operations Research & Financial Engineering (ORFE)

Sherrerd Hall, Charlton Street
Princeton, NJ 08544
United States

Zhengyu Shi

Fudan University - School of Data Science

#220 HanDan Road
Shanghai, Shanghai 200433
China

Libo Wu

Fudan University - School of Economics

Shanghai
China

Jianqing Fan

Princeton University - Bendheim Center for Finance

26 Prospect Avenue
Princeton, NJ 08540
United States
609-258-7924 (Phone)
609-258-8551 (Fax)

HOME PAGE: http://orfe.princeton.edu/~jqfan/
