Stdcformer: A Transformer-Based Model with a Spatial-Temporal Causal De-Confounding Strategy for Crowd Flow Prediction

28 Pages Posted: 4 Dec 2024


Silu He

Central South University

Peng Shen

Central South University

Pingzhen Xu

Central South University

Qinyao Luo

Central South University

Haifeng Li

Central South University

Abstract

Crowd flow prediction is critical to urban management. Its goal is to capture the arrival and departure characteristics of crowd movements under different spatial and temporal distributions, which makes it fundamentally a spatial-temporal prediction task. Existing works typically treat spatial-temporal prediction as learning a function F that transforms historical observations into future observations. We further decompose this cross-time transformation into three processes: (1) Encoding (E), which learns the intrinsic representation of observations; (2) Cross-Time Mapping (M), which transforms past representations into future representations; and (3) Decoding (D), which reconstructs future observations from the future representations. From this perspective, spatial-temporal prediction can be viewed as learning F = E · M · D (applied in the order E, then M, then D). This comprises learning the space transformations {E, D} between the observation space and the hidden representation space, as well as the spatial-temporal mapping M from past states to future states within the representation space. This raises two key questions. Q1: What kind of representation space allows the past to be mapped to the future? Q2: How can the past be mapped to the future within that representation space? To address Q1, we propose a Spatial-Temporal Backdoor Adjustment strategy, which learns a Spatial-Temporal De-Confounded (STDC) representation space and estimates the de-confounded causal effect of historical data on future data. The captured causal relationship serves as the foundation for the subsequent spatial-temporal mapping. To address Q2, we design a Spatial-Temporal Embedding (STE) that fuses information from temporal and spatial confounders, capturing the intrinsic spatial-temporal characteristics of the representations. Additionally, we introduce a Cross-Time Attention mechanism, which computes attention between the future and the past to guide the spatial-temporal mapping.
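The Spatial-Temporal Backdoor Adjustment builds on the standard backdoor adjustment formula, P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) P(Z = z), where Z is a confounder that influences both the history X and the future Y. The following is a minimal sketch assuming a single binary confounder and hypothetical toy probabilities (not values from the paper). It contrasts the naive conditional estimate with the adjusted one and does not reproduce how STDCformer learns or parameterizes its spatial and temporal confounders.

```python
# Hypothetical toy model (NOT data from the paper): a binary confounder Z
# (e.g., peak vs. off-peak period) drives both the historical flow level X
# and the future flow level Y.
P_Z = {0: 0.7, 1: 0.3}                          # P(Z = z)
P_X_given_Z = {0: {0: 0.8, 1: 0.2},             # P(X = x | Z = z)
               1: {0: 0.2, 1: 0.8}}
P_Y1_given_XZ = {(0, 0): 0.1, (1, 0): 0.3,      # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.5, (1, 1): 0.7}


def observational(x):
    """P(Y=1 | X=x): plain conditioning, which absorbs the confounder's influence."""
    num = sum(P_Z[z] * P_X_given_Z[z][x] * P_Y1_given_XZ[(x, z)] for z in P_Z)
    den = sum(P_Z[z] * P_X_given_Z[z][x] for z in P_Z)
    return num / den


def backdoor(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(P_Y1_given_XZ[(x, z)] * P_Z[z] for z in P_Z)


naive_effect = observational(1) - observational(0)
causal_effect = backdoor(1) - backdoor(0)
print(f"naive: {naive_effect:.3f}  adjusted: {causal_effect:.3f}")
```

Running it prints a naive difference of about 0.414 against an adjusted effect of 0.200. Because Z pushes X and Y in the same direction, plain conditioning roughly doubles the apparent effect of history on the future. This is the kind of spurious correlation a de-confounded representation is meant to remove.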
Finally, we integrate the learning of the STDC representation space and the spatial-temporal mapping into an E-M-D skeleton for spatial-temporal prediction. The skeleton is then instantiated with a Transformer, yielding a Transformer model with a Spatial-Temporal De-Confounding strategy (STDCformer). Experiments on two real-world datasets demonstrate that STDCformer achieves state-of-the-art predictive performance and exhibits stronger out-of-distribution generalization.
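The E-M-D skeleton can be sketched in a few lines of NumPy. This is a minimal, untrained illustration under stated assumptions. Linear maps stand in for the encoder E and decoder D. M is a single-head cross-time attention in which hypothetical learnable future-step queries attend over the encoded past steps of each region. All names (`encode`, `cross_time_map`, `decode`, `future_query`) and sizes are illustrative. They do not reproduce STDCformer's actual architecture, which additionally learns the STDC representation space and the Spatial-Temporal Embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): N regions, T_in past steps,
# T_out future steps, C observed channels (e.g., inflow/outflow), d hidden size.
N, T_in, T_out, C, d = 4, 12, 3, 2, 16


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Randomly initialized, untrained parameters, for illustration only.
W_enc = rng.normal(scale=0.1, size=(C, d))              # E: observation -> representation
W_q = rng.normal(scale=0.1, size=(d, d))
W_k = rng.normal(scale=0.1, size=(d, d))
W_v = rng.normal(scale=0.1, size=(d, d))
future_query = rng.normal(scale=0.1, size=(T_out, d))   # hypothetical future-step embeddings
W_dec = rng.normal(scale=0.1, size=(d, C))              # D: representation -> observation


def encode(x):
    """E: (N, T_in, C) observations -> (N, T_in, d) representations."""
    return x @ W_enc


def cross_time_map(h_past):
    """M: future-step queries attend over past representations, per region."""
    q = future_query @ W_q                                   # (T_out, d)
    k = h_past @ W_k                                         # (N, T_in, d)
    v = h_past @ W_v                                         # (N, T_in, d)
    scores = np.einsum("fd,ntd->nft", q, k) / np.sqrt(d)     # (N, T_out, T_in)
    attn = softmax(scores, axis=-1)                          # weights over past steps
    return np.einsum("nft,ntd->nfd", attn, v), attn          # (N, T_out, d)


def decode(h_future):
    """D: (N, T_out, d) representations -> (N, T_out, C) observations."""
    return h_future @ W_dec


def predict(x_past):
    """Apply E, then M, then D."""
    h_future, attn = cross_time_map(encode(x_past))
    return decode(h_future), attn


x_past = rng.normal(size=(N, T_in, C))
y_hat, attn = predict(x_past)
print(y_hat.shape, attn.shape)
```

Each future step receives its own distribution over the T_in past steps, which is what lets the mapping M decide which parts of the history matter for which horizon. Keeping E and D separate from M mirrors the decomposition in the abstract: the space transformations and the cross-time mapping can be designed and swapped independently.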

Keywords: Crowd Flow Prediction, Causal Inference, Spatial-temporal Transformer, Causal De-Confounding, Cross-Time Mapping

Suggested Citation

He, Silu and Shen, Peng and Xu, Pingzhen and Luo, Qinyao and Li, Haifeng, Stdcformer: A Transformer-Based Model with a Spatial-Temporal Causal De-Confounding Strategy for Crowd Flow Prediction. Available at SSRN: https://ssrn.com/abstract=5044406 or http://dx.doi.org/10.2139/ssrn.5044406

Silu He

Central South University

Changsha, 410083
China

Peng Shen

Central South University

Changsha, 410083
China

Pingzhen Xu

Central South University

Changsha, 410083
China

Qinyao Luo

Central South University

Changsha, 410083
China

Haifeng Li (Contact Author)

Central South University

Changsha, 410083
China

