Stvanet: A Spatio-Temporal Visual Attention Framework with Large Kernel Attention Mechanism for Citywide Traffic Dynamics Prediction

Yang, Hongtai; Jiang, Junbo; Zhao, Zhan; Pan, Renbin

doi:10.2139/ssrn.4673691

21 pages. Posted: 22 Dec 2023

Renbin Pan

Southwestern University of Finance and Economics (SWUFE)

Abstract

Enhancing the efficiency and safety of intelligent transportation systems requires effective modeling and prediction of citywide traffic dynamics. Most studies employ convolutional neural networks (CNNs) with a 3D convolutional structure or spatio-temporal models with self-attention mechanisms to capture the spatio-temporal information of traffic distribution. Although 3D CNNs excel at capturing local contextual information, they are computationally expensive because of their large number of parameters, and they cannot capture long-range dependencies. By contrast, self-attention mechanisms, originally designed for natural language processing, can capture long-range dependencies, but applying them to 2D image structures requires flattening the inherent 2D context into a 1D sequence, which increases computational complexity and neglects the adaptability between local contextual information and channels. Accordingly, we propose the spatio-temporal visual attention network (STVANet), a novel 2D CNN that integrates a visual attention module with a large kernel attention (LKA) mechanism and a feedforward component to capture long-range dependencies and channel information in urban traffic data while preserving the 2D image structure. LKA-based spatio-temporal attention networks extract spatial and temporal features from weekly, daily, and recent hourly periods, and aggregate them with weighted consideration of external features to make predictions. Evaluation on real-world datasets shows that STVANet outperforms baseline models, indicating its potential for citywide traffic prediction.
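The three-period input scheme mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `period_indices` and `weighted_fusion`, the number of frames per branch, the hourly resolution, and the use of scalar fusion weights are all assumptions made for illustration, and the external-feature branch is omitted.

```python
def period_indices(t, steps_per_day, n_recent=3, n_daily=2, n_weekly=1):
    """Return the past time-step indices feeding each branch for target step t.

    Hypothetical helper: the branch lengths are illustrative defaults,
    not values taken from the paper.
    """
    steps_per_week = 7 * steps_per_day
    return {
        # the most recent consecutive steps before t
        "recent": [t - i for i in range(n_recent, 0, -1)],
        # the same time of day on previous days
        "daily": [t - i * steps_per_day for i in range(n_daily, 0, -1)],
        # the same time of day and weekday in previous weeks
        "weekly": [t - i * steps_per_week for i in range(n_weekly, 0, -1)],
    }


def weighted_fusion(branch_outputs, weights):
    """Element-wise weighted sum of equally shaped 2D grids (lists of rows).

    Assumption: one scalar weight per branch; a trained model could just as
    well learn one weight per grid cell.
    """
    names = list(branch_outputs)
    first = branch_outputs[names[0]]
    rows, cols = len(first), len(first[0])
    return [
        [sum(weights[n] * branch_outputs[n][r][c] for n in names)
         for c in range(cols)]
        for r in range(rows)
    ]


# Example with hourly data (24 steps per day) and a target step of 400.
indices = period_indices(t=400, steps_per_day=24)

# Toy 1x2 grids standing in for the per-branch predictions.
fused = weighted_fusion(
    {"recent": [[1.0, 2.0]], "daily": [[3.0, 4.0]], "weekly": [[5.0, 6.0]]},
    {"recent": 0.5, "daily": 0.25, "weekly": 0.25},
)
```

In this sketch each branch sees a different slice of history, and the fusion step keeps the 2D grid layout intact, in line with the abstract's emphasis on preserving the image structure of the traffic data.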

Keywords: Traffic Information, 2D ConvNets, Spatio-temporal Data, Large Kernel Attention, Deep Learning

Suggested Citation:

Yang, Hongtai and Jiang, Junbo and Zhao, Zhan and Pan, Renbin, Stvanet: A Spatio-Temporal Visual Attention Framework with Large Kernel Attention Mechanism for Citywide Traffic Dynamics Prediction. Available at SSRN: https://ssrn.com/abstract=4673691 or http://dx.doi.org/10.2139/ssrn.4673691