A Lightweight Multi-Head Attention Transformer for Stock Price Forecasting
26 Pages · Posted: 1 Mar 2024 · Last revised: 16 Nov 2024
Date Written: December 1, 2023
Abstract
Despite the growing adoption of AI in live trading systems across the financial industry, researchers still struggle to understand and forecast the chaotic behavior of stock prices. This research therefore proposes a distinctive lightweight Transformer model whose architecture centers on positional encoding and advanced training techniques that mitigate overfitting, delivering prompt forecasts through a univariate approach on stock closing prices. Using MSE as the loss function and MAE and RMSE as the core evaluation metrics, the proposed Transformer consistently outperforms well-known time series models such as SVR, LSTM, CNN-LSTM, and CNN-BiLSTM, reducing forecasting errors by over 50% on average. Trained on 20-year daily datasets for AMZN, INTC, CSCO, and IBM, the Transformer captures flash crashes, cyclical and seasonal patterns, and long-term dependencies inherent in tech stocks with a high degree of accuracy. Moreover, the model generates forecasts in only 19.36 seconds on a non-high-end local machine, fitting within the 1-minute trading window. To our knowledge, this lightweight approach is unparalleled in stock price forecasting.
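The paper's exact architecture is not reproduced here, but two ingredients the abstract names, sinusoidal positional encoding and the MAE/RMSE evaluation metrics, can be sketched in a few lines. This is a minimal illustration under the assumption of the standard Vaswani-style encoding; all function names are illustrative, not the authors' implementation:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (an assumed variant;
    the paper does not specify its exact formulation here)."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model)[None, :]             # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # odd dimensions: cosine
    return pe

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error, one of the paper's core metrics."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error, the paper's other core metric."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

In a univariate setup like the one described, the encoding would be added to the embedded window of past closing prices before the attention layers, and MAE/RMSE would be computed on the held-out forecasts.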
Keywords: Lightweight Transformer, Positional Encoding, Univariate Approach, Time Series, Flash Crashes
JEL Classification: C45, C53, C58, G17, G11