Data-Driven Measures of High-Frequency Trading
78 Pages. Posted: 14 May 2024. Last revised: 17 Mar 2025.
Date Written: April 1, 2024
Abstract
High-frequency trading (HFT) accounts for almost half of equity trading volume, yet it is not identified in public data. We develop novel data-driven measures of HFT activity that distinguish liquidity-supplying from liquidity-demanding strategies. We train machine learning models to predict the HFT activity observed in a proprietary dataset from concurrent public intraday data. Once trained, these models generate HFT measures for the entire U.S. stock universe from 2010 to 2023. Our measures outperform conventional proxies, which struggle to capture HFT’s time dynamics. We further validate them using shocks to HFT activity, including latency arbitrage, exchange speed bumps, and data feed upgrades. Finally, our measures reveal how HFT affects the acquisition of fundamental information: liquidity-supplying HFTs improve price discovery around earnings announcements, while liquidity-demanding strategies impede it.
Keywords: High-frequency trading, machine learning, latency arbitrage, information acquisition, liquidity
JEL Classification: G10, G12, G14