Difficulties in obtaining a representative sample of retail trades from public data sources
59 Pages. Posted: 18 Oct 2023. Last revised: 13 May 2024
Date Written: September 21, 2023
Abstract
Researchers using samples of trades identified as retail by the Boehmer et al. (2021) methodology implicitly assume their data are representative of actual retail trades. Given the work of Barber et al. (2023), who find that the algorithm identifies only 35% of the trades they place with retail brokers, one must assume that the retail trades the algorithm fails to classify as retail have the same characteristics as the retail trades it does identify. Moreover, researchers must assume either that the algorithm does not falsely identify institutional trades as retail or that institutional trades have the same characteristics as retail trades. We demonstrate that neither of these assumptions is valid in practice. Institutional order flow routinely executes at subpenny increments that the algorithm identifies as retail. Furthermore, the subset of retail trades that the algorithm identifies as retail differs significantly along many dimensions from the subset of retail trades that it does not classify as retail. As a result, the stratified random samples of securities used by researchers do not accurately represent actual retail trading.
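For readers unfamiliar with the subpenny rule at issue, the following is a minimal sketch of the classification as it is commonly described for the Boehmer et al. (2021) methodology, not the authors' own code. It assumes that only off-exchange trades (TAQ exchange code "D") are eligible, that a fraction of a cent strictly between 0 and 0.4 marks a retail sell, and that a fraction strictly between 0.6 and 1 marks a retail buy. The function name, the string labels, and the exchange-code convention are illustrative assumptions that do not appear in the abstract.

```python
from decimal import Decimal


def classify_subpenny(price: str, exchange: str = "D") -> str:
    """Label a trade 'retail_buy', 'retail_sell', or 'unclassified'.

    Hypothetical helper: thresholds follow the commonly cited subpenny
    rule; they are assumptions here, not values stated in the abstract.
    """
    # Assumption: only off-exchange (TRF-reported) trades are eligible.
    if exchange != "D":
        return "unclassified"

    # Use Decimal so that prices such as 10.0025 are not distorted by
    # binary floating-point rounding.
    cents = Decimal(price) * 100
    fraction_of_cent = cents % 1

    if Decimal("0") < fraction_of_cent < Decimal("0.4"):
        return "retail_sell"
    if Decimal("0.6") < fraction_of_cent < Decimal("1"):
        return "retail_buy"
    # Whole-penny prices and fractions in [0.4, 0.6] stay unclassified.
    return "unclassified"


if __name__ == "__main__":
    for px in ["10.0025", "10.0075", "10.00", "10.005"]:
        print(px, classify_subpenny(px))
```

Under this rule, any off-exchange trade at a qualifying subpenny price is labeled retail regardless of who submitted it, which is why institutional trades executed at such increments can be misclassified.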
Keywords: Retail trades
JEL Classification: G20