Using the EDGAR Log File Data Set

36 Pages. Posted: 8 Feb 2017. Last revised: 17 Feb 2017

Date Written: February 8, 2017

Abstract

The SEC's EDGAR log file data set is a collection of web server log files that allows researchers to study the demand for SEC filings. This multi-terabyte data set provides a direct measure of demand for financial reports, but the log files must be filtered to remove downloads by computer programs (robots), and the sheer size of the files presents big data challenges. This paper compares three methods for counting human views in the EDGAR log files and aggregates the data on a filing-day basis so that it can be analyzed on desktop hardware with statistical analysis software. Overall, the three methods agree on the robot-human classification for 96 percent of users, but for sample 10-K filings, they can disagree by up to 27 percent. Download counts may be biased by up to 36 percent if multiple views by the same user are counted. The method of Ryans (2017) eliminates multiple download counting and appears to classify robots effectively in cases of disagreement among the measures. The choice of measure may be particularly important when studying demand for Forms 10-K, 10-Q, 4, and 13F-HR, as well as SEC comment letters. The aggregated data and sample code are available from the author.
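
The filing-day aggregation described above can be illustrated with a minimal sketch. It is not the paper's code, and the robot rule used here (flag any user who requests more than a fixed number of distinct filings in a day) is a stand-in assumption, not one of the three methods the paper compares. The record layout `(ip, date, accession)`, the sample rows, the function name `filing_day_views`, and the `max_filings_per_day` threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified log records: (ip, date, accession).
# Real EDGAR log files carry more columns; these rows are made up.
SAMPLE_LOG = [
    ("10.0.0.1", "2017-02-08", "0001-17-000001"),
    ("10.0.0.1", "2017-02-08", "0001-17-000001"),  # repeat view, same user
    ("10.0.0.2", "2017-02-08", "0001-17-000001"),
    ("10.0.0.3", "2017-02-08", "0001-17-000001"),
    ("10.0.0.3", "2017-02-08", "0001-17-000002"),
    ("10.0.0.3", "2017-02-08", "0001-17-000003"),
    ("10.0.0.3", "2017-02-08", "0001-17-000004"),
]


def filing_day_views(records, max_filings_per_day=3):
    """Count views per (accession, date) after two filters.

    1. Repeat requests by the same ip for the same filing on the same
       day are collapsed to one view.
    2. Any ip requesting more than `max_filings_per_day` distinct
       filings in a day is treated as a robot and dropped.
       (Illustrative threshold only; an assumption of this sketch.)
    """
    # Step 1: collapse multiple views by the same user.
    unique = set(records)

    # Step 2: count distinct filings per user-day to flag robots.
    filings_by_user_day = defaultdict(set)
    for ip, date, accession in unique:
        filings_by_user_day[(ip, date)].add(accession)
    robots = {
        key
        for key, filings in filings_by_user_day.items()
        if len(filings) > max_filings_per_day
    }

    # Step 3: aggregate the remaining views on a filing-day basis.
    counts = defaultdict(int)
    for ip, date, accession in unique:
        if (ip, date) not in robots:
            counts[(accession, date)] += 1
    return dict(counts)


if __name__ == "__main__":
    for (accession, date), views in sorted(filing_day_views(SAMPLE_LOG).items()):
        print(accession, date, views)
```

Collapsing repeat requests before counting is what prevents the multiple-download bias the abstract mentions; the choice of robot rule then determines which users survive into the filing-day totals, which is where the compared methods can disagree.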

Keywords: EDGAR downloads, SEC filings, demand for financial information, investor attention, big data

JEL Classification: M41, G14, G18

Suggested Citation

Ryans, James, Using the EDGAR Log File Data Set (February 8, 2017). Available at SSRN: https://ssrn.com/abstract=2913612 or http://dx.doi.org/10.2139/ssrn.2913612

James Ryans (Contact Author)

London Business School

Sussex Place
Regent's Park
London, NW1 4SA
United Kingdom

HOME PAGE: http://www.london.edu

