Learning Word Embeddings from 10-K Filings Using PyTorch

9 Pages Posted: 14 Nov 2019

Saurabh Sehrawat

Stony Brook University - Department of Applied Mathematics & Statistics

Date Written: September 5, 2019


With the rise of alternative data as a source of trading signals, Natural Language Processing (NLP) on financial documents has gained significant importance in recent years. Word embeddings learned from a text corpus are among the most important inputs to NLP models, especially deep-learning-based models. In this paper, we generate word embeddings from a corpus of 10-K filings submitted by U.S. companies to the SEC from 1993 to 2018, using a word2vec model implemented in PyTorch [5]. Word embeddings learned from general corpora such as Google News and Wikipedia are readily available online for researchers to use in their models, but embeddings learned from 10-K filings are not publicly available. Using embeddings learned from general text for NLP tasks on financial documents may not yield accurate results, as it has been shown that embeddings learned from domain-specific text yield better and more accurate results than general word embeddings. We aim to publish the word embeddings learned from 10-K filings online so that other researchers can use them in NLP tasks on 10-K filings or other financial documents, such as document classification, document similarity, sentiment analysis, and readability index computation.
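To make the approach concrete, the following is a minimal sketch of a skip-gram word2vec model in PyTorch. It is not the paper's implementation: the toy sentence, the helper names (`build_pairs`, `SkipGram`, `train`), the full-softmax output layer, and all hyperparameters are assumptions chosen for illustration, and a real 10-K corpus would call for subsampling and a cheaper objective such as negative sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_pairs(tokens, window=2):
    """Return (center, context) index pairs and the sorted vocabulary."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    pairs = []
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((index[word], index[tokens[j]]))
    return pairs, vocab


class SkipGram(nn.Module):
    """Skip-gram with a full softmax over the vocabulary (illustrative only)."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # center-word vectors
        self.out = nn.Linear(dim, vocab_size)       # scores for each context word

    def forward(self, center):
        return self.out(self.embed(center))


def train(tokens, dim=16, epochs=50, lr=0.05, seed=0):
    """Full-batch training on all (center, context) pairs; returns loss history."""
    torch.manual_seed(seed)
    pairs, vocab = build_pairs(tokens)
    centers = torch.tensor([c for c, _ in pairs])
    contexts = torch.tensor([o for _, o in pairs])
    model = SkipGram(len(vocab), dim)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(centers), contexts)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return model, vocab, losses


# Toy stand-in for tokenized 10-K text (made up for illustration).
tokens = "the company reported net income and net revenue for the fiscal year".split()
model, vocab, losses = train(tokens)

# The learned embedding matrix: one row per vocabulary word.
embeddings = model.embed.weight.detach()

# Cosine similarity between two word vectors, as used for similarity queries.
sim = F.cosine_similarity(
    embeddings[vocab.index("income")], embeddings[vocab.index("revenue")], dim=0
).item()
```

On a corpus this small the resulting vectors carry no real meaning. The sketch only shows the moving parts: context-window pair generation, an embedding layer trained to predict context words, and cosine similarity over the learned rows.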

Keywords: 10-K, Word Embeddings, Word2Vec, Skip-Gram, Natural Language Processing (NLP), Machine Learning, Deep Learning, Neural Networks, PyTorch, t-SNE, Cosine Similarity, Amazon AWS, Quantitative Finance, Alternative Data, Trading Signals

JEL Classification: G1, G2, C45

Suggested Citation

Sehrawat, Saurabh, Learning Word Embeddings from 10-K Filings Using PyTorch (September 5, 2019). Available at SSRN: https://ssrn.com/abstract=3480902 or http://dx.doi.org/10.2139/ssrn.3480902

Saurabh Sehrawat (Contact Author)

Stony Brook University - Department of Applied Mathematics & Statistics

Stony Brook University
Stony Brook, NY 11794
United States
