Document Representations to Improve Topic Modelling

8 Pages Posted: 25 Nov 2020


Venkata Poojitha P

Amrita Vishwa Vidyapeetham Kollam

Remya R.K. Menon

Amrita Vishwa Vidyapeetham Amritapuri, Kollam

Date Written: October 29, 2020


Every day, huge amounts of information are collected from web applications, which makes it difficult to understand or detect what a collection as a whole is about. Detecting, understanding and summarising such information requires specific tools and techniques such as topic modelling, which helps to analyse the data and identify its gist. This paper implements a sparsity-based document representation to improve topic modelling; it organises the data into a meaningful structure using machine learning algorithms such as LDA (Latent Dirichlet Allocation) and OMP (Orthogonal Matching Pursuit). The approach identifies which topic a document belongs to, as well as the similarity between documents in an existing dictionary. OMP is presented as the best algorithm for this sparse approximation, giving better accuracy. It can identify the topics to which an input document Y is most closely related across a large collection of text documents held in a dictionary.
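The abstract gives no implementation details for the OMP step. As a rough illustration only, the sketch below shows generic Orthogonal Matching Pursuit in pure Python, assuming each dictionary atom is a unit-normalised topic vector over a shared vocabulary (for example, an LDA topic-word distribution or a TF-IDF vector) and that the input document Y is a term-count vector over the same vocabulary. The function names (`omp`, `normalize`, `solve`), the toy vocabulary and the topic labels are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (OMP assumes unit-norm dictionary atoms)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp(atoms: List[Vector], y: Vector, sparsity: int) -> List[Tuple[int, float]]:
    """Greedy OMP: return (atom index, coefficient) pairs, largest |coefficient| first."""
    support: List[int] = []
    coeffs: Vector = []
    residual = list(y)
    for _ in range(min(sparsity, len(atoms))):
        # 1. Pick the unused atom most correlated with the current residual.
        scores = [abs(dot(d, residual)) if j not in support else -1.0
                  for j, d in enumerate(atoms)]
        best = max(range(len(atoms)), key=lambda j: scores[j])
        if scores[best] <= 1e-12:
            break  # residual is (numerically) orthogonal to every remaining atom
        support.append(best)
        # 2. Least-squares fit of y on all selected atoms via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coeffs = solve(gram, rhs)
        # 3. New residual = y minus its projection onto the selected atoms.
        approx = [sum(c * atoms[i][t] for c, i in zip(coeffs, support))
                  for t in range(len(y))]
        residual = [yt - at for yt, at in zip(y, approx)]
    return sorted(zip(support, coeffs), key=lambda p: -abs(p[1]))


# Hypothetical toy data: vocabulary = ball, goal, vote, party, chip, code
topic_names = ["sports", "politics", "tech"]
topics = [normalize(v) for v in ([1, 1, 0, 0, 0, 0],
                                 [0, 0, 1, 1, 0, 0],
                                 [0, 0, 0, 0, 1, 1])]
doc = [3.0, 1.0, 1.0, 1.0, 0.0, 0.0]  # term counts of the input document Y
for idx, weight in omp(topics, doc, sparsity=2):
    print(topic_names[idx], round(weight, 3))
```

Running it prints `sports 2.828` followed by `politics 1.414`: the toy document is explained mostly by the sports atom with a smaller politics component, and the tech atom is never selected. The `sparsity` argument caps how many topics a document may be linked to. The hand-written least-squares solve only keeps the sketch dependency-free; in practice a library routine such as scikit-learn's `OrthogonalMatchingPursuit` would be the usual choice.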

Keywords: topic modeling; sparse matrix; dictionary representation; Latent Dirichlet Allocation; Orthogonal Matching Pursuit; clustering; TF-IDF; document detection

Suggested Citation

P, Venkata Poojitha and Menon, Remya R.K., Document Representations to Improve Topic Modelling (October 29, 2020). Proceedings of the 2nd International Conference on IoT, Social, Mobile, Analytics & Cloud in Computational Vision & Bio-Engineering (ISMAC-CVB 2020), Available at SSRN: or

Venkata Poojitha P (Contact Author)


