An Iterative Model for Text Mining Using Big Data Technology

8 Pages Posted: 14 Jun 2019

Date Written: March 20, 2019

Abstract

Big Data refers to an enormous volume of structured data, unstructured data, or a combination of both that is so large that it becomes very difficult and complex to process in a relational database management system or a legacy software system. In enterprise settings, the volume of data is too large, it moves too rapidly, or it exceeds current processing capacity. Big Data processing begins with raw data that is not clustered and usually cannot be held in the memory of a single computer. Storing and retrieving the data is the most tedious challenge in Big Data. Application domains of Big Data include banking, health care, text mining, and education. When data is in text format, grouping it manually is significantly complicated. There is therefore a strong need for document clustering to group text articles appropriately so that the writer's actual sentiment can be revealed. Text mining draws on algorithms from data mining, machine learning, statistics, and natural language processing to extract high-quality, useful information from textual data. In this paper we propose a text mining model that makes mining large volumes of text both easy and efficient. Clustering and Big Data technologies are the workhorses of the model. The model eliminates irrelevant context (stop words) and represents each document in a quantitative form. Its accuracy is increased by following an iterative approach to analysis. The model is named the “Advanced Text Mining Model”.
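The steps named in the abstract (stop-word removal, a quantitative document representation, and K-means clustering) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the tiny stop-word list, the raw term-frequency vectors, the deterministic choice of initial centroids, and all function names are assumptions made for illustration. The Hadoop MapReduce layer is omitted, and the loop shown is only K-means' own refinement loop, which may differ from the iterative analysis the paper describes.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on", "for"}

def tokenize(text):
    """Split on whitespace, strip punctuation, lowercase, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def vectorize(docs):
    """Represent each document as a term-frequency vector over a shared vocabulary."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({w for tokens in token_lists for w in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, max_iter=100):
    """Plain K-means; initial centroids are the first k distinct vectors
    (an assumption chosen to keep the sketch deterministic)."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct document vectors")
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "The bank approved a loan.",
    "The patient visited the doctor.",
    "Interest on the bank loan is high.",
    "The doctor treated the patient in hospital.",
]
vocab, vectors = vectorize(docs)
labels = kmeans(vectors, k=2)
```

Here `labels` holds one cluster index per document. A fuller pipeline would likely weight terms (for example with TF-IDF) and distribute the assignment and update steps across a cluster, but those choices are outside what the abstract specifies.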

Keywords: Big Data, Hadoop MapReduce, Text Mining, Document Clustering, K-Means Clustering

Suggested Citation

Khatai, Swagat and Rautaray, Siddharth Swarup and Sahoo, Swetaleena and Pandey, Manjusha, An Iterative Model for Text Mining Using Big Data Technology (March 20, 2019). Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur - India, February 26-28, 2019, Available at SSRN: https://ssrn.com/abstract=3356552 or http://dx.doi.org/10.2139/ssrn.3356552

Swagat Khatai (Contact Author)

KIIT University

IN
India

Siddharth Swarup Rautaray

KIIT University

IN
India

Swetaleena Sahoo

KIIT University

IN
India

Manjusha Pandey

KIIT University

IN
India

