Approaches for Semantic Relatedness Computation for Big Data

7 Pages Posted: 11 Apr 2019

Rafeeq Ahmad

Jamia Millia Islamia

Tanvir Ahmad

Jamia Millia Islamia (Central University)

B.L. Pal

Kamla Nehru Institute of Technology

Sunil Malviya

Kamla Nehru Institute of Technology

Date Written: February 8, 2019

Abstract

Computing semantic relatedness between words, concepts, sentences, or documents is a fundamental task in areas such as text mining and information retrieval. However, data volumes are growing day by day, leading to Big Data. Traditional software that was efficient and heavily used cannot handle Big Data, i.e., it cannot produce results in a reasonable amount of time. Tools such as Hadoop and Spark have therefore been developed to handle Big Data. Similarly, algorithms and formulas that are efficient on small data need to be updated and tuned for Big Data. In text mining, semantic mining extracts a semantic vector (context vector) for each concept and uses formulas such as Mutual Information (MI), Balanced Mutual Information (BMI), and Normalized Google Distance (NGD) to calculate the fuzzy membership values of the vector's elements with respect to their corresponding concepts. Semantic similarity between two concepts is then calculated by repeatedly considering the semantic similarity or information gain between pairs of elements drawn from the two vectors. The result is a matrix similar to the Vector Space Model (VSM), but containing fuzzy membership values rather than term frequencies. We propose a MapReduce-based semantic relatedness computation technique for Big Data.
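As an illustration only, the following minimal Python sketch shows how one of the measures named above, Normalized Google Distance, could be computed from document-level co-occurrence counts gathered in a map/reduce style. It is not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `ngd`), the whitespace tokenization, and the in-memory "reduce" are assumptions made for this example, and a real deployment would run the two phases on Hadoop or Spark. The formula used is the standard NGD definition, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the documents containing the term(s) and N is the total number of documents.

```python
import math
from collections import Counter
from itertools import combinations


def map_phase(doc):
    """Emit (key, 1) for every distinct term and every unordered term pair in one document."""
    terms = sorted(set(doc.lower().split()))  # assumption: naive whitespace tokenization
    for term in terms:
        yield (term,), 1
    for a, b in combinations(terms, 2):
        yield (a, b), 1


def reduce_phase(pairs):
    """Sum the emitted values per key (stand-in for a distributed reduce step)."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def ngd(x, y, counts, n_docs):
    """Normalized Google Distance from document frequencies; smaller means more related."""
    if x == y:
        return 0.0
    fx = counts[(x,)]
    fy = counts[(y,)]
    fxy = counts[tuple(sorted((x, y)))]
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # the terms never co-occur in this corpus
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n_docs) - min(lx, ly)
    if denom == 0:
        return 0.0  # both terms occur in every document
    return (max(lx, ly) - lxy) / denom


# Hypothetical toy corpus, used only to exercise the functions.
docs = ["big data hadoop", "big data spark", "text mining", "big text"]
counts = reduce_phase(kv for d in docs for kv in map_phase(d))
print(ngd("big", "data", counts, len(docs)))
```

The resulting distances could then be converted into membership-style scores for a concept's context vector; how that conversion is done is left open here, since the abstract does not specify it.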

Keywords: Semantic Mining, Semantic Relatedness, Map Reduce, Big Data, Text Mining

Suggested Citation

Ahmad, Rafeeq and Ahmad, Tanvir and Pal, B.L. and Malviya, Sunil, Approaches for Semantic Relatedness Computation for Big Data (February 8, 2019). Proceedings of 2nd International Conference on Advanced Computing and Software Engineering (ICACSE) 2019. Available at SSRN: https://ssrn.com/abstract=3349564 or http://dx.doi.org/10.2139/ssrn.3349564

Rafeeq Ahmad (Contact Author)

Jamia Millia Islamia

New Delhi 110025
India

Tanvir Ahmad

Jamia Millia Islamia (Central University)

New Delhi 110025
India

B.L. Pal

Kamla Nehru Institute of Technology

SULTANPUR
UTTAR PRADESH
India

Sunil Malviya

Kamla Nehru Institute of Technology

SULTANPUR
UTTAR PRADESH
India

