Approaches for Semantic Relatedness Computation for Big Data
7 Pages Posted: 11 Apr 2019
Date Written: February 8, 2019
Abstract
Computing semantic relatedness between words, concepts, sentences, or documents is a fundamental task in areas such as text mining and information retrieval. As data volumes grow into Big Data, traditional software that was efficient and widely used can no longer produce results in a reasonable amount of time, which has led to the development of tools such as Hadoop and Spark. Likewise, algorithms and formulas that are efficient on small data need to be updated and tuned for Big Data. In text mining, semantic mining extracts a semantic vector (context vector) for each concept, and measures such as Mutual Information (MI), Balanced Mutual Information (BMI), and Normalized Google Distance (NGD) are used to calculate the fuzzy membership value of each vector element with respect to its concept. The semantic similarity between two concepts is then calculated by repeatedly considering the semantic similarity, or information gain, between pairs of elements drawn from the two vectors. The result is a matrix similar to the Vector Space Model (VSM), but containing fuzzy membership values rather than term frequencies. We propose a MapReduce-based semantic relatedness computation technique for Big Data.
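To make the idea concrete, the following is a minimal sketch of how one of the named measures, Normalized Google Distance, could be computed from document-level counts gathered in a map and reduce style. It is not the authors' implementation. It uses the standard NGD definition, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the documents containing a term or a term pair and N is the number of documents. The function names (`map_document`, `reduce_counts`, `ngd`), the whitespace tokenizer, and the toy corpus are assumptions made for illustration, and the map and reduce steps run in a single process rather than on Hadoop or Spark.

```python
import math
from collections import Counter
from itertools import combinations


def map_document(text):
    """Map step (hypothetical): emit (key, 1) for every distinct term
    and every distinct unordered term pair in one document."""
    terms = sorted(set(text.lower().split()))  # naive whitespace tokenizer
    for term in terms:
        yield (term,), 1
    for a, b in combinations(terms, 2):  # a < b because terms are sorted
        yield (a, b), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def ngd(x, y, counts, n_docs):
    """Normalized Google Distance from document frequencies.
    Returns math.inf when the terms never co-occur."""
    if x == y:
        return 0.0
    fx, fy = counts[(x,)], counts[(y,)]
    fxy = counts[tuple(sorted((x, y)))]
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n_docs) - min(lx, ly)
    if denom == 0:  # both terms occur in every document
        return 0.0
    return (max(lx, ly) - lxy) / denom


# Toy corpus, invented purely for illustration.
docs = ["hadoop spark cluster", "hadoop spark", "text mining", "hadoop text"]
emitted = (pair for doc in docs for pair in map_document(doc))
counts = reduce_counts(emitted)
distance = ngd("text", "mining", counts, len(docs))  # smaller means more related
```

In a real MapReduce job the map function would run per input split and the framework would group keys before the reduce step. A distance produced this way would still need to be converted into a fuzzy membership value, and the abstract does not specify that conversion.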
Keywords: Semantic Mining, Semantic Relatedness, Map Reduce, Big Data, Text Mining