A Book on Mathematical and Statistical Procedures for Bangla Optimized Intelligence Bot
Noakhali Science and Technology University
81 Pages. Posted: 2 Apr 2020. Last revised: 16 Apr 2020
Date Written: June 9, 2019
Abstract
The Bengali Informative Intelligence Bot (BIIB) is a Machine Learning (ML) system that helps users find relevant information through Bengali Natural Language Processing (BNLP). In this book, we introduce the mathematical and statistical procedures behind BIIB, applied to information about Noakhali Science and Technology University (NSTU). For preprocessing, we present two algorithms for lemmatizing Bengali words, Trie and Dictionary Based Search by Removing Affix (DBSRA), and compare them with Edit Distance for exact lemmatization. We present a Bengali Anaphora Resolution system based on Hobbs’ algorithm to resolve references in follow-up questions. To reduce the time complexity of matching questions and retrieving replies from the stored information, we use Non-negative Matrix Factorization (NMF) for topic modeling and Singular Value Decomposition (SVD) to reduce the dimensionality of the questions. TF-IDF (Term Frequency-Inverse Document Frequency) converts character and string terms into numerical values and is also used to determine their sentiment. To answer a question, the chatbot applies TF-IDF together with cosine similarity and Jaccard similarity to find the most accurate answer in the documents. We also introduce a Bengali Language Toolkit (BLTK) and Bengali Language Expression (BRE), which simplify the implementation of these tasks. In addition, we have developed corpora of Bengali root words, synonyms, and stop words, and collected 74 topic-related questions and answers from NSTU information, which form the bot's stored informative questions. To evaluate the system, we created 2852 questions on these topics. BIIB answered 96.22% of them accurately using cosine similarity and 84.64% using Jaccard similarity.
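The Trie-based lemmatization and the Edit Distance comparison mentioned above can be illustrated with a minimal Python sketch. It assumes that a word's lemma is the longest root word found as a prefix in a Trie built from the root-word corpus, and that the root with the smallest Levenshtein (edit) distance is used as a fallback when no prefix matches. This combination, the four-word `ROOTS` list, and the helper names (`build_trie`, `longest_root_prefix`, `lemmatize`) are illustrative assumptions, not the book's implementation.

```python
# Minimal sketch: Trie longest-prefix lemmatization with an edit-distance fallback.
# ROOTS is a hypothetical toy list, not the book's Bengali root-word corpus.
ROOTS = ["বই", "ছাত্র", "ছাত্রী", "কলম"]  # book, student, female student, pen


class TrieNode:
    def __init__(self):
        self.children = {}          # next character -> TrieNode
        self.is_root_word = False   # True if the path from the top spells a root word


def build_trie(words):
    top = TrieNode()
    for word in words:
        node = top
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_root_word = True
    return top


def longest_root_prefix(trie, word):
    """Return the longest root word that is a prefix of `word`, or None."""
    node, best = trie, None
    for i, ch in enumerate(word):
        node = node.children.get(ch)
        if node is None:
            break
        if node.is_root_word:
            best = word[: i + 1]
    return best


def edit_distance(a, b):
    """Levenshtein distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def lemmatize(word, trie, roots):
    root = longest_root_prefix(trie, word)
    if root is not None:
        return root
    # No root is a prefix (e.g. a misspelling): fall back to the closest root.
    return min(roots, key=lambda r: edit_distance(word, r))


TRIE = build_trie(ROOTS)
for w in ["বইগুলো", "ছাত্ররা", "ছাত্রীদের"]:  # the books, students, (to) female students
    print(w, "->", lemmatize(w, TRIE, ROOTS))
```

Taking the longest match matters when roots share a prefix: in this toy list ছাত্রী extends ছাত্র, so ছাত্রীদের maps to ছাত্রী rather than ছাত্র. The prefix rule only covers suffixed forms; other inflection patterns would need the dictionary-based affix removal (DBSRA) the book describes.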
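The idea of using NMF to cut search time can be sketched in pure Python, assuming a small term-by-question count matrix V factorized as V ≈ WH with the standard Lee-Seung multiplicative updates. Each stored question is assigned to its dominant topic (the largest entry in its column of H), and a query is routed to the topic whose term profile (a column of W) it overlaps most, so only that topic's questions need to be scored. The toy matrix `V_DEMO`, the routing rule `route_query`, and the other function names are hypothetical; the book's own NMF setup may differ, and the SVD dimension-reduction step is not shown.

```python
import random

# Minimal NMF sketch (Lee-Seung multiplicative updates) used to route a query to one topic.
# V_DEMO is a hypothetical term-by-question count matrix: rows = 4 terms, columns = 4 stored questions.
V_DEMO = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x k) times H (k x n), minimizing ||V - WH||_F^2."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def dominant_topic(H, j):
    """Topic with the largest weight for stored question j."""
    return max(range(len(H)), key=lambda i: H[i][j])


def route_query(query_counts, W):
    """Hypothetical routing rule: topic whose term profile (column of W) best matches the query."""
    k = len(W[0])
    scores = [sum(q * W[t][i] for t, q in enumerate(query_counts)) for i in range(k)]
    return max(range(k), key=scores.__getitem__)


W, H = nmf(V_DEMO, k=2)
topic = route_query([1, 1, 0, 0], W)  # query uses only terms 0 and 1
candidates = [j for j in range(len(V_DEMO[0])) if dominant_topic(H, j) == topic]
print("search only questions:", candidates)
```

With k topics of roughly equal size, routing means a query is compared against about 1/k of the stored questions instead of all of them, which is the kind of saving the abstract attributes to topic modeling.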
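The answer-selection step, TF-IDF vectors compared by cosine similarity versus plain Jaccard similarity on token sets, can be sketched as follows. This is a minimal illustration under stated assumptions: tokenization is a whitespace split (no stop-word removal or lemmatization), the IDF uses the variant log(N/df) + 1, unseen query terms are ignored, and `QA_PAIRS` holds three hypothetical Bengali questions with placeholder answers rather than the book's 74 NSTU questions.

```python
import math
from collections import Counter

# Hypothetical stored questions (Bengali) with placeholder answers; not the book's NSTU data.
QA_PAIRS = [
    ("উপাচার্য কে", "ANSWER_VC"),                         # who is the vice-chancellor
    ("বিশ্ববিদ্যালয় কোথায় অবস্থিত", "ANSWER_LOCATION"),  # where is the university located
    ("ভর্তি পরীক্ষা কবে", "ANSWER_ADMISSION"),            # when is the admission test
]


def tokenize(text):
    # Whitespace split only; the book's pipeline also removes stop words and lemmatizes.
    return text.split()


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for docs, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # one common idf variant (an assumption)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: c / len(toks) * idf[t] for t, c in tf.items()})
    return vecs, idf


def query_vector(text, idf):
    # Terms never seen in the stored questions are ignored.
    toks = tokenize(text)
    tf = Counter(t for t in toks if t in idf)
    return {t: c / len(toks) * idf[t] for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def best_answer(query, qa_pairs, method="cosine"):
    questions = [q for q, _ in qa_pairs]
    if method == "cosine":
        vecs, idf = tfidf_vectors(questions)
        qv = query_vector(query, idf)
        scores = [cosine(qv, v) for v in vecs]
    elif method == "jaccard":
        scores = [jaccard(query, q) for q in questions]
    else:
        raise ValueError(f"unknown method: {method}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return qa_pairs[best][1]


query = "ভর্তি পরীক্ষা কবে হবে"  # "when will the admission test be held"
print(best_answer(query, QA_PAIRS, "cosine"))
print(best_answer(query, QA_PAIRS, "jaccard"))
```

Cosine similarity on TF-IDF vectors weights rare, distinctive terms more heavily, whereas Jaccard treats every shared token equally and is penalized by extra words in the query. That is one plausible reason the two measures can rank candidates differently. This toy example does not reproduce the book's reported accuracy figures.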
Keywords: Bangla Chatbot, Bangla NLP, BLTK, Mathematical & Statistical Process