Towards a Novel Weakly Supervised Joint Approach of Named Entity Recognition and Normalization for Noisy Text
5 Pages. Posted: 21 May 2018. Last revised: 19 Jun 2018
Date Written: May 15, 2018
Applying Natural Language Processing (NLP) tasks to social media corpora is challenging because social media users often communicate in casual language, using out-of-vocabulary (OOV) words and internet abbreviations (slang). The performance of NLP tasks therefore needs to be improved when they are applied to social media text. We focus on a fundamental NLP task, Named Entity Recognition (NER), which assigns each entity a label (person, location, organization, etc.), applied here to Twitter text. NER is improved by converting non-standard entities to their canonical form, a task called Named Entity Normalization (NEN). In this paper, we propose a novel weakly supervised joint approach to named entity recognition and normalization for noisy text. We jointly conduct weakly supervised NER and normalization of both single-token OOV words and multi-token slang, so that any type of named entity can be recognized and restored to its canonical form. This approach can give better results than existing state-of-the-art NER systems, NEN systems, and pipeline approaches.
Keywords: Lexical Normalization, NLP, Weak Supervision, NER
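To make the idea in the abstract concrete, the following is a minimal toy sketch of joint normalization and recognition: a noisy span is mapped to its canonical form and labeled in the same pass, rather than in two pipeline stages. It is not the paper's method. The lexicon, gazetteer, labels, and the function name `normalize_and_tag` are all hypothetical and invented for illustration, and a hand-written lookup stands in for whatever weakly supervised model the authors use.

```python
# Toy illustration only: the lexicon, gazetteer, and labels below are
# invented for this sketch and are not taken from the paper.

# Maps lowercased noisy spans (single-token OOV or multi-token slang)
# to a canonical token sequence.
NORMALIZATION_LEXICON = {
    ("nyc",): ("New", "York", "City"),
    ("big", "apple"): ("New", "York", "City"),
    ("obamaa",): ("Obama",),
}

# Maps canonical entity forms to an entity label.
GAZETTEER = {
    ("New", "York", "City"): "LOCATION",
    ("Obama",): "PERSON",
}

MAX_SPAN = 3  # longest span (in tokens) considered at each position


def normalize_and_tag(tokens):
    """Return (text, label) pairs; entities are emitted in canonical form."""
    result = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first so multi-token slang wins over its parts.
        for n in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            key = tuple(t.lower() for t in span)
            # Normalize if the span is a known noisy form, else keep it as is.
            canonical = NORMALIZATION_LEXICON.get(key, span)
            label = GAZETTEER.get(canonical)
            if label is not None:
                result.append((" ".join(canonical), label))
                i += n
                matched = True
                break
        if not matched:
            result.append((tokens[i], "O"))  # "O" marks a non-entity token
            i += 1
    return result


if __name__ == "__main__":
    print(normalize_and_tag("obamaa visits the big apple".split()))
```

Because normalization and labeling happen in one lookup per span, a multi-token slang expression such as "big apple" is restored to "New York City" and tagged as a single entity, which a token-by-token pipeline would miss.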