Token Classification Task Model Comparison and Performance Improvement Strategy for Natural Language Queries
Posted: 16 Feb 2023
Date Written: September 21, 2023
Abstract
Automatically identifying the key entities in a user's query and attaching those concepts to the appropriate search fields saves query-construction time and benefits future entity-synonym suggestion. Since the release of the BERT family of models, fine-tuning pretrained language models has proven highly successful. Our solution extracts key entities from users' queries and converts them into syntax queries by leveraging token classification: the keywords in a natural language query (NLQ) are assigned to different classes and then translated into syntax queries. User queries have two characteristic features: (1) the grammar may be incorrect, and (2) characters may be incorrectly cased. To address both features, we compared candidate solutions on the CoNLL-2003 dataset: LUKE (Yamada et al., 2020), fine-tuned BERT, and BERT+CRF. The F1 score takes both precision and recall into account; the higher the precision and recall, the higher the F1 score. The fine-tuned BERT model was selected, achieving an F1 score of 0.96. With a high-performance language model in place, we solved two further problems: extracting the key-entity portion of a query and improving the accuracy of particular classes with a dictionary. Key-entity concept extraction achieves an F1 score of 0.97; part-of-speech (POS) tags for noun words were used when labeling the data. The concept-extraction technology can also be applied to phrase mining, synonym suggestion, and knowledge-graph generation. For performance improvement, a well-defined dictionary is always a good data resource. spaCy-based tagging, weak labels, and the LUKE strategy were all tested in this study. Weak labeling improved the F1 score of the organization class from 0.73 to 0.89.
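The sketch below illustrates the kind of fine-tuned BERT token-classification baseline described in the abstract, using the Hugging Face `transformers` and `datasets` libraries on CoNLL-2003. It is a minimal illustration, not the authors' released code: the checkpoint name, hyperparameters, and output path are assumptions for demonstration only.

```python
# Minimal sketch: fine-tuning BERT for token classification on CoNLL-2003.
# Checkpoint, batch size, epochs, and output_dir are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("conll2003")
label_list = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_list))

def tokenize_and_align(examples):
    # WordPiece splits words into sub-tokens; keep the NER label only on the
    # first sub-token of each word and mask the rest with -100 so the loss
    # ignores them.
    tokenized = tokenizer(examples["tokens"], truncation=True,
                          is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        labels, prev = [], None
        for wid in word_ids:
            if wid is None or wid == prev:
                labels.append(-100)  # special token or continuation sub-token
            else:
                labels.append(tags[wid])
            prev = wid
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

encoded = dataset.map(tokenize_and_align, batched=True)

args = TrainingArguments(
    output_dir="bert-conll2003-ner",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

The label-alignment step matters for the reported F1: scoring is per word, so predictions on continuation sub-tokens are masked out rather than counted twice.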
Keywords: Search Algorithms