Enhancing a Legal Retrieval System via a New Legal Sentence-Transformer Embeddings Model
Posted: 5 Feb 2024 Last revised: 27 Feb 2024
Date Written: September 21, 2023
Abstract
This project presents a state-of-the-art approach to improving the Lexis Plus Question Answering (QA) system and its embedding search, which underpin multiple features and complex applications dealing with a diverse corpus spanning multiple international jurisdictions.
In a significant shift, the proposed model may replace the current 512-dimensional Legal BERT with a 384-dimensional Legal all-MiniLM-L12-v2 Sentence-Transformer embeddings model fine-tuned on a diverse selection of legal content, including US case law, motions, Canadian case law, statutes, and UK case law. Despite the reduction in dimensionality, preliminary results indicate improved performance, demonstrating the efficiency of the Legal all-MiniLM-L12-v2 model.
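For orientation, here is a minimal sketch (not the authors' code) of what the embedding swap amounts to in practice: encoding queries and passages into 384-dimensional vectors and ranking by cosine similarity. It uses the public all-MiniLM-L12-v2 base checkpoint via the sentence-transformers library; the fine-tuned legal variant is an assumption, as the abstract does not name a released checkpoint.

```python
# A minimal sketch of embedding-based retrieval with a MiniLM Sentence-Transformer.
# The base checkpoint is public; the fine-tuned legal variant is assumed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

query = "What is the standard for summary judgment?"
passages = [
    "Summary judgment is appropriate when there is no genuine dispute of material fact.",
    "The statute of limitations for breach of contract varies by jurisdiction.",
]

# Encode into 384-dimensional vectors (vs. 512 dimensions in the prior Legal BERT setup).
query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

# Rank candidate passages by cosine similarity to the query.
scores = util.cos_sim(query_emb, passage_embs)
print(scores)
```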
Methods discussed for this evolution include a novel data augmentation approach, the random cropping technique (sketched below), and the incorporation of a contrastive loss function during training; in addition, regular monitoring of performance metrics on both the training and validation sets allowed early detection and prevention of overfitting.
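As a hedged illustration of these two techniques (not the paper's actual pipeline): random cropping can generate positive pairs by taking two random spans of the same document, with crops drawn from different documents serving as negatives, and a contrastive loss then pulls positive pairs together while pushing negative pairs apart. The crop lengths, sample documents, pairing scheme, and base checkpoint below are illustrative assumptions.

```python
# A sketch of random-cropping augmentation plus contrastive-loss fine-tuning,
# assuming the sentence-transformers training API. Parameters and data are
# illustrative, not the paper's exact setup.
import random
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

def random_crop(text: str, min_words: int = 5, max_words: int = 12) -> str:
    """Return a random contiguous span of words from the text."""
    words = text.split()
    span = min(len(words), random.randint(min_words, max_words))
    start = random.randint(0, max(0, len(words) - span))
    return " ".join(words[start:start + span])

# Stand-in legal documents (case law, motions, statutes, ...).
documents = [
    "The court granted the motion for summary judgment because the plaintiff "
    "failed to raise a genuine dispute of material fact on any element of the claim.",
    "Under the statute, an action for breach of a written contract must be "
    "commenced within six years of the date the cause of action accrued.",
    "The appellate court reviewed the trial court's evidentiary rulings for "
    "abuse of discretion and affirmed the judgment in all respects.",
]

examples = []
for i, doc in enumerate(documents):
    # Two crops of the same document form a positive pair (label 1).
    examples.append(InputExample(texts=[random_crop(doc), random_crop(doc)], label=1))
    # A crop paired with a crop from a different document is a negative (label 0).
    other = documents[(i + 1) % len(documents)]
    examples.append(InputExample(texts=[random_crop(doc), random_crop(other)], label=0))

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
loader = DataLoader(examples, shuffle=True, batch_size=8)
loss = losses.ContrastiveLoss(model)  # pulls positives together, pushes negatives apart

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```

In a real run, an evaluator over a held-out validation set (for example, one of the evaluators sentence-transformers accepts through model.fit's evaluator argument) would supply the kind of train/validation monitoring the abstract credits with early detection of overfitting.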
Keywords: Question Answering, Random Cropping, Contrastive Loss, Model Overfitting