Enhancing Legal Retrieval System via New Legal Sentence-Transformer Embeddings Model

Posted: 5 Feb 2024 Last revised: 27 Feb 2024

Date Written: September 21, 2023

Abstract

This project presents a state-of-the-art approach to improving the Lexis Plus Question Answering (QA) system and embedding search, which underpin multiple features and complex applications dealing with a diverse corpus spanning multiple international jurisdictions.

In a significant shift, the proposed model may replace the current 512-dimensional Legal BERT with a 384-dimensional Legal all-MiniLM-L12-v2 Sentence-Transformer embeddings model fine-tuned on a diverse selection of legal content, including US case law, motions, Canadian case law, statutes, and UK case law. Despite the reduction in dimensionality, preliminary results indicate improved performance, demonstrating the efficiency of the Legal all-MiniLM-L12-v2 model.
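For context, the following is a minimal sketch of how 384-dimensional embeddings from this model family are produced and used for retrieval. It assumes the public sentence-transformers package and the base all-MiniLM-L12-v2 checkpoint as a stand-in, since the fine-tuned legal variant described in the paper is not publicly released; the example passages are illustrative.

```python
# Minimal sketch (not the authors' code): encoding legal text with the public
# base all-MiniLM-L12-v2 checkpoint, standing in for the fine-tuned Legal model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

passages = [
    "The court granted the defendant's motion for summary judgment.",
    "Plaintiff appeals the denial of class certification.",
]
query = "When may a court grant summary judgment?"

# Each embedding is 384-dimensional, versus 512 in the prior Legal BERT setup.
passage_emb = model.encode(passages, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
print(passage_emb.shape)  # torch.Size([2, 384])

# Cosine similarity scores rank passages against the query for retrieval.
scores = util.cos_sim(query_emb, passage_emb)
print(scores)
```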

Methods discussed for this evolution include a novel data augmentation approach based on a random cropping technique, the incorporation of a contrastive loss function during training, and regular monitoring of performance metrics on both the training and validation sets, which enabled early detection and prevention of overfitting. A sketch of the first two ingredients follows.
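Below is a hedged sketch of how random cropping and a contrastive loss might be combined using the sentence-transformers training API. The pairing scheme (a document paired with a crop of itself as a positive, crops of different documents as negatives), the random_crop helper, and all hyperparameters are illustrative assumptions, not the authors' exact recipe.

```python
# Illustrative sketch only: random-crop augmentation plus contrastive training
# with the sentence-transformers API. Pairing scheme and hyperparameters are
# assumptions; the paper's abstract does not specify them.
import random
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

def random_crop(text: str, min_frac: float = 0.5) -> str:
    """Data augmentation: keep a random contiguous span of the tokens."""
    tokens = text.split()
    span = max(1, int(len(tokens) * random.uniform(min_frac, 1.0)))
    start = random.randint(0, len(tokens) - span)
    return " ".join(tokens[start:start + span])

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

docs = [
    "The appellate court reviews questions of law de novo.",
    "A valid contract requires offer, acceptance, and consideration.",
]

# Positive pairs (label 1): a document and a random crop of itself.
# Negative pairs (label 0): crops drawn from two different documents.
examples = []
for i, doc in enumerate(docs):
    examples.append(InputExample(texts=[doc, random_crop(doc)], label=1))
    other = docs[(i + 1) % len(docs)]
    examples.append(InputExample(texts=[random_crop(doc), random_crop(other)], label=0))

loader = DataLoader(examples, shuffle=True, batch_size=2)
loss = losses.ContrastiveLoss(model)  # pulls positives together, pushes negatives apart

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```

With this API, the paper's monitoring of training and validation metrics would plausibly correspond to passing an evaluator to model.fit, though the abstract does not specify the evaluation protocol.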

Keywords: Question Answering, Random Cropping, Contrastive Loss, Model Overfitting

Suggested Citation

Xie, Wenwen and Khazaeli, Soha and Mathai, Shyjee, Enhancing Legal Retrieval System via New Legal Sentence-Transformer Embeddings Model (September 21, 2023). Proceedings of the 7th Annual RELX Search Summit, Available at SSRN: https://ssrn.com/abstract=4716474

Wenwen Xie (Contact Author)

LexisNexis ( email )

P. O. Box 933
Dayton, OH 45401
United States

Soha Khazaeli

LexisNexis ( email )

P. O. Box 933
Dayton, OH 45401
United States

Shyjee Mathai

LexisNexis ( email )

P. O. Box 933
Dayton, OH 45401
United States
