Genesys: A Deep Learning Approach for High-recall Multi-label Legal Topical Classification
Posted: 29 Jan 2021
Date Written: November 6, 2020
Abstract
Caselaw documents on LexisNexis contain headnotes identifying key points of law discussed in cases. Currently, legal editors manually assign legal topics from the US legal taxonomy to each headnote. The assigned topics are useful for improving search results, filtering, document recommendations, and many other applications. However, manual assignment of topics is expensive and time-consuming. To address this issue, we present a novel method for automating the application of legal topics to caselaw headnotes. The system we have developed uses a deep learning-based classification approach to predict multiple legal topics for each headnote. The current distribution of legal topics in headnotes is highly imbalanced: a few topics account for most of the labels, while a large number of topics are rarely applied. To address this lack of coverage for rare topics, we have built a separate model for each topic. These models are built with GloVe embeddings and convolutional neural networks. Given the vast number of topics in the legal taxonomy, we have developed methods for sharing embeddings across the models and for compressing embeddings so that all the models can be applied efficiently at inference time. We believe that our approach offers two benefits: (a) enabling classification of any content, including documents, RFCs, and passages in other content types, and (b) equipping LexisNexis to fully leverage the power of legal topics for significantly improving search results on Lexis Advance.
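The per-topic design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table is a random stand-in for pretrained GloVe vectors, each topic model is reduced to a single untrained convolution filter with max-pooling and a sigmoid, and all names (`make_shared_embeddings`, `TopicModel`, `predict_topics`, the topic labels, the dimensions, and the threshold) are hypothetical. Embedding compression is not shown. The sketch only shows how many independent binary models can read from one shared embedding table and jointly produce a multi-label prediction.

```python
import math
import random

EMBED_DIM = 4     # hypothetical; pretrained GloVe vectors are much larger
KERNEL_WIDTH = 2  # hypothetical convolution window, in tokens


def make_shared_embeddings(vocab, dim=EMBED_DIM, seed=0):
    """Random stand-in for a pretrained GloVe table, built once for all models."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


class TopicModel:
    """One binary classifier per topic: conv filter, max-pool, sigmoid."""

    def __init__(self, topic, embeddings, seed):
        rng = random.Random(seed)
        self.topic = topic
        self.embeddings = embeddings  # reference to the shared table, not a copy
        # Untrained random weights; a real model would learn these per topic.
        self.kernel = [
            [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
            for _ in range(KERNEL_WIDTH)
        ]
        self.bias = 0.0

    def score(self, tokens):
        """Return a value in (0, 1) for how strongly the topic applies."""
        zero = [0.0] * EMBED_DIM
        vectors = [self.embeddings.get(token, zero) for token in tokens]
        while len(vectors) < KERNEL_WIDTH:  # pad very short inputs
            vectors.append(zero)
        activations = []
        for start in range(len(vectors) - KERNEL_WIDTH + 1):
            total = self.bias
            for offset in range(KERNEL_WIDTH):
                window_vec = vectors[start + offset]
                total += sum(w * x for w, x in zip(self.kernel[offset], window_vec))
            activations.append(total)
        pooled = max(activations)  # max-over-time pooling
        return 1.0 / (1.0 + math.exp(-pooled))


def predict_topics(headnote, models, threshold=0.5):
    """Multi-label prediction: keep every topic whose model clears the threshold."""
    tokens = headnote.lower().split()
    return [model.topic for model in models if model.score(tokens) >= threshold]


if __name__ == "__main__":
    vocab = ["contract", "breach", "damages", "negligence", "duty"]
    shared = make_shared_embeddings(vocab)
    models = [
        TopicModel(name, shared, seed=i)
        for i, name in enumerate(["Contracts Law", "Torts"])
    ]
    print(predict_topics("Breach of contract damages", models))
```

Because each topic has its own decision function, a lower threshold can be chosen per topic to favor recall on rarely applied topics, while the shared table keeps memory use from growing with the number of topics.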