Topic Modelling of Legal Documents via LEGAL-BERT
Topic Modelling of Legal Documents via LEGAL-BERT, 2021, São Paulo. Proceedings of the First International Workshop RELATED - Relations in the Legal Domain 2021
9 Pages
Posted: 24 Aug 2023
Date Written: June 25, 2021
Abstract
Legal text processing is challenging for modeling approaches because of the peculiarities of legal documents, such as their length and technical vocabulary. Topic modeling consists of discovering semantic structure in text, and in the legal domain it therefore requires specific approaches: the relevant topics depend strongly on the context in which the legal documents will be presented. This work describes and evaluates the use of BERTopic for topic modeling of legal documents. The authors focus on a subset of landmark cases from the US Caselaw dataset to evaluate the impact of topic modeling with domain-specific embeddings from the pre-trained LEGAL-BERT model. The research investigated different ways of generating sentence embeddings from the cases. The results demonstrate that considering references to statutory law (e.g. the US Code) when generating the text embeddings improves the quality of topic modeling.
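To make the idea of "considering the references to statutory law" concrete, the following is a minimal Python sketch of one possible preprocessing step. It is an assumption-laden illustration, not the authors' pipeline: the regular expression, the function names `extract_usc_citations` and `augment_with_citations`, and the `[STATUTES]` marker are all hypothetical, and the abstract does not say how the references were actually incorporated. The sketch finds US Code citations in a passage and appends them in normalized form, so that a downstream embedding model would see them explicitly.

```python
import re

# Hypothetical pattern for US Code citations such as "42 U.S.C. § 1983"
# or "18 U.S.C. §§ 241". Real case law uses many more citation variants.
USC_PATTERN = re.compile(r"\b(\d+)\s+U\.S\.C\.\s*(?:§+\s*)?(\d+[a-z]*(?:-\d+)?)")


def extract_usc_citations(text):
    """Return normalized citations in order of first appearance, without duplicates."""
    citations = []
    for title, section in USC_PATTERN.findall(text):
        citation = f"{title} U.S.C. {section}"
        if citation not in citations:
            citations.append(citation)
    return citations


def augment_with_citations(text):
    """Append the extracted citations to the text (assumed strategy, for illustration)."""
    citations = extract_usc_citations(text)
    if not citations:
        return text
    return text + " [STATUTES] " + " ; ".join(citations)


if __name__ == "__main__":
    sample = (
        "The plaintiff sued under 42 U.S.C. § 1983 and relied on "
        "18 U.S.C. §§ 241. The court again applied 42 U.S.C. § 1983."
    )
    print(extract_usc_citations(sample))
    print(augment_with_citations(sample))
```

The augmented text could then be passed to whatever sentence-embedding model is used before clustering; that step is omitted here because it depends on the model and library versions involved.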
Keywords: Natural Language Processing (NLP), American Case Law, Contextualized Embeddings
JEL Classification: K10, C00