LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

17 pages. Posted: 5 Oct 2021. Last revised: 15 Oct 2021.

Ilias Chalkidis

Athens University of Economics and Business; University of Copenhagen

Abhik Jana

Language Technology Group, Department of Informatics, Universität Hamburg; University of Hamburg

Dirk Hartung

Bucerius Law School - Center for Legal Technology and Data Science; Stanford University - Stanford Codex Center

Michael James Bommarito

Bommarito Consulting, LLC; Licensio, LLC; Stanford Center for Legal Informatics; Michigan State College of Law

Ion Androutsopoulos

Athens University of Economics and Business

Daniel Martin Katz

Illinois Tech - Chicago Kent College of Law; Stanford CodeX - The Center for Legal Informatics; Bucerius Center for Legal Technology & Data Science

Nikolaos Aletras

University of Sheffield

Date Written: October 13, 2021

Abstract

Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models demonstrating that the latter consistently offer performance improvements across multiple tasks.

Keywords: natural language processing, legal data, legal tech, natural language understanding, evaluation, machine learning, artificial intelligence, artificial intelligence and law

JEL Classification: C88, C80, K00, K40, M49

Suggested Citation

Chalkidis, Ilias and Jana, Abhik and Hartung, Dirk and Bommarito, Michael James and Androutsopoulos, Ion and Katz, Daniel Martin and Aletras, Nikolaos, LexGLUE: A Benchmark Dataset for Legal Language Understanding in English (October 13, 2021). Available at SSRN: https://ssrn.com/abstract=3936759 or http://dx.doi.org/10.2139/ssrn.3936759

Ilias Chalkidis

Athens University of Economics and Business

76 Patission Street
Athens, 104 34
Greece

University of Copenhagen

Nørregade 10
Copenhagen, København DK-1165
Denmark

Abhik Jana

University of Hamburg

Hamburg

Language Technology Group, Department of Informatics, Universität Hamburg

Allende-Platz 1
Hamburg, 20146
Germany

Dirk Hartung

Bucerius Law School - Center for Legal Technology and Data Science

Jungiusstr. 6
Hamburg, 20355
Germany

Stanford University - Stanford Codex Center

559 Nathan Abbott Way
Stanford, CA 94305-8610
United States

HOME PAGE: https://law.stanford.edu/directory/dirk-hartung/

Michael James Bommarito

Bommarito Consulting, LLC

MI 48098
United States

HOME PAGE: http://bommaritollc.com

Licensio, LLC

Okemos, MI 48864
United States

Stanford Center for Legal Informatics

559 Nathan Abbott Way
Stanford, CA 94305-8610
United States

Michigan State College of Law

318 Law College Building
East Lansing, MI 48824-1300
United States

Ion Androutsopoulos

Athens University of Economics and Business

76 Patission Street
Athens, 104 34
Greece

Daniel Martin Katz (Contact Author)

Illinois Tech - Chicago Kent College of Law

565 W. Adams St.
Chicago, IL 60661-3691
United States

HOME PAGE: http://www.danielmartinkatz.com/

Stanford CodeX - The Center for Legal Informatics

559 Nathan Abbott Way
Stanford, CA 94305-8610
United States

HOME PAGE: http://law.stanford.edu/directory/daniel-katz/

Bucerius Center for Legal Technology & Data Science

Jungiusstr. 6
Hamburg, 20355
Germany

HOME PAGE: http://legaltechcenter.de/

Nikolaos Aletras

University of Sheffield

17 Mappin Street
Sheffield, Sheffield S1 4DT
United Kingdom

Paper statistics

Downloads: 205
Abstract views: 584
Rank: 188,880