Princeton University - Center for Information Technology Policy; Princeton University - Princeton School of Public and International Affairs; Princeton University - Program in Law & Public Policy; Princeton University - Department of Computer Science
The advent of large language models (LLMs) and their adoption by the legal community have given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process in which we collected tasks designed and hand-crafted by legal professionals. Because these subject-matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning—which distinguish between its many forms—correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
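For readers who want to experiment with the benchmark, the sketch below shows one way to score an LLM on a single LegalBench task. It is a minimal illustration under stated assumptions, not the paper's evaluation harness: it assumes the benchmark's Hugging Face release under the `nguha/legalbench` dataset ID, a task config named `abercrombie` with `text` and `answer` columns, and a hypothetical `query_llm` stand-in for whatever model API you use.

```python
# Minimal sketch: zero-shot accuracy of an LLM on one LegalBench task.
# Assumptions (not stated in the abstract above): the benchmark is hosted
# on the Hugging Face Hub as "nguha/legalbench", the "abercrombie" config
# exists with "text" and "answer" columns, and query_llm() is a
# hypothetical placeholder for a real model call.
from datasets import load_dataset

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in: always predicts one label as a trivial
    # baseline. Swap in a call to an open-source or commercial LLM.
    return "generic"

task = load_dataset("nguha/legalbench", "abercrombie")
test = task["test"]

correct = 0
for example in test:
    # The released benchmark ships curated prompts per task; this bare
    # prompt is only for illustration.
    prompt = f"{example['text']}\nAnswer:"
    prediction = query_llm(prompt).strip().lower()
    correct += prediction == example["answer"].strip().lower()

print(f"Accuracy on {len(test)} test examples: {correct / len(test):.3f}")
```

Per-task accuracy like this is one natural summary; comparing models across all 162 tasks, or across the six reasoning types, follows the same pattern with an outer loop over task configs.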
Keywords: legal practice, law and technology, large language models, artificial intelligence, empirical legal methods, machine learning
Guha, Neel and Nyarko, Julian and Ho, Daniel E. and Ré, Christopher and Chilton, Adam and Narayana, Aditya and Chohlas-Wood, Alex and Peters, Austin and Waldon, Brandon and Rockmore, Daniel and Zambrano, Diego and Talisman, Dmitry and Hoque, Enam and Surani, Faiz and Fagan, Frank and Sarfaty, Galit and Dickinson, Gregory M. and Porat, Haggai and Hegland, Jason and Wu, Jessica and Nudell, Joe and Niklaus, Joel and Nay, John and Choi, Jonathan H. and Tobia, Kevin and Hagan, Margaret and Ma, Megan and Livermore, Michael A. and Rasumov-Rahe, Nikon and Holzenberger, Nils and Kolt, Noam and Henderson, Peter and Rehaag, Sean and Goel, Sharad and Gao, Shang and Williams, Spencer and Gandhi, Sunny and Zur, Tom and Iyer, Varun and Li, Zehua, LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models (September 26, 2023). 2023 Conference on Neural Information Processing Systems, Datasets and Benchmarks Track; Osgoode Legal Studies Research Paper No. 4583531. Available at SSRN: https://ssrn.com/abstract=4583531 or http://dx.doi.org/10.2139/ssrn.4583531