Truncating Legal Cases and its Impact on Automatic Keywording

27 Pages Posted: 22 Apr 2024

Philipp Adämmer

University of Greifswald

Svenja Lull

University of the German Federal Armed Forces - Helmut Schmidt Universität

Gunter Reiner

Helmut-Schmidt-Universität/ Universität der Bundeswehr Hamburg

Date Written: December 7, 2023

Abstract

Based on a unique dataset of annotated court decisions from one of the most important legal databases in Germany, we show that semi-automated (partially human-influenced) keywords tend to occur for the first time relatively early in the text of a decision. We further show that the same pattern holds when keywords are generated fully automatically by three unsupervised methods: Tf-Idf, TextRank and LDA. This finding led us to hypothesise that truncating court decisions could improve the accuracy of automated keyword extraction. To test this hypothesis, we conducted a small-scale user test (pilot study) comparing the quality of machine-generated keywords derived from full-length court decisions with that of keywords generated from truncated texts. For Tf-Idf and TextRank, we find that keyword quality is less affected by whether or not the documents have been truncated. However, for long documents, keywords generated via LDA from truncated texts are on average rated as qualitatively better than those generated from the untruncated texts. A keyword-by-keyword analysis shows that truncation can produce both new relevant terms and misleading ones. We also find that the semi-automated keywords are qualitatively (though not literally) similar to the keywords we generated automatically with Tf-Idf, the simplest of our three methods. The results are promising and warrant confirmation in a larger study.
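The abstract relies on two measurable ideas: where a keyword first occurs in a decision, and extracting keywords from a truncated rather than a full text. The following is a minimal, standard-library Python sketch of both, offered only as an illustration under stated assumptions. The tokenizer, the "keep the leading fraction of tokens" truncation rule, the particular Tf-Idf variant (length-normalised term frequency times log(N/df)) and the helper names tokenize, truncate, relative_first_position and tfidf_keywords are all hypothetical choices, not the authors' pipeline. TextRank and LDA are not shown, and the three toy "decisions" are invented.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; in Python 3 the \w class also matches umlauts and ß.
    return re.findall(r"\w+", text.lower())


def truncate(tokens: List[str], fraction: float) -> List[str]:
    # Hypothetical truncation rule: keep only the leading `fraction` of tokens
    # (at least one token if the document is non-empty).
    if not tokens:
        return []
    return tokens[: max(1, int(len(tokens) * fraction))]


def relative_first_position(tokens: List[str], keyword: str) -> Optional[float]:
    # First occurrence of `keyword` as a share of document length
    # (0.0 = first token); None if the keyword never occurs.
    if keyword not in tokens:
        return None
    return tokens.index(keyword) / len(tokens)


def tfidf_keywords(docs: List[List[str]], k: int = 3) -> List[List[str]]:
    # One common Tf-Idf variant (an assumption): tf = count / doc length,
    # idf = log(N / df). Terms present in every document get score 0.
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        if not doc:
            keywords.append([])
            continue
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for reproducibility.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:k])
    return keywords


# Invented toy "decisions", for illustration only.
decisions = [
    "tenancy rent reduction defect landlord tenant appeal costs procedure",
    "insurance claim damage vehicle accident appeal costs procedure",
    "contract sale defect buyer seller warranty appeal costs procedure",
]
full = [tokenize(d) for d in decisions]
short = [truncate(t, 0.5) for t in full]

print(tfidf_keywords(full)[0])   # ['landlord', 'reduction', 'rent']
print(tfidf_keywords(short)[0])  # ['reduction', 'rent', 'tenancy']
print(round(relative_first_position(full[0], "defect"), 3))  # 0.333
```

On this toy input, truncating each text to its first half changes the top-ranked terms for the first document ("landlord" drops out, "tenancy" enters). This is only a mechanical illustration that truncation can both add and remove candidate keywords, not evidence for or against the paper's findings.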

Keywords: court decisions, automatic indexing, keyword position analysis, first keyword occurrence, full text truncation

Suggested Citation

Adämmer, Philipp and Lull, Svenja and Reiner, Gunter, Truncating Legal Cases and its Impact on Automatic Keywording (December 7, 2023). Available at SSRN: https://ssrn.com/abstract=4787198 or http://dx.doi.org/10.2139/ssrn.4787198

Philipp Adämmer

University of Greifswald

Friedrich-Loeffler-Strasse 70
D-17487 Greifswald, 17489
Germany

Svenja Lull

University of the German Federal Armed Forces - Helmut Schmidt Universität

Holstenhofweg 85
Hamburg, 22008
Germany

Gunter Reiner (Contact Author)

Helmut-Schmidt-Universität/ Universität der Bundeswehr Hamburg

Holstenhofweg 85
Hamburg, 22043
Germany
