Beyond the Experiment: The Extendable Legal Link Extractor

Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts, held in conjunction with the 2015 International Conference on Artificial Intelligence and Law (ICAIL), June 8-12, 2015, San Diego, CA, USA.

9 pages. Posted: 13 July 2015


Marc van Opijnen

Publications Office of the Netherlands

Nico Verwer

Rakensi

Jan Meijer

Importalis

Date Written: June 12, 2015

Abstract

In this paper we describe a software framework for detecting and resolving references to (national and EU) legislation, case law, parliamentary documents and official gazettes. Because the framework is meant to function in a large-scale production environment, performance, flexibility and maintainability are essential requirements. These requirements led us to some noteworthy choices: within the pipeline architecture of Apache Cocoon we use the trie data structure for named entity recognition and a parsing expression grammar for pattern recognition, the latter offering significant advantages over regular expressions. We also discuss several substantive maintainability issues.
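The abstract names a trie as the data structure behind named entity recognition. The following is a minimal Python sketch of that general idea, not the authors' implementation (which runs inside the Java-based Apache Cocoon pipeline). It assumes naive whitespace tokenization, case-folding and a greedy longest-match scan; the class names `TrieNode` and `EntityTrie` and the identifiers "SR" and "SV" are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node per token; `entity` is set when a known name ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entity: Optional[str] = None


class EntityTrie:
    """Hypothetical token-level trie for dictionary-based entity recognition."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, name: str, entity_id: str) -> None:
        # Assumption: names are split on whitespace and lower-cased.
        node = self.root
        for token in name.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.entity = entity_id

    def find_all(self, text: str) -> List[Tuple[int, int, str]]:
        """Return (start, end, entity_id) token spans, longest match first."""
        tokens = text.lower().split()
        matches: List[Tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            node = self.root
            best: Optional[Tuple[int, int, str]] = None
            j = i
            # Walk down the trie as far as the text allows, remembering
            # the longest complete name seen along the way.
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.entity is not None:
                    best = (i, j, node.entity)
            if best is not None:
                matches.append(best)
                i = best[1]  # continue after the matched span
            else:
                i += 1
        return matches


if __name__ == "__main__":
    trie = EntityTrie()
    trie.add("Wetboek van Strafrecht", "SR")
    trie.add("Wetboek van Strafvordering", "SV")
    print(trie.find_all("zie artikel 5 van het Wetboek van Strafrecht"))
    # prints [(5, 8, 'SR')]
```

Because each scan from a given position walks at most as many trie nodes as the longest known name has tokens, the matching cost in this sketch grows with the length of the text rather than with the number of names in the dictionary, which is one plausible reason a trie suits a large-scale production setting.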

Keywords: Legal semantic web, Natural Language Processing, Parsing expression grammar, Pipeline processing

Suggested Citation

van Opijnen, Marc and Verwer, Nico and Meijer, Jan, Beyond the Experiment: The Extendable Legal Link Extractor (June 12, 2015). Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts, held in conjunction with the 2015 International Conference on Artificial Intelligence and Law (ICAIL), June 8-12, 2015, San Diego, CA, USA. Available at SSRN: https://ssrn.com/abstract=2626521

Marc van Opijnen (Contact Author)

Publications Office of the Netherlands

Wilhelmina van Pruisenweg 52
The Hague, 2595 AN
Netherlands

Nico Verwer

Rakensi

Netherlands

Jan Meijer

Importalis

Netherlands

