Beyond the Experiment: The Extendable Legal Link Extractor
Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts, held in conjunction with the 2015 International Conference on Artificial Intelligence and Law (ICAIL), June 8-12, 2015, San Diego, CA, USA.
9 pages. Posted: 13 Jul 2015
Date Written: June 12, 2015
Abstract
In this paper we describe a software framework for detecting and resolving references to (national and EU) legislation, case law, parliamentary documents and official gazettes. Because the framework is meant to function in a large-scale production environment, performance, flexibility and maintainability are essential requirements. These requirements led us to some noteworthy choices: within the pipeline architecture of Apache Cocoon, we use a trie data structure for named entity recognition and a parsing expression grammar (PEG) for pattern recognition; the latter has significant advantages over regular expressions. We also pay attention to several substantive maintainability issues.
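A trie lets a dictionary of known names (for example, titles of acts or treaties) be matched against running text token by token, so each lookup costs time proportional to the number of tokens examined rather than the number of names in the dictionary. The sketch below is a minimal Python illustration under our own assumptions: tokenization on word characters, case-insensitive matching, greedy longest match, and placeholder identifiers such as `TEU` and `TFEU`. It is not the authors' Cocoon-based implementation, and the class and method names (`NameTrie`, `add`, `find_all`) are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: runs of word characters, punctuation dropped."""
    return re.findall(r"\w+", text)


class TrieNode:
    """One node of a token-level trie; children are keyed by lowercased token."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.value: Optional[str] = None  # identifier if a full name ends here


class NameTrie:
    """Hypothetical dictionary of multi-word names, matched token by token."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, name: str, identifier: str) -> None:
        # Walk (and create as needed) one node per token of the name.
        node = self.root
        for token in tokenize(name):
            node = node.children.setdefault(token.lower(), TrieNode())
        node.value = identifier

    def longest_match(self, tokens: List[str], start: int) -> Tuple[int, Optional[str]]:
        """Return (end index, identifier) of the longest name starting at `start`."""
        node = self.root
        best_end, best_value = start, None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i].lower())
            if node is None:
                break  # no stored name continues with this token
            if node.value is not None:
                best_end, best_value = i + 1, node.value  # remember longest so far
        return best_end, best_value

    def find_all(self, text: str) -> List[Tuple[str, str]]:
        """Scan left to right, emitting (matched tokens, identifier) pairs."""
        tokens = tokenize(text)
        results: List[Tuple[str, str]] = []
        i = 0
        while i < len(tokens):
            end, value = self.longest_match(tokens, i)
            if value is not None:
                results.append((" ".join(tokens[i:end]), value))
                i = end  # consume the matched span
            else:
                i += 1
        return results


trie = NameTrie()
trie.add("Treaty on European Union", "TEU")
trie.add("Treaty on the Functioning of the European Union", "TFEU")
trie.add("European Union", "EU")

matches = trie.find_all(
    "Article 6 of the Treaty on European Union and the "
    "Treaty on the Functioning of the European Union."
)
print(matches)
```

Because the scan consumes the longest match, the "European Union" embedded in each treaty title is not reported as a separate entity; whether nested names should also be reported is a design choice this sketch does not address.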
Keywords: Legal semantic web, Natural Language Processing, Parsing expression grammar, Pipeline processing