16 Pages Posted: 8 Jul 2013
Date Written: July 7, 2013
The present paper introduces a technique to deal with coporate names heterogeneities in the context of public procurement metadata. Public bodies are currently facing a big challenge trying to improve both the performance and the transparency of administrative processes. The e-Government and Open Linked Data initiatives have emerged as efforts to tackle existing interoperability and integration issues among ICT-based systems but the creation of a real transparent environment requires much more than the simple publication of data and information in specific open formats; data and information quality is the next major step in the pubic sector. More specifically in the e-Procurement domain there is a vast amount of valuable metadata that is already available via the Internet protocols and formats and can be used for the creation of new added-value services. Nevertheless the simple extraction of statistics or creation of reports can imply extra tasks with regards to clean, prepare and reconcile data. On the other hand, transparency has become a major objective in public administractions and, in the case of public procurement, one of the most interesting services lies in tracking rewarded contracts (mainly type, location, and supplier). Although it seems a basic kind of reporting service the truth is that its generation can turn into a complex task due to a lack of standardization in supplier names or the use of different descriptors for the type of contract. In this paper, a stepwise method based on natural language processing and semantics to address the unfication of corporate names is defined and implemented. Moreover a research study to evaluate the precision and recall of the proposed technique, using as use case the public dataset of rewarded public contracts in Australia during the period 2004-2012, is also presented. Finally some discussion, conclusions and future work are also outlined.
Keywords: e-Procurement, business registries, Open Government Data, Australian company data, NLP
JEL Classification: L86, H50, H72, H83, M14
Suggested Citation: Suggested Citation
Rodríguez, Jose María Álvarez and Ordoñez de Pablos, Patricia and Vafopoulos, Michalis N. and Labra, José Emilio, Towards a Stepwise Method for Unifying and Reconciling Corporate Names in Public Contracts Metadata: The CORFU Technique (July 7, 2013). Available at SSRN: https://ssrn.com/abstract=2290824 or http://dx.doi.org/10.2139/ssrn.2290824