SiSOB Data Extraction and Codification: A Tool to Analyse Scientific Careers

Forthcoming in Research Policy special issue on Big Data

SPRU Working Paper Series SWPS 2014-03 (February)

47 pages. Posted: 9 Feb 2015


Aldo Geuna

University of Torino - Department Cultures, Politics and Society

Rodrigo Kataishi

University of Turin - Department of Economics S. Cognetti de Martiis

Manuel Toselli

University of Turin - Department of Economics S. Cognetti de Martiis

Eduardo Guzman

University of Malaga

Cornelia Lawson

University of Manchester - Alliance Manchester Business School

Ana Fernandez-Zubieta

Instituto de Estudios Superiores de Administración (IESA)

Beatriz Barros

University of Malaga

Date Written: January 15, 2015

Abstract

This paper describes the methodology and software tool used to build a database on the careers and productivity of academics from public information available on the Internet, and provides a first analysis of the data collected for a sample of 360 US scientists funded by the National Institutes of Health (NIH) and 291 UK scientists funded by the Biotechnology and Biological Sciences Research Council (BBSRC). The tool’s structured outputs can be used either for econometric research or for data representation in policy analysis. The methodology and software tool are validated on a sample of US and UK biomedical scientists, but can be applied to any country where scientists’ CVs are available in English. We provide an overview of the motivations for constructing the database, and of the data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database. We describe the database and the effectiveness of our algorithms, and provide suggestions for further improvements. The software is released under the GNU General Public License, a free software licence, so that it is available to the community of social scientists and economists interested in analysing scientific production and scientific careers, who we hope will develop the tool further.

Keywords: Information retrieval, Extraction and data integration, Academic careers, Research productivity, Mobility of Research Scientists

JEL Classification: C81; C88; I23; O31

Suggested Citation

Geuna, Aldo and Kataishi, Rodrigo and Toselli, Manuel and Guzman, Eduardo and Lawson, Cornelia and Fernandez-Zubieta, Ana and Barros, Beatriz, SiSOB Data Extraction and Codification: A Tool to Analyse Scientific Careers (January 15, 2015). Forthcoming in Research Policy special issue on Big Data, SPRU Working Paper Series SWPS 2014-03 (February), Available at SSRN: https://ssrn.com/abstract=2561876 or http://dx.doi.org/10.2139/ssrn.2561876

Aldo Geuna (Contact Author)

University of Torino - Department Cultures, Politics and Society

Lungo Dora Siena 100 A
Torino, 10153
Italy

Rodrigo Kataishi

University of Turin - Department of Economics S. Cognetti de Martiis

Via Po' 53
Torino, 10124
Italy

Manuel Toselli

University of Turin - Department of Economics S. Cognetti de Martiis

Via Po' 53
Torino, 10124
Italy

Eduardo Guzman

University of Malaga

Malaga, Málaga 29004
Spain

Cornelia Lawson

University of Manchester - Alliance Manchester Business School

Booth Street West
Manchester, M15 6PB
United Kingdom

Ana Fernandez-Zubieta

Instituto de Estudios Superiores de Administración (IESA)

C/ Campo Santo de los Mártires, 7
Córdoba, Córdoba 14004
Spain

Beatriz Barros

University of Malaga

Malaga, Málaga 29004
Spain
