Finding Doppelgängers in Scopus: How to Build Scientists Control Groups Using Sosia
21 Pages. Posted: 8 Dec 2020. Last revised: 27 Nov 2024
Date Written: December 3, 2020
Abstract
Constructing control groups of scientists is often a daunting task. This paper presents sosia, an open-source Python-based software designed to query the Scopus database efficiently via its RESTful API. sosia searches for researchers whose publication profiles are similar to that of a given researcher up to a given year, based on all the main standard bibliometric indicators. The user can flexibly choose a set of parameters to restrict the search to narrower or wider boundaries upfront, and can obtain additional similarity indicators to select a subset of authors after the search. Advanced settings also allow the user to restrict the search to a list of affiliations and to minimize possible errors arising from ambiguous author profiles. A basic search can be set up in a few lines of code, and the average computation time ranges between 60 and 300 minutes. We discuss the functioning, characteristics, limitations and possible extensions of the software.
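The matching idea described in the abstract can be illustrated with a minimal, self-contained sketch. It does not call sosia or the Scopus API. The `Profile` fields, the `find_matches` function, the margin parameters and the sample data are all hypothetical, chosen only to show how candidates might be filtered by similarity to a target researcher on a few bibliometric indicators measured up to a given year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical publication profile of an author up to a given year."""
    author_id: str
    first_year: int      # year of first publication
    n_pubs: int          # number of publications
    n_cites: int         # number of citations received
    n_coauthors: int     # number of distinct coauthors


def within(value, target, rel_margin):
    """Return True if value deviates from target by at most rel_margin * target."""
    return abs(value - target) <= rel_margin * target


def find_matches(target, candidates, year_margin=1, rel_margin=0.2):
    """Return IDs of candidates similar to the target on all indicators.

    year_margin is an absolute tolerance on the first publication year;
    rel_margin is a relative tolerance on the count indicators.
    Both are illustrative assumptions, not sosia's actual parameters.
    """
    count_fields = ("n_pubs", "n_cites", "n_coauthors")
    matches = []
    for cand in candidates:
        if cand.author_id == target.author_id:
            continue  # the target is never its own control
        if abs(cand.first_year - target.first_year) > year_margin:
            continue
        if all(within(getattr(cand, f), getattr(target, f), rel_margin)
               for f in count_fields):
            matches.append(cand.author_id)
    return matches


# Made-up data for illustration only.
target = Profile("T", 2005, 20, 300, 15)
candidates = [
    Profile("A", 2006, 22, 330, 14),  # close on every indicator
    Profile("B", 2001, 20, 300, 15),  # started publishing too early
    Profile("C", 2005, 40, 300, 15),  # too many publications
]
matches = find_matches(target, candidates)
print(matches)
```

Widening `year_margin` or `rel_margin` in this sketch enlarges the control group at the cost of less similar matches, which mirrors the trade-off between narrower and wider search boundaries mentioned in the abstract.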
Keywords: Statistical Software, Control Group, Diff-in-Diff, Scopus
JEL Classification: C00, A14