How Much Does it Cost? Optimization of Costs in Sequence Analysis of Social Science Data
Sociological Methods and Research, Forthcoming
Posted: 23 Oct 2008
Last revised: 26 Oct 2008
Date Written: October 12, 2007
Abstract
A major methodological problem in the analysis of sequence data is determining the costs from which distances between sequences are derived. Although this problem is not yet handled optimally in the social sciences, it resembles problems that bioinformatics has been solving for three decades. In this article, we propose an optimization of substitution and deletion/insertion costs based on computational methods. We provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly favor one cost scheme over another. Using three distinct datasets, we tested the distances and cluster solutions produced by the new cost scheme against solutions based on cost schemes associated with other research strategies. We found that the proposed method performs well compared with other cost-setting strategies while alleviating the problem of justifying cost schemes.
Keywords: sequence analysis, optimal matching, trajectories, empirical cost optimization