Troublesome Dependency Modeling: Causality, Inference, Statistical Learning
84 Pages. Posted: 12 Jun 2017. Last revised: 29 Jul 2017.
Date Written: June 9, 2017
The modeling of dependencies lies at the heart of statistics and data science, yet some profound questions remain: how statistical inference in probabilistic terms is linked with causality; what modern causality models offer that differs substantially from traditional dependency models such as regression; whether particular causality theories deliver on these promises; how all these models relate to statistical and machine learning techniques; and others. More generally: if a causal picture of the world is the commonly accepted goal of any science, can non-causal statistical models be of any use? If so, in what sense? If not, why are they so widely used? The insufficient level of detail in discussions of these and similar problems creates considerable confusion, especially now, when lauded terms such as Data Mining, Big Data and Deep Learning appear even in the non-professional media. This article examines the underlying logic of different approaches related, directly or indirectly, to causality. It shows that even established methods are vulnerable to small deviations from the ideal setting; that the leading approaches to statistical causality (SEM, DAG and potential outcomes theories) do not provide a coherent causality theory; and it argues that such a theory is impossible on purely statistical grounds. It also discusses a new approach in which the concept of causality is replaced by the concept of dependent variable generation, and proposes separating the variables that generate the outcome from the other variables.
Keywords: dependency modeling; statistical inference; causality modeling; counterfactual statements; statistical learning; intrinsic probability; generative dependency approach