Predicting Language Outcome after Stroke Using Machine Learning: In Search of the Big Data Benefit

29 Pages. Posted: 21 Apr 2025. Publication Status: Under Review.

Margarita Saranti

University of Birmingham

Douglas Neville

University College London

Adam White

University of London

Pia Rotshtein

University of Haifa

Thomas M.H. Hope

University College London

Cathy J. Price

University College London

Howard Bowman

University of Birmingham

Abstract

Accurate prediction of post-stroke language outcomes using machine learning could enhance clinical treatment and rehabilitation for patients with aphasia. This study of 758 English-speaking stroke patients from the PLORAS project explores how sample size affects the performance of a logistic regression model and a deep learning model (ResNet-18) in predicting language outcomes from neuroimaging and impairment-relevant tabular data. Using a learning curve approach, we assessed both models on two key language tasks from the Comprehensive Aphasia Test: Spoken Picture Description and Naming. Contrary to expectations, the simpler logistic regression model performed comparably to or better than the deep learning model (with overlapping confidence intervals), and both models showed an accuracy plateau around 80% for sample sizes larger than 300 patients. Principal Component Analysis revealed that the dimensionality of the neuroimaging data could be reduced to as few as 20 (or even 2) dominant components without significant loss in accuracy, suggesting that classification may be driven by simple patterns such as lesion size. The study highlights both the limits that the current dataset size may place on further accuracy gains and the need for larger datasets to capture more complex patterns, since some of our results indicate that we might not have reached an absolute ceiling on classification performance. Overall, these findings provide insights into the practical use of machine learning for predicting aphasia outcomes and into the potential benefits of much larger datasets for model performance.
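
As an illustration of the learning curve approach named in the abstract, the following is a minimal, dependency-free sketch, not the authors' pipeline. It trains a one-feature logistic regression on increasing numbers of patients and reports accuracy on a fixed held-out set. Everything in it is an assumption made for illustration: the data are synthetic, the single "lesion size" feature and its effect size are invented, and the function names (make_synthetic_patients, fit_logistic, learning_curve) are hypothetical. The study itself used neuroimaging and tabular data, and its resampling scheme is not described on this page.

```python
import math
import random


def make_synthetic_patients(n, seed=0):
    """Hypothetical data: one standardized 'lesion size' feature per patient
    and a noisy binary outcome (1 = impaired). Not real PLORAS data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        lesion_size = rng.gauss(0.0, 1.0)
        # Assumed relationship: larger lesions make impairment more likely.
        p_impaired = 1.0 / (1.0 + math.exp(-2.0 * lesion_size))
        data.append((lesion_size, 1 if rng.random() < p_impaired else 0))
    return data


def fit_logistic(train, epochs=200, lr=0.1):
    """Fit a weight and bias by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in train:
            err = 1.0 / (1.0 + math.exp(-(w * x + b))) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(model, test):
    """Fraction of test cases whose predicted class matches the label."""
    w, b = model
    hits = sum(1 for x, y in test if (w * x + b > 0) == (y == 1))
    return hits / len(test)


def learning_curve(data, train_sizes, n_test=158):
    """Held-out accuracy for each training-set size, on a fixed test split."""
    test, pool = data[:n_test], data[n_test:]
    return [(n, accuracy(fit_logistic(pool[:n]), test)) for n in train_sizes]


patients = make_synthetic_patients(758)
curve = learning_curve(patients, [25, 50, 100, 200, 300, 600])
for n, acc in curve:
    print(f"n_train={n:4d}  accuracy={acc:.3f}")
```

A plateau in such a curve means that adding training cases no longer improves held-out accuracy for that model and feature set. Because the synthetic outcome here is deliberately noisy, no classifier can reach perfect accuracy on it, which is one way a ceiling can arise independently of sample size.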

Note:
Funding Information: Data acquisition was funded by the Wellcome [203147/Z/16/Z; 205103/Z/16/Z; 224562/Z/21/Z], the Medical Research Council [MR/M023672/1] and the Stroke Association [TSA 2014/02]. Margarita Saranti is supported by a Stroke Association Doctoral Fellowship (SA PGF22\100013).

Conflict of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

Keywords: Machine-learning, Aphasia, Big-data, MRI scans, Dimensionality reduction, Learning curves

Suggested Citation

Saranti, Margarita and Neville, Douglas and White, Adam and Rotshtein, Pia and Hope, Thomas M.H. and Price, Cathy J. and Bowman, Howard, Predicting Language Outcome after Stroke Using Machine Learning: In Search of the Big Data Benefit. Available at SSRN: https://ssrn.com/abstract=5217627 or http://dx.doi.org/10.2139/ssrn.5217627

Margarita Saranti (Contact Author)

University of Birmingham

Edgbaston, B15 2TT
United Kingdom

Douglas Neville

University College London

Gower Street
London, WC1E 6BT
United Kingdom

Adam White

University of London

Pia Rotshtein

University of Haifa

Mount Carmel
Haifa, 31905
Israel

Thomas M.H. Hope

University College London

Gower Street
London, WC1E 6BT
United Kingdom

Cathy J. Price

University College London

Gower Street
London, WC1E 6BT
United Kingdom

Howard Bowman

University of Birmingham

Edgbaston, B15 2TT
United Kingdom