Predicting Language Outcome after Stroke Using Machine Learning: In Search of the Big Data Benefit
29 Pages
Posted: 21 Apr 2025
Publication Status: Under Review
Abstract
Accurate prediction of post-stroke language outcomes using machine learning offers the potential to enhance clinical treatment and rehabilitation for aphasic patients. This study of 758 English-speaking stroke patients from the PLORAS project explores the impact of sample size on the performance of a logistic regression model and a deep learning (ResNet-18) model in predicting language outcomes from neuroimaging and impairment-relevant tabular data. Using a learning curve approach, we assessed both models on two key language tasks from the Comprehensive Aphasia Test: Spoken Picture Description and Naming. Contrary to expectations, the simpler logistic regression model performed comparably to or better than the deep learning model (with overlapping confidence intervals), and both models showed an accuracy plateau around 80% for sample sizes larger than 300 patients. Principal Component Analysis revealed that the dimensionality of the neuroimaging data could be reduced to as few as 20 (or even 2) dominant components without significant loss in accuracy, suggesting that classification may be driven by simple patterns such as lesion size. The study highlights both the limits that the current dataset size may place on further accuracy gains and the need for larger datasets to capture more complex patterns, as some of our results indicate that we might not have reached an absolute ceiling on classification performance. Overall, these findings provide insights into the practical use of machine learning for predicting aphasia outcomes and the potential benefits of much larger datasets in enhancing model performance.
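The learning curve approach mentioned above can be illustrated with a minimal sketch: a classifier is trained on progressively larger subsets, and its accuracy on held-out cases is recorded at each size. The code below is not the study's pipeline. It assumes purely synthetic data with a single hypothetical "lesion size" feature plus a noise feature, uses a plain gradient-descent logistic regression, and omits the PCA and ResNet-18 steps entirely; all function names and parameter values are illustrative.

```python
import math
import random


def make_synthetic(n, rng):
    """Hypothetical data: a standardised 'lesion size' feature plus pure noise."""
    data = []
    for _ in range(n):
        lesion = rng.gauss(0.0, 1.0)  # assumed stand-in for lesion size
        noise = rng.gauss(0.0, 1.0)   # uninformative feature
        # Assumption: larger lesions make a poor outcome (label 1) more likely.
        p_poor = 1.0 / (1.0 + math.exp(-2.0 * lesion))
        label = 1 if rng.random() < p_poor else 0
        data.append(([lesion, noise], label))
    return data


def predict_proba(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train, epochs=100, lr=0.5):
    """Batch gradient descent on the mean logistic loss."""
    n_features = len(train[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in train:
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(train) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(train)
    return weights, bias


def accuracy(model, test):
    weights, bias = model
    hits = sum(
        (predict_proba(weights, bias, x) >= 0.5) == (y == 1) for x, y in test
    )
    return hits / len(test)


def learning_curve(data, sizes, n_test=150, repeats=5, seed=0):
    """Mean held-out accuracy for each training-set size, over random splits."""
    rng = random.Random(seed)
    curve = []
    for size in sizes:
        scores = []
        for _ in range(repeats):
            shuffled = data[:]
            rng.shuffle(shuffled)
            test, pool = shuffled[:n_test], shuffled[n_test:]
            scores.append(accuracy(fit_logistic(pool[:size]), test))
        curve.append((size, sum(scores) / len(scores)))
    return curve


data = make_synthetic(758, random.Random(42))
curve = learning_curve(data, sizes=[25, 50, 100, 200, 400, 600])
for size, acc in curve:
    print(f"n_train={size:4d}  mean held-out accuracy={acc:.3f}")
```

When labels depend on one dominant feature with irreducible noise, as assumed here, a curve of this kind would be expected to flatten once the training set is large enough to estimate that single effect. That is the sort of plateau a learning curve analysis is designed to reveal, although the synthetic numbers carry no information about the PLORAS data.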
Note:
Funding Information: Data acquisition was funded by Wellcome [203147/Z/16/Z; 205103/Z/16/Z; 224562/Z/21/Z], the Medical Research Council [MR/M023672/1] and the Stroke Association [TSA 2014/02]. Margarita Saranti is supported by a Stroke Association Doctoral Fellowship (SA PGF22\100013).
Conflict of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.
Keywords: Machine-learning, Aphasia, Big-data, MRI scans, Dimensionality reduction, Learning curves