Improved Performance of Nanotoxicity Prediction Models Using Automated Machine Learning
22 Pages. Posted: 17 Jan 2022
Abstract
Computational modeling, particularly with machine learning, has attracted significant interest as a non-animal approach to nanotoxicity testing. Machine learning algorithms use mathematical functions to relate an endpoint to a set of descriptors. However, tuning all of an algorithm's parameters requires time, expertise, and an intensive search to produce optimized predictive models. Current approaches to optimizing machine learning algorithms also still require substantial computing power (e.g., graphics processing units and multi-core central processing units). Automated machine learning (autoML) approaches and publicly available platforms (e.g., Google Vertex AI, Microsoft Azure, and Dataiku) have benefited users with little machine learning knowledge by automating data preprocessing, algorithm selection, and hyperparameter selection, and by producing models from various combinations of these steps. In this study, we used autoML to develop predictive models for the cellular toxicity of metal and oxide nanoparticles and benchmarked the autoML models against machine learning (ML) models. Our results demonstrated that autoML produced higher-performance models than the ML approach. Models from three autoML platforms provided satisfactory performance, and no single platform outperformed the others. Models built from datasets with higher data quality (measured using physicochemical scores) performed better. Dataset size also affected the performance of the autoML models, but those effects were attributable to the relationship between data quality and model performance.
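To illustrate why manual hyperparameter tuning is described above as an intensive search, the following minimal sketch runs an exhaustive grid search with cross-validation. It is not the method used in this study: the k-nearest-neighbour classifier, the synthetic two-descriptor data, the grid values, and all function names (`make_toy_data`, `knn_predict`, `cv_accuracy`, `grid_search`) are assumptions chosen for brevity and use only the Python standard library.

```python
import itertools
import random


def make_toy_data(n=60, seed=0):
    """Synthetic stand-in data (hypothetical): two descriptors, binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(0, 1), rng.uniform(0, 1)
        label = 1 if x1 + x2 + rng.gauss(0, 0.1) > 1.0 else 0
        rows.append(((x1, x2), label))
    return rows


def knn_predict(train, x, k, p):
    """Majority vote among the k nearest training points (Minkowski order p)."""
    def dist(a, b):
        return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)

    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cv_accuracy(data, k, p, folds=5):
    """Mean accuracy over contiguous cross-validation folds."""
    size = len(data) // folds
    scores = []
    for f in range(folds):
        test = data[f * size:(f + 1) * size]
        train = data[:f * size] + data[(f + 1) * size:]
        hits = sum(knn_predict(train, x, k, p) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / folds


def grid_search(data, grid, folds=5):
    """Score every hyperparameter combination and return the best one."""
    results = []
    for k, p in itertools.product(grid["k"], grid["p"]):
        results.append(((k, p), cv_accuracy(data, k, p, folds)))
    best = max(results, key=lambda r: r[1])
    return best, results


FOLDS = 5
data = make_toy_data()
grid = {"k": [1, 3, 5, 7], "p": [1, 2]}  # illustrative values only
best, results = grid_search(data, grid, folds=FOLDS)

# Each combination is evaluated once per fold, so cost grows multiplicatively.
n_fits = len(results) * FOLDS
print("combinations tried:", len(results))
print("model evaluations:", n_fits)
print("best (k, p) and mean accuracy:", best)
```

Even this small grid of two hyperparameters requires a model evaluation for every combination and every fold. Adding further hyperparameters, preprocessing options, or candidate algorithms multiplies that count, which is the search burden that autoML platforms aim to automate.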
Keywords: nanotoxicity modeling, automated machine learning, oxide, metal, data quality