Transformer-Based Model for Multispecies Acute Toxicity Prediction and Environmental Risk Classification
27 Pages Posted: 17 Feb 2025
Abstract
Traditional environmental hazard assessments rely on empirical toxicity data from multiple species. However, determining compound toxicity experimentally is time-consuming and costly, resulting in significant data gaps for key species classifications. Although machine learning (ML) methods provide computational alternatives, they primarily focus on single-species predictions, which limits their applicability to multispecies environmental risk profiles. To address this, we present the Acute Toxicity Prediction and Risk Assessment (ATPRA) framework, which integrates three tasks: multispecies acute toxicity regression, environmental risk classification, and interspecies toxicity extrapolation. The regression model uses a Transformer architecture. Evaluation showed high predictive accuracy across avian, fish, insect, and mammalian species (R²train = 0.68–0.91, R²test = 0.59–0.91). The acute toxicity classification model, aligned with the Globally Harmonized System (GHS), achieved a test set accuracy of 0.92. Additionally, the model was validated on a real-world external validation set, achieving an accuracy of 71.4% and demonstrating its generalization ability. Interspecies extrapolation models between Bobwhite Quail and Mallard Duck demonstrated cross-prediction capabilities (R²test = 0.64 for quail-to-duck prediction; R²test = 0.64 for duck-to-quail prediction), providing a cost-effective strategy to address data scarcity. The ATPRA framework is available as a web server (http://www.envwind.site/tools.html) to support practical environmental risk assessments.
Keywords: Acute toxicity, Environmental risk assessment, Interspecies relationship, Machine learning, Transformer
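For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how a coefficient of determination (R²) and a classification accuracy are conventionally computed. It is not the authors' evaluation code: the function names `r_squared` and `accuracy` and all sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true: Sequence[str], labels_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("need two equal-length, non-empty sequences")
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


if __name__ == "__main__":
    # Hypothetical toxicity values and predictions (made-up numbers).
    observed = [2.1, 3.4, 1.8, 2.9]
    predicted = [2.0, 3.1, 2.0, 3.0]
    print(f"R^2 = {r_squared(observed, predicted):.3f}")

    # Hypothetical class labels (made-up, not GHS assignments from the paper).
    true_cls = ["cat1", "cat3", "cat2", "cat3"]
    pred_cls = ["cat1", "cat3", "cat3", "cat3"]
    print(f"accuracy = {accuracy(true_cls, pred_cls):.2f}")
```

An R² of 1 means predictions match observations exactly, while 0 means the model does no better than always predicting the observed mean; this is the scale on which the reported train and test ranges should be read.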