Lee Si Hoon, Choi Eunwoo, Park JunHo, Yoon Seohwi, Song Myung-Ha, Lee Ji Young, Seo Jungkwan, Shin Sun Kyung, Lee Sang Hee, Oh Han Bin
Department of Chemistry, School of Natural Sciences, Sogang University, Seoul, 04107, Republic of Korea.
Environmental Risk Research Division, Environmental Health Research Department, National Institute of Environmental Research, Incheon, 22689, Republic of Korea.
Sci Rep. 2025 May 25;15(1):18186. doi: 10.1038/s41598-025-02590-y.
Because chemical compounds have diverse molecular structures and act through intricate biological pathways of toxicity, predicting their reproductive and developmental toxicity remains a challenge. Traditional Quantitative Structure-Activity Relationship (QSAR) models that rely on molecular descriptors struggle to capture the complexity of reproductive and developmental toxicity, which limits their predictive performance. In this study, we developed a descriptor-free deep learning model to predict reproductive and developmental toxicity: a Graph Convolutional Network (GCN) with multi-head attention and gated skip-connections. By integrating structural alerts directly related to toxicity into the model, we enabled it to learn toxicologically relevant substructures more effectively. We built a dataset of 4,514 diverse compounds, including both organic and inorganic substances. The model was trained and validated using stratified 5-fold cross-validation and achieved an accuracy of 81.19% on the test set. To address the interpretability of the deep learning model, we identified subgraphs corresponding to known structural alerts, providing insight into the model's decision-making process. This study was conducted in accordance with the OECD principles for reliable QSAR modeling and contributes to the development of robust in silico models for toxicity prediction.
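The abstract does not give the layer equations, so the following is only a minimal NumPy sketch of what one graph-convolution step with multi-head attention and a gated skip-connection can look like on a molecular graph. Several choices are assumptions, not the authors' method: neighbour attention is scaled dot-product attention restricted to bonded atoms (plus self-loops), heads are averaged, and the gate takes the common form z * H + (1 - z) * H_conv with z = sigmoid(H U + H_conv V + b). All function and parameter names (`attention_gcn_layer`, `W_heads`, `U`, `V`, `b`) are hypothetical, and the structural-alert integration is not modelled here.

```python
import numpy as np


def softmax_masked(scores, mask):
    """Row-wise softmax restricted to positions where mask is True."""
    scores = np.where(mask, scores, -np.inf)          # non-neighbours get -inf
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)                          # exp(-inf) == 0
    return weights / weights.sum(axis=1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_gcn_layer(H, A, W_heads, U, V, b):
    """One graph-convolution step with multi-head attention and a gated skip-connection.

    Hypothetical sketch (assumed formulation, not the paper's exact layer).
    H: (n_atoms, d) atom features; A: (n_atoms, n_atoms) 0/1 bond adjacency.
    W_heads: list of (d, d) per-head projections; U, V: (d, d) gate weights; b: (d,) gate bias.
    """
    n, d = H.shape
    mask = (A + np.eye(n)) > 0           # each atom attends to itself and its bonded neighbours
    head_outputs = []
    for W in W_heads:
        P = H @ W                        # per-head projection of atom features
        scores = (P @ P.T) / np.sqrt(d)  # scaled dot-product attention scores
        alpha = softmax_masked(scores, mask)
        head_outputs.append(alpha @ P)   # attention-weighted neighbour aggregation
    H_conv = np.tanh(np.mean(head_outputs, axis=0))  # average heads, then nonlinearity
    z = sigmoid(H @ U + H_conv @ V + b)              # per-feature gate in (0, 1)
    return z * H + (1.0 - z) * H_conv                # gated skip-connection


# Usage on a toy 3-atom chain (bonds 0-1 and 1-2) with random, hypothetical weights.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d = 4
H = rng.normal(size=(3, d))
W_heads = [rng.normal(size=(d, d)) for _ in range(2)]
U, V, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
H_out = attention_gcn_layer(H, A, W_heads, U, V, b)
print(H_out.shape)  # (3, 4)
```

Because the gate mixes the input features with the convolved features element by element, a layer like this can pass an atom's original representation through unchanged when the gate saturates, which is the usual motivation for gated skip-connections in deeper GCNs. Stacking several such layers widens each atom's receptive field by one bond per layer.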