College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China.
Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China.
Arch Toxicol. 2024 Dec;98(12):4077-4092. doi: 10.1007/s00204-024-03866-4. Epub 2024 Sep 18.
Reproductive toxicity is an important issue in chemical safety. Traditional laboratory testing methods are costly and time-consuming, and they raise ethical concerns. Only a few in silico models have been reported to predict human reproductive toxicity, and none of them makes full use of the topological information of compounds. In addition, most existing atom-based graph neural network methods attribute model predictions to individual nodes or edges rather than to chemically meaningful fragments or substructures. In the current study, we developed a novel fragment-based graph transformer network (FGTN) approach to build a QSAR model of human reproductive toxicity that takes the internal topological structure of compounds into account. In the FGTN model, a compound is represented as a graph with fragments as nodes and the bonds linking two fragments as edges. A super molecule-level node is further introduced and connected to all fragment nodes by undirected edges, so that global molecular features are obtained from the fragment embeddings. The FGTN model achieved an accuracy (ACC) of 0.861 and an area under the receiver operating characteristic curve (AUC) of 0.914 on nonredundant blind tests, outperforming traditional fingerprint-based machine learning models and an atom-based GCN model. The FGTN model can attribute toxicity predictions to fragments, generating specific structural alerts for positive compounds. Moreover, FGTN may also be able to distinguish between various chemical isomers. We believe that FGTN can be used as a reliable and effective tool for human reproductive toxicity prediction, contributing to the advancement of chemical safety assessment.
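The graph representation described above (fragments as nodes, inter-fragment bonds as edges, plus one super node linked to every fragment) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which fragmentation scheme is used, so the atom-to-fragment assignment here is supplied by hand, and the function name `build_fragment_graph`, the toy molecule, and the node numbering are all hypothetical.

```python
def build_fragment_graph(atom_to_fragment, bonds):
    """Collapse an atom-level graph into a fragment-level graph.

    atom_to_fragment: list where index = atom id and value = fragment id
                      (fragment ids assumed to run from 0 to F-1).
    bonds:            iterable of (atom_i, atom_j) pairs.

    Returns (num_nodes, edges, super_node), where edges is a sorted list
    of undirected (u, v) pairs with u < v.
    """
    num_fragments = max(atom_to_fragment) + 1
    super_node = num_fragments  # extra node appended after the fragments
    edges = set()

    # A bond becomes a fragment-level edge only if it crosses fragments.
    for a, b in bonds:
        fa, fb = atom_to_fragment[a], atom_to_fragment[b]
        if fa != fb:
            edges.add((min(fa, fb), max(fa, fb)))

    # The super node is linked to every fragment node (undirected).
    for f in range(num_fragments):
        edges.add((f, super_node))

    return num_fragments + 1, sorted(edges), super_node


# Toy example (assumed fragmentation): a six-membered ring (fragment 0),
# a three-atom linker group (fragment 1), and a two-atom tail (fragment 2).
atom_to_fragment = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
bonds = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # ring bonds
    (0, 6), (6, 7), (6, 8),                          # ring to linker
    (8, 9), (9, 10),                                 # linker to tail
]

num_nodes, edges, super_node = build_fragment_graph(atom_to_fragment, bonds)
print(num_nodes, super_node, edges)
```

In this sketch only the two bonds that cross fragment boundaries survive as fragment-level edges; the remaining edges all come from the super node, which is what lets a downstream model pool fragment embeddings into a molecule-level feature.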