Cheminformatic Identification of Tyrosyl-DNA Phosphodiesterase 1 (Tdp1) Inhibitors: A Comparative Study of SMILES-Based Supervised Machine Learning Models.

Author information

Lai Conan Hong-Lun, Kwok Alex Pak Ki, Wong Kwong-Cheong

Affiliations

Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong 999077, China.

Data Science and Policy Studies Programme, School of Governance and Policy Science, Faculty of Social Science, The Chinese University of Hong Kong, Hong Kong 999077, China.

Publication information

J Pers Med. 2024 Sep 15;14(9):981. doi: 10.3390/jpm14090981.

Abstract

BACKGROUND

Tyrosyl-DNA phosphodiesterase 1 (Tdp1) repairs DNA damage induced by abortive topoisomerase 1 activity; however, by preserving genetic integrity, this repair may also sustain the division of neoplastic cells. It follows that chemical inhibitors targeting Tdp1 could synergize with existing chemotherapy drugs to suppress cancer growth; identifying Tdp1 inhibitors may therefore advance precision medicine in oncology.

OBJECTIVE

Current computational research focuses primarily on molecular docking simulations, although datasets of three-dimensional molecular structures are often hard to curate and computationally expensive to store and process. We propose using simplified molecular input line entry system (SMILES) chemical representations to train supervised machine learning (ML) models that predict potential Tdp1 inhibitors.

METHODS

An open-source consensus dataset containing the inhibitory activity of numerous chemicals against Tdp1 was obtained from Kaggle. Various ML algorithms were trained, ranging from simple algorithms to ensemble methods and deep neural networks. For algorithms requiring numerical input, SMILES strings were converted to chemical descriptors using RDKit, an open-source Python cheminformatics library.
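The SMILES-to-descriptor step can be illustrated with a minimal sketch. It assumes RDKit is installed and simply computes every descriptor listed in `Descriptors.descList`; the abstract does not say which descriptors the authors selected, so the helper name `smiles_to_descriptors` and the example molecule (ethanol) are illustrative assumptions rather than the study's pipeline.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def smiles_to_descriptors(smiles):
    """Return a {descriptor_name: value} dict for one SMILES string.

    Hypothetical helper for illustration; returns None when RDKit
    cannot parse the string.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Descriptors.descList is a list of (name, function) pairs.
    return {name: fn(mol) for name, fn in Descriptors.descList}


# Example molecule (ethanol), chosen only for illustration.
ethanol = smiles_to_descriptors("CCO")
print(len(ethanol), "descriptors computed")
print("MolWt:", round(ethanol["MolWt"], 2))
```

Each molecule becomes one numeric feature row, which is the form that algorithms requiring numerical input can consume. Unparseable SMILES strings would need to be dropped or handled before training.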

RESULTS

Of 13 optimized ML models with rigorously tuned hyperparameters, the random forest model performed best, yielding an area under the receiver operating characteristic curve (ROC-AUC) of 0.7421, a testing accuracy of 0.6815, a sensitivity of 0.6444, a specificity of 0.7156, a precision of 0.6753, and an F1 score of 0.6595.
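As a reading aid, the sketch below shows how these threshold-based metrics follow from confusion-matrix counts, and checks that the reported F1 score is the harmonic mean of the reported precision and sensitivity. The function names and the toy counts are illustrative assumptions; the abstract does not give the study's confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on inhibitors
    specificity = tn / (tn + fp)          # recall on non-inhibitors
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def f1_from(precision, sensitivity):
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Toy counts, NOT from the study, only to exercise the formulas.
toy = classification_metrics(tp=6, fp=3, tn=7, fn=4)

# Consistency check using the precision and sensitivity reported above.
reported_f1 = f1_from(precision=0.6753, sensitivity=0.6444)
print(round(reported_f1, 4))
```

The recomputed value agrees with the reported F1 score of 0.6595 to four decimal places.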

CONCLUSIONS

Ensemble methods, especially the bootstrap aggregation mechanism adopted by random forest, outperformed other ML algorithms in distinguishing Tdp1 inhibitors from non-inhibitors using SMILES. The discovery of Tdp1 inhibitors could open up more treatment regimens for cancer patients, allowing therapies to be tailored to each patient's condition.
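Bootstrap aggregation (bagging) can be sketched in a few lines of plain Python: each base model is trained on a resample of the training set drawn with replacement, and predictions are combined by majority vote. This is a simplified illustration, not the authors' implementation. It uses one-split decision stumps instead of full trees, omits the per-split feature subsampling that random forest adds on top of bagging, and runs on made-up toy data with hypothetical descriptor values.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Find the single-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for below, above in ((0, 1), (1, 0)):
                errors = sum(
                    (below if row[j] <= t else above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, below, above)
    return best[1:]  # (feature index, threshold, label if <=, label if >)


def predict_stump(stump, row):
    j, t, below, above = stump
    return below if row[j] <= t else above


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Aggregate the base models by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.4, 2],
     [0.6, 3], [0.7, 5], [0.8, 1], [0.9, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = inhibitor, 0 = non-inhibitor

models = fit_bagging(X, y)
print(predict_bagging(models, [0.05, 3]), predict_bagging(models, [0.95, 3]))
```

Averaging over many resampled models reduces the variance of any single tree, which is the usual explanation for why bagged ensembles tend to generalize better than one tree trained on the full dataset.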

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3fb5/11433629/f625898befa9/jpm-14-00981-g001.jpg
