Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction.

Author information

Muhammed Talo, Serdar Bozdag

Affiliations

Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA.

BioDiscovery Institute, University of North Texas, Denton, TX 76207, USA.

Publication information

bioRxiv. 2025 Feb 8:2025.02.07.637146. doi: 10.1101/2025.02.07.637146.

Abstract

MOTIVATION

The accurate prediction of drug-target interactions (DTI) is a crucial step in drug discovery, providing a foundation for identifying novel therapeutics. Traditional drug development is both costly and time-consuming, often spanning more than a decade. Computational approaches help narrow the pool of candidate compounds, offering promising starting points for experimental validation. In this study, we propose Top-DTI, a framework for predicting DTI that integrates topological data analysis (TDA) with large language models (LLMs). Top-DTI leverages persistent homology to extract topological features from protein contact maps and drug molecular images. In parallel, protein and drug LLMs generate semantically rich embeddings that capture sequential and contextual information from protein sequences and drug SMILES strings. By combining these complementary features, Top-DTI improves predictive performance and robustness.
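
The abstract does not state which filtration, homology dimensions, or vectorization Top-DTI uses. As a minimal sketch, assuming a protein is represented by a symmetric matrix of inter-residue distances (a contact map is a thresholded version of such a matrix), the code below computes 0-dimensional persistent homology of the distance-based (Vietoris-Rips) filtration with union-find and summarizes the barcode as a Betti-0 curve, one common way to obtain a fixed-length feature vector. The names `h0_persistence` and `betti0_curve` are hypothetical and are not taken from the Top-DTI code; drug images would need a different (for example, cubical) filtration that is not shown here.

```python
from itertools import combinations


def h0_persistence(dist):
    """Return 0-dimensional persistence pairs (birth, death) for the
    Vietoris-Rips filtration of a symmetric distance matrix.

    Every point (e.g., a residue) is born at 0. When the growing distance
    threshold first connects two components, one of them dies. H0 depends
    only on edges, so Kruskal-style union-find over sorted edges suffices.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges ordered by their filtration value (pairwise distance).
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, d))  # one component dies at threshold d
    pairs.append((0.0, float("inf")))  # the final component never dies
    return pairs


def betti0_curve(pairs, thresholds):
    """Count the components alive (birth <= t < death) at each threshold t."""
    return [sum(1 for b, d in pairs if b <= t < d) for t in thresholds]


if __name__ == "__main__":
    xs = [0, 1, 3, 7]  # toy 1-D coordinates standing in for residues
    dist = [[abs(a - b) for b in xs] for a in xs]
    pairs = h0_persistence(dist)
    print(pairs)  # [(0.0, 1), (0.0, 2), (0.0, 4), (0.0, inf)]
    print(betti0_curve(pairs, [0, 1.5, 3, 5]))  # [4, 3, 2, 1]
```

Under this assumption, a vector such as the Betti-0 curve could then be concatenated with an LLM embedding of the same protein before classification; how Top-DTI actually fuses the two feature types is not specified in the abstract.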

RESULTS

Experimental results on the public BioSNAP and Human DTI benchmark datasets demonstrate that the proposed Top-DTI model outperforms state-of-the-art approaches across multiple evaluation metrics, including AUROC, AUPRC, sensitivity, and specificity. Furthermore, the Top-DTI model achieves superior performance in the challenging cold-split scenario, where the test and validation sets contain drugs or targets absent from the training set. This setting simulates real-world scenarios and highlights the robustness of the model. Notably, incorporating topological features alongside LLM embeddings significantly improves predictive performance, underscoring the value of integrating structural and sequence-based representations.
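
The abstract does not describe how the cold split is constructed. A minimal sketch, assuming interactions are stored as (drug, target, label) triples and that unseen entities are chosen by randomly partitioning the unique drug (or target) IDs, is shown here; `cold_split` and its default fractions are hypothetical and not taken from the Top-DTI repository.

```python
import random


def cold_split(pairs, key, val_frac=0.1, test_frac=0.2, seed=0):
    """Split interaction triples so that the entities selected by `key`
    (e.g., drugs) in validation and test never appear in training.

    Fractions apply to unique entities, not to individual pairs.
    """
    entities = sorted({key(p) for p in pairs})  # sort for reproducibility
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = int(len(entities) * test_frac)
    n_val = int(len(entities) * val_frac)
    test_set = set(entities[:n_test])
    val_set = set(entities[n_test:n_test + n_val])

    train, val, test = [], [], []
    for p in pairs:
        k = key(p)
        if k in test_set:
            test.append(p)
        elif k in val_set:
            val.append(p)
        else:
            train.append(p)
    return train, val, test


if __name__ == "__main__":
    # Toy data: 10 drugs x 7 targets, labels alternate 0/1.
    data = [(f"d{i % 10}", f"t{i % 7}", i % 2) for i in range(70)]
    train, val, test = cold_split(data, key=lambda p: p[0])  # cold-drug split
    seen = {p[0] for p in train}
    print(seen.isdisjoint(p[0] for p in test))  # True
```

Passing `key=lambda p: p[1]` instead yields a cold-target split. Partitioning entity IDs rather than individual pairs is what guarantees that no test drug leaks into training, at the cost of pair-level split sizes that only approximate the requested fractions.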

AVAILABILITY

The data and source code of Top-DTI are available at https://github.com/bozdaglab/Top_DTI under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
