DDintensity: Addressing imbalanced drug-drug interaction risk levels using pre-trained deep learning model embeddings.

Authors

Xie Weidun, Chen Xingjian, Huang Lei, Zheng Zetian, Wang Yuchen, Zhang Ruoxuan, Zhang Xiao, Liu Zhichao, Peng Chengbin, Gullerova Monika, Wong Ka-Chun

Affiliations

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong; Sir William Dunn School of Pathology, University of Oxford, UK.

Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Publication information

Artif Intell Med. 2025 Oct;168:103202. doi: 10.1016/j.artmed.2025.103202. Epub 2025 Jul 1.

Abstract

Imbalanced datasets are a persistent challenge in bioinformatics, particularly for drug-drug interaction (DDI) risk-level datasets. Such imbalance can lead to biased models that perform poorly on underrepresented classes. One strategy for addressing this issue is to construct a balanced dataset; another is to employ more advanced features and models. In this study, we introduce a novel approach called DDintensity, which uses pre-trained deep learning models as embedding generators, combined with LSTM-attention models, to address the imbalance in DDI risk-level datasets. We tested embeddings from various domains, including images, graphs, and text corpora. Among these, embeddings generated by BioGPT achieved the highest performance, with an Area Under the Curve (AUC) of 0.97 and an Area Under the Precision-Recall curve (AUPR) of 0.92. Our model was trained on the DDinter dataset and further validated using the MecDDI dataset. Additionally, case studies on two chemotherapeutic drugs used in oncology, DB00398 (Sorafenib) and DB01204 (Mitoxantrone), were conducted to demonstrate the specificity and effectiveness of the method. Our approach scales across DDI modalities and supports the discovery of novel interactions. In summary, we introduce DDintensity as a solution for imbalanced datasets in bioinformatics that relies on pre-trained deep learning embeddings.
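The abstract reports both AUC and AUPR, and the two metrics can diverge on imbalanced data. The following is a minimal sketch of how they can be computed, using only the Python standard library. It is not the authors' evaluation code: the function names `roc_auc` and `average_precision` and the toy labels and scores are assumptions made for illustration, and average precision is used here as one common step-wise estimate of AUPR.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision measured at each positive's rank.

    Tied scores keep their input order, so results can depend on ordering
    when ties are present.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


# Hypothetical toy data: 2 positives (minority class) among 10 examples.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05, 0.15, 0.25, 0.35]

print(f"AUC  = {roc_auc(labels, scores):.4f}")            # AUC  = 0.9375
print(f"AUPR = {average_precision(labels, scores):.4f}")  # AUPR = 0.8333
```

In this toy case a single high-scoring negative lowers average precision much more than it lowers AUC, which is one reason both metrics are usually reported together for minority-class prediction.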
