

Learning to Predict Drug Target Interaction From Missing Not at Random Labels.

Publication information

IEEE Trans Nanobioscience. 2019 Jul;18(3):353-359. doi: 10.1109/TNB.2019.2909293. Epub 2019 Apr 9.

Abstract

The prediction of Drug-Target Interaction (DTI) is an important research direction in bioinformatics, as it greatly shortens the development cycle of new drugs. State-of-the-art computational methods for DTI prediction adopt a binary classification framework. The supervision is incomplete: only a small number of DTIs are known and treated as positive instances, while the rest are unknown and treated as negative. Two severe problems occur in such a framework: (1) the number of negative samples is overwhelming, and (2) a negative label cannot rule out the possibility of a positive drug-target interaction. In this paper, we address the problem of learning from incomplete labels in DTI prediction. The key assumption here is that labels are missing not at random. For example, negative DTI labels are more likely to be missing because biomedical researchers prioritize studying DTIs that are more likely to be positive. We introduce a novel probabilistic model, factorization with non-random missing labels (FNML). It models the generative process for both the DTI labels (i.e., whether a label is positive or negative) and the responses (i.e., whether a label is observed or missing). In particular, the probability of a label being observed or missing is associated with the sign of the label. To further reduce prediction variance and improve prediction accuracy on highly imbalanced DTI datasets, we present FNML-EN, an ensemble scheme designed specifically for the FNML model. We conduct comprehensive experiments on the latest DTI database, demonstrating the superior and robust performance of the proposed models.
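To make the missing-not-at-random assumption concrete, the following is a minimal toy simulation in Python. It is not the authors' FNML model. It only illustrates the two-stage idea the abstract describes: a true label is generated first, and the chance that the label is observed then depends on its sign. The function name `simulate_mnar_labels`, the low-rank logistic label model, and all parameter values (`p_obs_pos`, `p_obs_neg`, the `-3.0` offset) are assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_mnar_labels(n_drugs=50, n_targets=40, rank=5,
                         p_obs_pos=0.6, p_obs_neg=0.05, seed=0):
    """Toy generative process: latent factors -> true labels -> observed mask.

    All names and values here are illustrative assumptions, not taken
    from the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical low-rank latent factors for drugs and targets.
    U = rng.normal(size=(n_drugs, rank))
    V = rng.normal(size=(n_targets, rank))

    # Shift the logits down so that true interactions are comparatively rare.
    logits = U @ V.T - 3.0
    p_interact = 1.0 / (1.0 + np.exp(-logits))

    # Stage 1 (label): is the drug-target pair truly interacting?
    y = rng.random(p_interact.shape) < p_interact

    # Stage 2 (response): the observation probability depends on the label's
    # sign, so negative labels are missing more often than positive ones.
    p_obs = np.where(y, p_obs_pos, p_obs_neg)
    observed = rng.random(y.shape) < p_obs
    return y, observed


y, observed = simulate_mnar_labels()

# Under the naive binary framework, every pair that is not a known positive
# is treated as negative, including true positives that were never observed.
hidden_positives = y & ~observed

print("observed rate among true positives:", observed[y].mean())
print("observed rate among true negatives:", observed[~y].mean())
print("true positives mislabeled as negative:", int(hidden_positives.sum()))
```

In this sketch, the pairs counted in `hidden_positives` are exactly those for which a "negative" label does not rule out a real interaction, which is problem (2) in the abstract.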

