On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1352-1363. doi: 10.1109/TCBB.2019.2913855. Epub 2019 Apr 30.

Abstract

In cheminformatics, compound-target binding profiles have been a main source of data for research. For data repositories that provide only positive profiles, a popular assumption is that all unreported profiles are negative. In this paper, we caution the audience not to take this assumption for granted, and present empirical evidence of its ineffectiveness from a machine learning perspective. Our examination is based on a setting where binding profiles are used as features to train predictive models. We show that (1) prediction performance degrades when the assumption fails, and (2) explicit recovery of unreported profiles improves prediction performance. In particular, we propose a framework that jointly recovers profiles and learns a predictive model, and show that it achieves further performance improvement. The presented study not only suggests applying matrix recovery methods to recover unreported profiles, but also initiates a new missing-feature problem, which we call Learning with Positive and Unknown Features.
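The difference between "unreported means negative" and "unreported means unknown" can be illustrated with a small matrix recovery sketch. It assumes a generic weighted (one-class) low-rank factorization, swapped in here purely for illustration. It is not the joint recovery-and-learning framework proposed in the paper. Reported positives get weight 1, and unreported entries get a small weight `alpha`, so a zero counts only as weak evidence of non-binding. The helper `recover_profiles` and its parameters (`rank`, `alpha`, `lam`, `n_iter`) are hypothetical.

```python
import numpy as np


def recover_profiles(Y, rank=2, alpha=0.1, lam=0.1, n_iter=50, seed=0):
    """Hypothetical helper: weighted low-rank recovery of a positive-only binding matrix.

    Y     : (n_compounds, n_targets) array, 1 = reported positive profile, 0 = unreported.
    alpha : weight on unreported entries (reported positives get weight 1), so an
            unreported entry is treated as weak evidence rather than a confirmed negative.
    Returns a dense score matrix; higher scores on unreported entries suggest likely binders.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    W = np.where(Y > 0, 1.0, alpha)           # per-entry confidence weights
    U = 0.1 * rng.standard_normal((n, rank))  # compound latent factors
    V = 0.1 * rng.standard_normal((m, rank))  # target latent factors
    reg = lam * np.eye(rank)                  # ridge penalty on the factors
    for _ in range(n_iter):
        # Weighted alternating least squares: with V fixed, each compound's factor
        # solves (sum_j W_ij V_j V_j^T + lam I) U_i = sum_j W_ij Y_ij V_j.
        for i in range(n):
            Vw = V * W[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + reg, Vw.T @ Y[i])
        # Symmetric update for each target's factor with U fixed.
        for j in range(m):
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + reg, Uw.T @ Y[:, j])
    return U @ V.T


# Toy example: two compound families, each binding its own family of targets.
Y_true = np.zeros((6, 6))
Y_true[:3, :3] = 1
Y_true[3:, 3:] = 1
Y_obs = Y_true.copy()
Y_obs[0, 0] = 0  # hide two true positives so they look "unreported"
Y_obs[4, 4] = 0
S = recover_profiles(Y_obs)
# A hidden positive should score above a true negative in the same row.
print(S[0, 0], S[0, 4])
```

Zero-filling would feed `Y_obs` directly to a downstream model, so the hidden binders at (0, 0) and (4, 4) would look identical to true negatives. The recovered matrix `S` instead ranks them by how well they fit the low-rank structure of the reported positives. The weight `alpha` controls how strongly unreported entries are pulled toward zero: `alpha = 1` recovers the plain "unreported is negative" fit.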
