Computationally predicting protein-RNA interactions using only positive and unlabeled examples.

Author information

Cheng Zhanzhan, Zhou Shuigeng, Guan Jihong

Affiliation

Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, 220 Handan Road, Shanghai 200433, China.

Publication information

J Bioinform Comput Biol. 2015 Jun;13(3):1541005. doi: 10.1142/S021972001541005X. Epub 2015 Feb 8.

Abstract

Protein-RNA interactions (PRIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to the host's active defense against viruses. With the development of high-throughput technology, a large amount of PRI data has become available for computationally predicting unknown PRIs. In recent years, a number of computational methods for predicting PRIs have been developed; these usually construct negative samples artificially from verified nonredundant PRI datasets in order to train classifiers. However, such samples are not true negatives, and some may even be unknown positives. Consequently, classifiers trained on such datasets cannot achieve satisfactory prediction performance. In this paper, we propose a novel method, PRIPU, that employs a biased support vector machine (biased-SVM) for predicting Protein-RNA Interactions using only Positive and Unlabeled examples. To the best of our knowledge, this is the first work that predicts PRIs using only positive and unlabeled samples. We first collect known PRIs as our benchmark datasets and extract sequence-based features to represent each PRI. To reduce the dimensionality of the feature vectors and thus lower the computational cost, we select a subset of features with a filter-based feature selection method. Biased-SVM is then employed to train prediction models on the different PRI datasets. To evaluate the new method, we also propose a new performance measure called explicit positive recall (EPR), which is specifically suited to the task of learning from positive and unlabeled data. Experimental results on three datasets show that our method not only outperforms four existing methods but is also able to predict unknown PRIs. Source code, datasets and related documents of PRIPU are available at: http://admis.fudan.edu.cn/projects/pripu.htm.
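
The abstract does not spell out the biased-SVM formulation, so the following is only a minimal sketch of the general idea as it is commonly described for positive-unlabeled learning: unlabeled examples are provisionally treated as negatives, and errors on known positives are penalized more heavily than errors on unlabeled examples. It is not the PRIPU implementation. The function names, the toy 2-D points, the penalty values `c_pos` and `c_unl`, and the choice of a linear model fitted by subgradient descent are all assumptions made for illustration; real inputs would be the selected sequence-based feature vectors.

```python
import random


def score(w, b, x):
    """Signed decision value w.x + b; a positive value means 'predicted interaction'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_biased_svm(X, y, c_pos=10.0, c_unl=1.0, epochs=300, lr=0.005, seed=0):
    """Linear SVM with class-dependent hinge penalties, fitted by subgradient descent.

    y[i] is +1 for a known positive and -1 for an unlabeled example.
    Assumed objective: 0.5*||w||^2 + c_pos * (hinge loss on positives)
                                   + c_unl * (hinge loss on unlabeled).
    """
    rng = random.Random(seed)
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * score(w, b, X[i])
            cost = c_pos if y[i] > 0 else c_unl
            # Regularizer step, spread evenly over the n samples.
            w = [wj - lr * wj / n for wj in w]
            if margin < 1:  # hinge loss is active for this example
                w = [wj + lr * cost * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * cost * y[i]
    return w, b


# Hypothetical toy data standing in for PRI feature vectors.
known_pos = [[2.0, 2.0], [2.5, 1.5], [1.5, 2.5], [3.0, 2.0], [2.0, 3.0]]
hidden_pos = [[2.2, 2.4], [2.6, 2.1]]  # true interactions that happen to be unlabeled
likely_neg = [[-2.0, -2.0], [-2.5, -1.5], [-1.5, -2.5],
              [-3.0, -2.0], [-2.0, -3.0], [-2.5, -2.5]]

unlabeled = hidden_pos + likely_neg
X = known_pos + unlabeled
y = [1] * len(known_pos) + [-1] * len(unlabeled)  # unlabeled treated as negative

w, b = train_biased_svm(X, y)
for x in hidden_pos:
    print(x, "score:", round(score(w, b, x), 3))
```

In this toy setting, two of the unlabeled points lie inside the cluster of known positives. Because mistakes on unlabeled points are cheap relative to mistakes on known positives, the model is free to score them as positive, which illustrates how a classifier of this kind can flag candidate unknown interactions instead of being forced to reject them.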

