
Selecting high-quality negative samples for effectively predicting protein-RNA interactions.

Authors

Cheng Zhanzhan, Huang Kai, Wang Yang, Liu Hui, Guan Jihong, Zhou Shuigeng

Affiliations

School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China.

School of Computer Science, Jiangxi Normal University, Nanchang, 330022, China.

Publication information

BMC Syst Biol. 2017 Mar 14;11(Suppl 2):9. doi: 10.1186/s12918-017-0390-8.

DOI: 10.1186/s12918-017-0390-8
PMID: 28361676
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374704/
Abstract

BACKGROUND

The identification of protein-RNA interactions (PRIs) is important for understanding cell activities. Recently, several machine learning-based methods have been developed for identifying PRIs. However, the performance of these methods is unsatisfactory. One major reason is that they usually use unreliable negative samples in the training process.

METHODS

To boost the performance of PRI prediction, we propose a novel method for generating reliable negative samples. Concretely, we first collect known PRIs as positive samples and use them to build positive sets. For each positive set, we construct two corresponding negative sets, one by our method and the other generated randomly. Each positive set is combined with a negative set to form a dataset for model training and performance evaluation. This yields 18 datasets covering different species and different ratios of negative to positive samples. Second, sequence-based features are extracted to represent each PRI and each protein-RNA pair in the datasets. A filter-based method is used to reduce the dimensionality of the feature vectors and thus the computational cost. Finally, the performance of support vector machine (SVM), random forest (RF) and naive Bayes (NB) classifiers is evaluated on the 18 generated datasets.
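
The random baseline mentioned above can be sketched in a few lines of Python. This is only an illustration of drawing random negatives at a chosen negative-to-positive ratio. It is not the authors' selection method, which the abstract does not detail. The function name `sample_random_negatives`, the pair identifiers, and the assumption that candidate pairs come from the proteins and RNAs seen in the positive set are all hypothetical.

```python
import random


def sample_random_negatives(positives, ratio=1, seed=0):
    """Random baseline (hypothetical helper): draw protein-RNA pairs
    that are not among the known positive interactions."""
    positive_set = set(positives)
    # Assumption: candidates are built from proteins and RNAs seen in the positives.
    proteins = sorted({protein for protein, _ in positive_set})
    rnas = sorted({rna for _, rna in positive_set})
    candidates = [
        (protein, rna)
        for protein in proteins
        for rna in rnas
        if (protein, rna) not in positive_set
    ]
    wanted = ratio * len(positive_set)
    if wanted > len(candidates):
        raise ValueError("not enough unlabeled pairs for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(candidates, wanted)


# Toy positive set; the identifiers are placeholders, not real database entries.
positives = [("P1", "R1"), ("P2", "R2"), ("P3", "R3")]
negatives = sample_random_negatives(positives, ratio=2)
dataset = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
```

Pairs drawn this way are merely unlabeled, so some of them may be true but undiscovered interactions. That is the unreliability the abstract attributes to randomly generated negative samples.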

RESULTS

Extensive experiments show that, compared with using randomly generated negative samples, all classifiers achieve substantial performance improvements when trained with negative samples selected by our method. The improvements in accuracy and geometric mean are as high as 204.5% and 68.7% for the SVM classifier, 174.5% and 53.9% for the RF classifier, and 80.9% and 54.3% for the NB classifier, respectively.
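
For readers unfamiliar with the two reported metrics, the following Python sketch computes accuracy and geometric mean for binary predictions. It assumes the common definition of geometric mean as the square root of sensitivity times specificity, and it assumes the reported improvements are relative changes over the random baseline. Neither assumption is confirmed by the abstract, and the function names and toy labels are hypothetical.

```python
import math


def accuracy_and_gmean(y_true, y_pred):
    """Return (accuracy, geometric mean) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Assumed definition: G-mean = sqrt(sensitivity * specificity).
    return accuracy, math.sqrt(sensitivity * specificity)


def relative_improvement(baseline, improved):
    """Percentage change of a metric relative to its baseline value."""
    return 100.0 * (improved - baseline) / baseline


# Toy labels, not data from the paper.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
acc, gmean = accuracy_and_gmean(y_true, y_pred)
```

Under this reading, an improvement above 100% means the metric more than doubled relative to its value with randomly generated negatives.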

CONCLUSION

Our method is useful for identifying PRIs.

