PSoL: a positive sample only learning algorithm for finding non-coding RNA genes.

Author information

Wang Chunlin, Ding Chris, Meraz Richard F, Holbrook Stephen R

Affiliation

Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Publication information

Bioinformatics. 2006 Nov 1;22(21):2590-6. doi: 10.1093/bioinformatics/btl441. Epub 2006 Aug 31.

DOI:10.1093/bioinformatics/btl441

PMID:16945945

Abstract

MOTIVATION

Small non-coding RNA (ncRNA) genes play important regulatory roles in a variety of cellular processes. However, detection of ncRNA genes is a great challenge to both experimental and computational approaches. In this study, we describe a new approach called positive sample only learning (PSoL) to predict ncRNA genes in the Escherichia coli genome. Although PSoL is a machine learning method for classification, it requires no negative training data, which, in general, is hard to define properly and affects the performance of machine learning dramatically. In addition, using the support vector machine (SVM) as the core learning algorithm, PSoL can integrate many different kinds of information to improve the accuracy of prediction. Besides the application of PSoL for predicting ncRNAs, PSoL is applicable to many other bioinformatics problems as well.

RESULTS

The PSoL method is assessed by 5-fold cross-validation experiments which show that PSoL can achieve about 80% accuracy in recovery of known ncRNAs. We compared PSoL predictions with five previously published results. The PSoL method has the highest percentage of predictions overlapping with those from other methods.
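
The cross-validation protocol mentioned above (hold out a fifth of the known positives, train on the rest, count how many held-out positives are recovered) can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid rule stands in for the SVM that PSoL uses, and the function names, the toy 2-D feature vectors, and the distance threshold are all hypothetical, not taken from the paper.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*points)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def recovery_rate(positives, k=5, threshold=1.5, seed=0):
    """Fraction of held-out positives recovered, averaged over k folds.

    Stand-in classifier (an assumption, not the paper's SVM): a held-out
    sample counts as recovered if it lies within `threshold` of the
    centroid of the training positives. Assumes len(positives) >= k.
    """
    rates = []
    for held_out in k_fold_indices(len(positives), k, seed):
        held = set(held_out)
        train = [p for i, p in enumerate(positives) if i not in held]
        c = centroid(train)
        hits = sum(distance(positives[i], c) <= threshold for i in held_out)
        rates.append(hits / len(held_out))
    return mean(rates)


# Toy data: 20 hypothetical 2-D feature vectors clustered near (1, 1).
rng = random.Random(42)
toy_positives = [[rng.gauss(1, 0.5), rng.gauss(1, 0.5)] for _ in range(20)]
rate = recovery_rate(toy_positives)
```

Only positive examples enter the loop, which mirrors the way recovery of known ncRNAs is reported; a real evaluation would swap in the trained classifier and genome-derived features.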

Similar articles

PSoL: a positive sample only learning algorithm for finding non-coding RNA genes.

Bioinformatics. 2006 Nov 1;22(21):2590-6. doi: 10.1093/bioinformatics/btl441. Epub 2006 Aug 31.

Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change.

BMC Bioinformatics. 2006 Mar 27;7:173. doi: 10.1186/1471-2105-7-173.

A practical guide to the art of RNA gene prediction.

Brief Bioinform. 2007 Nov;8(6):396-414. doi: 10.1093/bib/bbm011. Epub 2007 May 4.

De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures.

Bioinformatics. 2007 Jun 1;23(11):1321-30. doi: 10.1093/bioinformatics/btm026. Epub 2007 Jan 31.

[Support vector data description for finding non-coding RNA gene].

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2010 Aug;27(4):779-84.

Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming.

Nucleic Acids Res. 2005 Jun 7;33(10):3263-70. doi: 10.1093/nar/gki644. Print 2005.

A sequence-based filtering method for ncRNA identification and its application to searching for riboswitch elements.

Bioinformatics. 2006 Jul 15;22(14):e557-65. doi: 10.1093/bioinformatics/btl232.

Considerations in the identification of functional RNA structural elements in genomic alignments.

BMC Bioinformatics. 2007 Jan 30;8:33. doi: 10.1186/1471-2105-8-33.

Prediction of mRNA polyadenylation sites by support vector machine.

Bioinformatics. 2006 Oct 1;22(19):2320-5. doi: 10.1093/bioinformatics/btl394. Epub 2006 Jul 26.

Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data.

Bioinformatics. 2006 Jul 15;22(14):e197-202. doi: 10.1093/bioinformatics/btl257.

Cited by

Machine learning-augmented m6A-Seq analysis without a reference genome.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf235.

Identification of potential riboswitch elements in Homo sapiens mRNA 5'UTR sequences using positive-unlabeled machine learning.

PLoS One. 2025 Apr 24;20(4):e0320282. doi: 10.1371/journal.pone.0320282. eCollection 2025.

Identification of potential riboswitch elements in mRNA 5'UTR sequences using Positive-Unlabeled machine learning.

bioRxiv. 2024 Dec 6:2023.11.23.568398. doi: 10.1101/2023.11.23.568398.

Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation without the selected completely at random assumption.

PeerJ Comput Sci. 2024 Nov 5;10:e2451. doi: 10.7717/peerj-cs.2451. eCollection 2024.

A Novel Classification Method: Neighborhood-Based Positive Unlabeled Learning Using Decision Tree (NPULUD).

Entropy (Basel). 2024 May 4;26(5):403. doi: 10.3390/e26050403.

Learning peptide properties with positive examples only.

Digit Discov. 2024 Apr 19;3(5):977-986. doi: 10.1039/d3dd00218g. eCollection 2024 May 15.

Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning.

Curr Genomics. 2020 Apr;21(3):204-211. doi: 10.2174/1389202921666200511072327.

Recognition of Protein Pupylation Sites by Adopting Resampling Approach.

Molecules. 2018 Nov 27;23(12):3097. doi: 10.3390/molecules23123097.

PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants.

Noncoding RNA. 2017 Mar 4;3(1):11. doi: 10.3390/ncrna3010011.

EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites.

Molecules. 2017 Sep 5;22(9):1463. doi: 10.3390/molecules22091463.
