DecoyFinder: Identification of Contaminants in Sets of Homologous RNA Sequences.

Authors

Zhu Mingyi, Zuber Jeffrey, Tan Zhen, Sharma Gaurav, Mathews David H

Affiliations

Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States.

Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, United States.

Publication information

bioRxiv. 2024 Oct 15:2024.10.12.618037. doi: 10.1101/2024.10.12.618037.

Abstract

MOTIVATION

RNA structure is essential for the function of many non-coding RNAs. Using multiple homologous sequences, which share structure and function, secondary structure can be predicted with much higher accuracy than with a single sequence. It can be difficult, however, to establish a set of homologous sequences when their structure is not yet known. We developed a method to identify sequences in a set of putative homologs that are in fact non-homologs.

RESULTS

Previously, we developed TurboFold to estimate conserved structure using multiple, unaligned RNA homologs. Here, we report that contamination by non-homologous sequences significantly reduces the positive predictive value of TurboFold, although the reduction is less than 1%. We developed a method called DecoyFinder, which applies machine learning trained with features determined by TurboFold, to detect sequences that are not homologous with the other sequences in the set. This method identifies approximately 45% of non-homologous sequences while misidentifying 5% of true homologous sequences as non-homologs.
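
The two kinds of figures quoted in the results (a detection rate paired with a misidentification rate, and a positive predictive value for predicted structure) can be illustrated with a minimal sketch. This is not DecoyFinder or TurboFold code. The function names and the counts are hypothetical, chosen only so that the arithmetic reproduces the 45% and 5% rates. Exact-match scoring of base pairs is also an assumption; published benchmarks may use a more permissive matching rule.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (i, j) indices of two paired nucleotides


def detection_rates(decoys_flagged: int, decoys_total: int,
                    homologs_flagged: int, homologs_total: int) -> Tuple[float, float]:
    """Return (sensitivity, false_positive_rate) for a decoy detector.

    sensitivity         = flagged decoys / all decoys
    false_positive_rate = flagged true homologs / all true homologs
    """
    if decoys_total <= 0 or homologs_total <= 0:
        raise ValueError("totals must be positive")
    return decoys_flagged / decoys_total, homologs_flagged / homologs_total


def pair_ppv(predicted: Set[Pair], known: Set[Pair]) -> float:
    """Positive predictive value: fraction of predicted pairs found in the known structure.

    Assumption: a predicted pair counts as correct only on an exact (i, j) match.
    """
    if not predicted:
        raise ValueError("no predicted pairs")
    return len(predicted & known) / len(predicted)


# Hypothetical counts, not data from the study.
sens, fpr = detection_rates(decoys_flagged=90, decoys_total=200,
                            homologs_flagged=50, homologs_total=1000)
print(f"sensitivity={sens:.2f}, false positive rate={fpr:.2f}")

# Hypothetical toy structure: three of four predicted pairs are in the known set.
toy_predicted = {(1, 10), (2, 9), (3, 8), (4, 7)}
toy_known = {(1, 10), (2, 9), (3, 8), (5, 12)}
print(f"PPV={pair_ppv(toy_predicted, toy_known):.2f}")
```

Reading the rates this way makes the trade-off explicit: at the quoted operating point, slightly fewer than half of the contaminants are removed, and one in twenty genuine homologs is discarded along with them.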

AVAILABILITY

DecoyFinder and TurboFold are incorporated in RNAstructure, which is provided for free and open source under the GPL V2 license. It can be downloaded at http://rna.urmc.rochester.edu/RNAstructure.html.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f45a/11507696/498174e02e5e/nihpp-2024.10.12.618037v1-f0001.jpg
