Zhu Mingyi, Zuber Jeffrey, Tan Zhen, Sharma Gaurav, Mathews David H
Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States.
Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, United States.
bioRxiv. 2024 Oct 15:2024.10.12.618037. doi: 10.1101/2024.10.12.618037.
RNA structure is essential for the function of many non-coding RNAs. Secondary structure can be predicted with much higher accuracy from multiple homologous sequences, which share structure and function, than from a single sequence. It can be difficult, however, to establish a set of homologous sequences when their structure is not yet known. We developed a method to identify sequences in a set of putative homologs that are in fact non-homologs.
Previously, we developed TurboFold to estimate conserved structure using multiple, unaligned RNA homologs. Here, we report that contamination by non-homologous sequences significantly reduces the positive predictive value of TurboFold, although the reduction is less than 1%. We developed a method called DecoyFinder, which applies machine learning trained on features determined by TurboFold to detect sequences that are not homologous to the other sequences in the set. This method identifies approximately 45% of non-homologous sequences while misidentifying 5% of true homologous sequences as non-homologs.
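The operating point quoted above (a fraction of non-homologs detected at a fixed rate of misidentified homologs) can be illustrated with a minimal sketch. It assumes a classifier that assigns each sequence a score, with higher scores meaning "more likely non-homologous". The function names and the scores are hypothetical and chosen only for illustration; they are not DecoyFinder output and do not reproduce the DecoyFinder implementation.

```python
import math


def threshold_for_false_positive_rate(homolog_scores, max_fpr):
    """Pick a score threshold so that at most max_fpr of true homologs
    score strictly above it (that is, are wrongly flagged)."""
    ranked = sorted(homolog_scores, reverse=True)
    # Number of true homologs we are allowed to misflag.
    allowed = math.floor(max_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")
    # Flagging strictly above the (allowed + 1)-th highest homolog score
    # misflags at most `allowed` homologs, even when scores are tied.
    return ranked[allowed]


def flagged_fraction(scores, threshold):
    """Fraction of sequences whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical scores: 20 true homologs and 10 non-homologs (decoys).
homolog_scores = [i / 100 for i in range(1, 21)]  # 0.01 through 0.20
decoy_scores = [0.05, 0.10, 0.25, 0.30, 0.15, 0.40, 0.12, 0.50, 0.18, 0.60]

threshold = threshold_for_false_positive_rate(homolog_scores, max_fpr=0.05)
false_positive_rate = flagged_fraction(homolog_scores, threshold)
detection_rate = flagged_fraction(decoy_scores, threshold)

print(f"threshold={threshold:.2f}")
print(f"homologs misflagged: {false_positive_rate:.0%}")
print(f"non-homologs detected: {detection_rate:.0%}")
```

In this toy data set the threshold is set from the homolog scores alone, and the detection rate is then read off the decoy scores; a real evaluation would use held-out sequences rather than the data used to choose the threshold.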
DecoyFinder and TurboFold are incorporated in RNAstructure, which is free, open-source software provided under the GPL V2 license. It can be downloaded at http://rna.urmc.rochester.edu/RNAstructure.html.