Improved measurements of RNA structure conservation with generalized centroid estimators.

Authors

Okada Yohei, Saito Yutaka, Sato Kengo, Sakakibara Yasubumi

Affiliation

Department of Biosciences and Informatics, Keio University, Yokohama, Japan.

Publication

Front Genet. 2011 Aug 31;2:54. doi: 10.3389/fgene.2011.00054. eCollection 2011.

Abstract

Identifying non-protein-coding RNAs (ncRNAs) in genomes is a crucial task in both molecular cell biology and bioinformatics. Secondary structure is a key feature in ncRNA analysis because the biological functions of ncRNAs are closely tied to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as its most stable structure, MFE alone is not an appropriate measure for identifying ncRNAs because the free energy is heavily biased by nucleotide composition. Several alternative measures have therefore been proposed, such as the structure conservation index (SCI) and the base pair distance (BPD), both of which employ MFE structures. However, these measures are not suitable for identifying ncRNAs in some settings, including genome-wide searches, where they incur a high false discovery rate. In this study, we propose improved measures based on SCI and BPD that apply generalized centroid estimators to gain robustness against low-quality multiple alignments. Our experiments show that the proposed methods achieve higher accuracy than the original SCI and BPD, not only on human-curated structural alignments but also on low-quality alignments produced by CLUSTAL W. Furthermore, the centroid-based SCI on CLUSTAL W alignments is more accurate than, or comparable to, the original SCI on structural alignments generated with RAF, a high-quality structural aligner that requires twice the computational time on average. We conclude that our methods are more suitable than the original SCI and BPD for genome-wide alignments, which are of low quality from the point of view of secondary structure.
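The abstract names SCI and BPD without defining them. As a rough illustration only, the sketch below uses the definitions commonly given in the literature: SCI as the consensus folding energy of an alignment divided by the mean MFE of its individual sequences, and BPD as the number of base pairs present in one structure but not the other. It assumes the energies and dot-bracket structures have already been computed by a folding tool and that the structures are in shared alignment coordinates. All function names are hypothetical, and the code covers only the original MFE-based measures, not the centroid-based variants proposed in the paper.

```python
from itertools import combinations


def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(s1, s2):
    """Count base pairs that occur in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must share the same coordinates")
    return len(base_pairs(s1) ^ base_pairs(s2))


def mean_pairwise_bpd(structures):
    """Average base pair distance over all pairs of structures."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return sum(base_pair_distance(a, b) for a, b in pairs) / len(pairs)


def structure_conservation_index(consensus_energy, individual_energies):
    """Ratio of the consensus energy to the mean of the individual MFEs."""
    mean_energy = sum(individual_energies) / len(individual_energies)
    if mean_energy == 0:
        raise ValueError("mean individual energy is zero")
    return consensus_energy / mean_energy
```

Under these definitions, an SCI near 1 suggests that the consensus structure is about as stable as the individual folds, and a small mean BPD suggests that the individual structures largely agree.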
