
An enhanced RNA alignment benchmark for sequence alignment programs.

Authors

Wilm Andreas, Mainz Indra, Steger Gerhard

Affiliation

Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany.

Publication

Algorithms Mol Biol. 2006 Oct 24;1:19. doi: 10.1186/1748-7188-1-19.

DOI: 10.1186/1748-7188-1-19
PMID: 17062125
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1635699/
Abstract

BACKGROUND

The performance of alignment programs is traditionally tested on sets of protein sequences for which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone (the similarity range where alignment quality drops drastically) starts at 60 % for RNAs in comparison to 20 % for proteins. In this study we enhance the previous benchmark.

RESULTS

The RNA sequence sets in the benchmark database are taken from an increased number of RNA families to avoid unintended bias from using only a few families. The size of the sets varies from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures structural conservation. The performance order of parameters (such as nucleotide substitution matrices and gap costs) as well as of programs is rated by rank tests.
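
To make the idea of a match-based quality measure concrete, here is a minimal sketch of a sum-of-pairs style score: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The abstract does not spell out the exact formula, so treating the "nucleotide matches" measure as a sum-of-pairs score is an assumption, and the function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical.

```python
from itertools import combinations

GAP_CHARS = "-."  # assumption: gaps are written as '-' or '.'


def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue indices of two rows that share a column."""
    pairs = set()
    i = j = 0  # running residue indices (gaps are not counted)
    for x, y in zip(row_a, row_b):
        a_is_residue = x not in GAP_CHARS
        b_is_residue = y not in GAP_CHARS
        if a_is_residue and b_is_residue:
            pairs.add((i, j))
        if a_is_residue:
            i += 1
        if b_is_residue:
            j += 1
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces.

    Both alignments are lists of gapped strings with the sequences in the
    same order.
    """
    if len(test) != len(reference):
        raise ValueError("alignments must contain the same sequences")
    found = total = 0
    for p, q in combinations(range(len(reference)), 2):
        ref_pairs = aligned_pairs(reference[p], reference[q])
        test_pairs = aligned_pairs(test[p], test[q])
        total += len(ref_pairs)
        found += len(ref_pairs & test_pairs)
    return found / total if total else 0.0


reference = ["GCAU", "G-AU"]
test = ["GCAU", "GA-U"]
score = sum_of_pairs_score(test, reference)  # 2 of 3 reference pairs recovered
```

A score of 1.0 means the test alignment reproduces every aligned residue pair of the reference. The structural conservation measure mentioned above needs RNA folding software and is not sketched here.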

CONCLUSION

Most sequence alignment programs perform equally well on RNA sequence sets with high sequence identity, that is, with an average pairwise sequence identity (APSI) above 75 %. Parameters for gap-open and gap-extension have a large influence on alignment quality at APSI ≤ 75 %; optimal parameter combinations are shown for several programs. The use of different 4 x 4 substitution matrices improved program performance only in some cases. The performance of iterative programs drastically increases with increasing sequence numbers and/or decreasing sequence identity, which makes them clearly superior to programs using a purely non-iterative, progressive approach. The best sequence alignment programs produce alignments of high quality down to APSI > 55 %; at lower APSI the use of sequence+structure alignment programs is recommended.
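
Since the conclusions are stated in terms of APSI thresholds, a short sketch of how APSI can be computed from an alignment may help. Definitions of pairwise identity differ in how gaps are counted. This version divides matches by the number of columns in which at least one of the two sequences has a residue, which is an assumption and not necessarily the convention used in the benchmark. The names `pairwise_identity` and `apsi` are hypothetical.

```python
from itertools import combinations

GAP_CHARS = "-."  # assumption: gaps are written as '-' or '.'


def pairwise_identity(a, b):
    """Identity of two aligned rows, ignoring columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS and y in GAP_CHARS:
            continue  # column carries no residue for this pair
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def apsi(alignment):
    """Average pairwise sequence identity of an alignment, in percent."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


alignment = ["GCAU", "GCAU", "GC-U"]
value = apsi(alignment)  # pair identities are 1.0, 0.75 and 0.75
```

With such a function, a sequence set can be assigned to one of the ranges discussed above, for example APSI above 75 % or APSI between 55 % and 75 %.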

