Biclustering as a method for RNA local multiple sequence alignment.

Authors

Wang Shu, Gutell Robin R, Miranker Daniel P

Affiliations

Department of Electrical and Computer Engineering, School of Biological Sciences, University of Texas at Austin, Austin, TX 78712, USA.

Publication information

Bioinformatics. 2007 Dec 15;23(24):3289-96. doi: 10.1093/bioinformatics/btm485. Epub 2007 Oct 6.

DOI:10.1093/bioinformatics/btm485

PMID:17921494

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2228335/

Abstract

MOTIVATIONS

Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. At the same time, the grouping of the sequences can affect the alignment, which is precisely the kind of dual situation biclustering is intended to address.

RESULTS

We define a representation of the MSA problem that enables the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension of that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and Group IC1 Introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum-of-Pairs Score (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or more sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions.
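The abstract names the Sum-of-Pairs Score as one of its two evaluation measures without defining it. The sketch below illustrates one commonly used definition, namely the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. It is an assumption about the measure, not the evaluation code used by BlockMSA or BRAliBase, and the function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index), where the
    residue index counts non-gap characters from the start of that sequence.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all alignment rows must have the same length")

    next_residue = [0] * len(alignment)  # per-sequence residue counter
    pairs = set()
    for col in range(n_cols):
        present = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != gap:
                present.append((seq_idx, next_residue[seq_idx]))
                next_residue[seq_idx] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs reproduced by the test."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        return 0.0
    return len(reference_pairs & aligned_pairs(test)) / len(reference_pairs)


if __name__ == "__main__":
    reference = ["AC-G", "ACUG"]
    test = ["ACG-", "ACUG"]
    print(sum_of_pairs_score(test, reference))
```

In this toy case the reference aligns three residue pairs and the test alignment reproduces two of them, so the score is 2/3. Identifying residues by their position in the ungapped sequence is what lets two alignments with different gap placements be compared.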

AVAILABILITY

BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
