Mind the gaps: evidence of bias in estimates of multiple sequence alignments.

Authors

Golubchik Tanya, Wise Michael J, Easteal Simon, Jermiin Lars S

Affiliation

School of Biological Sciences, University of Sydney, Sydney, Australia.

Publication information

Mol Biol Evol. 2007 Nov;24(11):2433-42. doi: 10.1093/molbev/msm176. Epub 2007 Aug 20.

DOI: 10.1093/molbev/msm176
PMID: 17709332

Abstract

Multiple sequence alignment (MSA) is a crucial first step in the analysis of genomic and proteomic data. Commonly occurring sequence features, such as deletions and insertions, are known to affect the accuracy of MSA programs, but the extent to which alignment accuracy is affected by the positions of insertions and deletions has not been examined independently of other sources of sequence variation. We assessed the performance of 6 popular MSA programs (ClustalW, DIALIGN-T, MAFFT, MUSCLE, PROBCONS, and T-COFFEE) and one experimental program, PRANK, on amino acid sequences that differed only by short regions of deleted residues. The analysis showed that the absence of residues often led to an incorrect placement of gaps in the alignments, even though the sequences were otherwise identical. In data sets containing sequences with partially overlapping deletions, most MSA programs preferentially aligned the gaps vertically at the expense of incorrectly aligning residues in the flanking regions. Of the programs assessed, only DIALIGN-T was able to place overlapping gaps correctly relative to one another, but this was usually context dependent and was observed only in some of the data sets. In data sets containing sequences with non-overlapping deletions, both DIALIGN-T and MAFFT (G-INS-I) were able to align gaps with near-perfect accuracy, but only MAFFT produced the correct alignment consistently. The same was true for data sets that comprised isoforms of alternatively spliced gene products: both DIALIGN-T and MAFFT produced highly accurate alignments, with MAFFT being the more consistent of the 2 programs. Other programs, notably T-COFFEE and ClustalW, were less accurate. For all data sets, alignments produced by different MSA programs differed markedly, indicating that reliance on a single MSA program may give misleading results. 
It is therefore advisable to use more than one MSA program when dealing with sequences that may contain deletions or insertions, particularly for high-throughput and pipeline applications where manual refinement of each alignment is not practicable.
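
As a rough illustration of the kind of test case described in the abstract (not the authors' actual data sets or scoring procedure), the sketch below builds a small reference alignment from one parent sequence with two partially overlapping deletions, then scores a candidate alignment in which the gaps have been stacked vertically. The parent sequence, the deletion coordinates, and all function names are hypothetical, and the pair-based score is a generic sum-of-pairs style measure chosen for illustration.

```python
from itertools import combinations


def delete_region(seq, start, length):
    """Return (ungapped sequence, gapped reference row) for one deletion."""
    gapped = seq[:start] + "-" * length + seq[start + length:]
    return gapped.replace("-", ""), gapped


def aligned_pairs(rows):
    """Collect every pair of residues that share a column.

    A residue is identified as (row index, position in the ungapped sequence),
    so the result does not depend on where the gaps happen to be drawn.
    """
    counters = [0] * len(rows)
    pairs = set()
    for col in range(len(rows[0])):
        present = []
        for r, row in enumerate(rows):
            if row[col] != "-":
                present.append((r, counters[r]))
                counters[r] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test_rows, reference_rows):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref = aligned_pairs(reference_rows)
    return len(ref & aligned_pairs(test_rows)) / len(ref)


def agreement(rows_a, rows_b):
    """Symmetric overlap between two alignments of the same sequences."""
    a, b = aligned_pairs(rows_a), aligned_pairs(rows_b)
    return len(a & b) / len(a | b)


# Hypothetical parent sequence; the other two rows differ from it only by deletions.
parent = "MKTAYIAKQRQISFVKSHFSRQ"
ungapped_a, row_a = delete_region(parent, 5, 5)   # deletion covers columns 5-9
ungapped_b, row_b = delete_region(parent, 8, 5)   # deletion covers columns 8-12

# True alignment: the two gaps overlap only partially.
reference = [parent, row_a, row_b]

# Candidate alignment in which the second gap is moved to sit directly
# under the first one, misaligning the flanking residues of that row.
stacked = [parent, row_a, ungapped_b[:5] + "-" * 5 + ungapped_b[5:]]

reference_score = sum_of_pairs_score(reference, reference)
stacked_score = sum_of_pairs_score(stacked, reference)
print(f"reference vs itself: {reference_score:.3f}")
print(f"stacked gaps vs reference: {stacked_score:.3f}")
print(f"agreement between the two alignments: {agreement(stacked, reference):.3f}")
```

In practice, the rows passed to `agreement` would come from the output of two different MSA programs run on the same input, and a low value would flag the data set for closer inspection.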

