The accuracy of several multiple sequence alignment programs for proteins.

Author information

Nuin Paulo A S, Wang Zhouzhi, Tillier Elisabeth R M

Affiliation

Division of Cancer Genomics and Proteomics, Ontario Cancer Institute, University Health Network, 101 College St, M5G 1L7, Toronto, Ontario, Canada.

Publication information

BMC Bioinformatics. 2006 Oct 24;7:471. doi: 10.1186/1471-2105-7-471.

Abstract

BACKGROUND

There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs.

RESULTS

We tested nine of the most widely used protein alignment programs and compared their results using sequences generated with the simulation software Simprot, which creates known alignments under realistic and controlled evolutionary scenarios. We simulated more than 30,000 alignment sets under various evolutionary histories in order to define the strengths and weaknesses of each program tested. We found that alignment accuracy is extremely dependent on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE, and the results on the BAliBASE and Simprot-generated data sets were consistent in most cases.
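When the true alignment is known, as with simulated sequences, accuracy can be scored by comparing residue pairings directly. The abstract does not state which measure was used, so the sketch below assumes the commonly used sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names and the toy sequences are illustrative and do not come from the paper.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Collect every pair of residues that share a column.

    `alignment` is a list of equal-length strings with '-' for gaps.
    Each residue is identified as (sequence index, position in the
    ungapped sequence), so pairs are comparable across alignments.
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of non-gap residues in this column is "aligned".
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


# Toy example (hypothetical sequences): one misplaced gap in the test alignment.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sum_of_pairs_score(test, reference))
```

Because residues are indexed by their position in the ungapped sequence, the score is unaffected by differences in alignment length between the test and reference.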

CONCLUSION

Our results indicate that employing Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for alignment accuracy assessment. Simprot also allows for a quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.
