Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP.

Author information

Warnow Tandy, Mirarab Siavash

Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA.

Publication information

Methods Mol Biol. 2021;2231:99-119. doi: 10.1007/978-1-0716-1036-7_7.

Abstract

The estimation of very large multiple sequence alignments is a challenging problem that requires special techniques in order to achieve high accuracy. Here we describe two software packages, PASTA and UPP, for constructing alignments on large and ultra-large datasets. Both methods have been able to produce highly accurate alignments on 1,000,000 sequences, and trees computed on these alignments are also highly accurate. PASTA provides the best tree accuracy when the input sequences are all full-length, but UPP provides improved accuracy compared to PASTA and other methods when the input contains a large number of fragmentary sequences. Both methods are available in open-source form on GitHub.
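
The choice between the two packages described above depends on how many input sequences are fragmentary. The following is a minimal Python sketch of that decision, not part of PASTA or UPP. The function names and both thresholds (a sequence counts as fragmentary if it is shorter than 75% of the median length; UPP is suggested once at least 10% of sequences are fragmentary) are illustrative assumptions, not values taken from the chapter.

```python
from statistics import median


def read_fasta_lengths(path):
    """Return a dict mapping each FASTA record name to its ungapped length."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the full header text (minus '>') as the record name.
                name = line[1:].strip()
                lengths[name] = 0
            elif name is not None:
                # Ignore gap characters so aligned input is handled too.
                lengths[name] += len(line.replace("-", ""))
    return lengths


def suggest_aligner(lengths, fragment_ratio=0.75, fragment_share=0.10):
    """Suggest 'PASTA' or 'UPP' from sequence lengths.

    Both thresholds are hypothetical defaults chosen for illustration:
    - fragment_ratio: a sequence shorter than this fraction of the
      median length is treated as fragmentary.
    - fragment_share: suggest UPP when at least this fraction of the
      sequences is fragmentary.
    Returns (method_name, observed_fragment_share).
    """
    if not lengths:
        raise ValueError("no sequences provided")
    med = median(lengths.values())
    fragments = [n for n, length in lengths.items() if length < fragment_ratio * med]
    share = len(fragments) / len(lengths)
    method = "UPP" if share >= fragment_share else "PASTA"
    return method, share
```

A call such as `suggest_aligner(read_fasta_lengths("input.fasta"))` (with a hypothetical file name) returns the suggested method together with the observed share of fragmentary sequences, so the thresholds can be tuned for a given dataset.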
