Suppr超能文献

WITCH-NG:对具有序列长度异质性的数据集进行高效且准确的比对。

WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity.

作者信息

Liu Baqiao, Warnow Tandy

机构信息

Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA.

出版信息

Bioinform Adv. 2023 Mar 6;3(1):vbad024. doi: 10.1093/bioadv/vbad024. eCollection 2023.

Abstract

SUMMARY

Multiple sequence alignment is a basic part of many bioinformatics pipelines, including in phylogeny estimation, prediction of structure for both RNAs and proteins, and metagenomic sequence analysis. Yet many sequence datasets exhibit substantial sequence length heterogeneity, both because of large insertions and deletions in the evolutionary history of the sequences and the inclusion of unassembled reads or incompletely assembled sequences in the input. A few methods have been developed that can be highly accurate in aligning datasets with sequence length heterogeneity, with UPP one of the first methods to achieve good accuracy, and WITCH a recent improvement on UPP for accuracy. In this article, we show how we can speed up WITCH. Our improvement includes replacing a critical step in WITCH (currently performed using a heuristic search) by a polynomial time exact algorithm using Smith-Waterman. Our new method, WITCH-NG (i.e. 'next generation WITCH') achieves the same accuracy but is substantially faster. WITCH-NG is available at https://github.com/RuneBlaze/WITCH-NG.

AVAILABILITY AND IMPLEMENTATION

The datasets used in this study are from prior publications and are freely available in public repositories, as indicated in the Supplementary Materials.

SUPPLEMENTARY INFORMATION

Supplementary data are available at online.

摘要

摘要

多序列比对是许多生物信息学流程的基本组成部分,包括系统发育估计、RNA和蛋白质结构预测以及宏基因组序列分析。然而,许多序列数据集存在显著的序列长度异质性,这既是由于序列进化历史中的大量插入和缺失,也是由于输入中包含未组装的读段或组装不完整的序列。已经开发了一些方法,这些方法在比对具有序列长度异质性的数据集时可以达到很高的准确性,UPP是最早获得良好准确性的方法之一,而WITCH是对UPP准确性的最新改进。在本文中,我们展示了如何加速WITCH。我们的改进包括用使用Smith-Waterman的多项式时间精确算法取代WITCH中的一个关键步骤(目前使用启发式搜索执行)。我们的新方法WITCH-NG(即“下一代WITCH”)实现了相同的准确性,但速度要快得多。WITCH-NG可在https://github.com/RuneBlaze/WITCH-NG上获取。

可用性和实现

本研究中使用的数据集来自先前的出版物,可在公共存储库中免费获取,如补充材料中所示。

补充信息

补充数据可在网上获取。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9252/10035637/605d816d624e/vbad024f1.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验