An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction.

Author information

Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree Street NW, Atlanta, USA.

Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, USA.

Publication information

BMC Bioinformatics. 2020 Nov 18;21(Suppl 6):404. doi: 10.1186/s12859-020-03738-5.

Abstract

BACKGROUND

Alignment-free methods for sequence comparison have become popular in many bioinformatics applications, particularly for estimating the sequence similarity measures used to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACS_k, have been shown to produce results as effective as multiple-sequence-alignment-based methods for the reconstruction of phylogenetic trees. Since computing ACS_k takes O(n log^k n) time and is hence impractical for large datasets, multiple heuristics that approximate ACS_k have been introduced.
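The abstract does not define the measure itself, so the following is only a minimal sketch, assuming the commonly cited definition: for each position i of a sequence X, take the length of the longest substring starting at i that matches some substring of Y with at most k mismatches, then average these lengths over all positions of X. The function names `longest_k_mismatch_extension` and `acs_k` are hypothetical, and the brute-force search shown here is neither the exact O(n log^k n) algorithm nor the linear-time heuristic described in this paper; it only illustrates the quantity that those methods compute or approximate.

```python
def longest_k_mismatch_extension(x, y, i, j, k):
    """Length of the longest common extension of x[i:] and y[j:]
    that contains at most k mismatching positions."""
    mismatches = 0
    length = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            if mismatches == k:
                break  # one more mismatch would exceed the budget
            mismatches += 1
        length += 1
    return length


def acs_k(x, y, k=0):
    """Average, over all start positions i in x, of the longest
    k-mismatch match between x[i:] and any suffix of y.

    Brute force for illustration only (assumed definition);
    k = 0 gives the plain average common substring length.
    """
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    total = 0
    for i in range(len(x)):
        total += max(
            longest_k_mismatch_extension(x, y, i, j, k)
            for j in range(len(y))
        )
    return total / len(x)
```

For example, `acs_k("ACGT", "AGGT", k=0)` is lower than `acs_k("ACGT", "AGGT", k=1)`, because allowing one mismatch lets the match starting at the first position extend across the C/G difference. Turning such averages into a distance for tree building requires an additional normalization step that is not shown here.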

RESULTS

In this paper, we present a novel linear-time heuristic to approximate ACS_k, which is faster than computing the exact ACS_k while being closer to the exact ACS_k values than previously published linear-time greedy heuristics. Using four real datasets, containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime, and demonstrate its applicability for phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while being comparable in its applications to phylogeny reconstruction.

CONCLUSIONS

Our method produces a better approximation for ACS_k and is applicable for the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language and the source code is available at https://github.com/srirampc/adyar-rs.
